Web analytics, best performance?

Joshua Silva
Greenhorn

Joined: Aug 17, 2013
Posts: 5
For an internal web analytics platform here, the traffic is around 15 million hits per month. That works out to only about 6 requests per second on average, say 25 during peak times. We are curious, though, about the best way to make a web analytics platform very fast and scalable.
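
A quick back-of-the-envelope check of these load figures, as a minimal sketch. It assumes a 30-day month and evenly spread traffic (both simplifications), and `LoadEstimate` is a hypothetical name used only for illustration.

```java
// Back-of-the-envelope load estimate. The 30-day month is an assumption.
final class LoadEstimate {
    private static final double SECONDS_PER_MONTH = 30 * 24 * 60 * 60; // 2,592,000

    /** Average requests per second for a given monthly hit count. */
    static double averagePerSecond(long hitsPerMonth) {
        return hitsPerMonth / SECONDS_PER_MONTH;
    }

    public static void main(String[] args) {
        System.out.printf("15M hits/month: %.1f req/s%n", averagePerSecond(15_000_000L));
        System.out.printf("30M hits/month: %.1f req/s%n", averagePerSecond(30_000_000L));
    }
}
```

Under these assumptions, 15 million hits per month averages out to about 5.8 requests per second, and 30 million to about 11.6. Peak load is what matters for sizing, so the average is only a lower bound.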

So, basically similar to Google Analytics, the platform has a snippet of JS that then goes and fires an SQL query. Now the question is: should that query update the stats on the fly, or should we just do an insert and let another process *process* the data and update it for the end user (so they can see up-to-date analytics)?

Should a relational DB be used for this insert, or would something else be faster, such as appending to a *log file* or whatnot and then parsing that into the DB? Maybe writing to a file and doing a batch import into the database every 30 seconds or every minute would be quicker than hitting the database on every request. This follows the theory that opening a connection and running 1k queries is faster than opening a connection, running one query, and closing it again for every request.
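
The "one connection, many inserts" idea can be sketched with a JDBC batch. This is only an illustration under stated assumptions: the `hits` table, its `url` and `hit_time` columns, and the `Hit` and `HitBatchWriter` classes are hypothetical names that do not come from the thread. The JDBC calls used (`addBatch`, `executeBatch`, `setAutoCommit`, `commit`, `rollback`) are standard `java.sql` API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.List;

// Hypothetical record of one tracked page view; the fields are assumptions.
final class Hit {
    final String url;
    final long timestampMillis;

    Hit(String url, long timestampMillis) {
        this.url = url;
        this.timestampMillis = timestampMillis;
    }
}

final class HitBatchWriter {
    // Table and column names are assumptions for illustration.
    static final String INSERT_SQL = "INSERT INTO hits (url, hit_time) VALUES (?, ?)";

    /** Inserts all buffered hits in a single transaction and returns how many were sent. */
    static int writeBatch(Connection conn, List<Hit> hits) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit per batch instead of one per row
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (Hit h : hits) {
                ps.setString(1, h.url);
                ps.setTimestamp(2, new Timestamp(h.timestampMillis));
                ps.addBatch(); // queue the row locally, no round trip yet
            }
            int[] counts = ps.executeBatch(); // send all queued rows together
            conn.commit();
            return counts.length;
        } catch (SQLException e) {
            conn.rollback(); // keep the batch all-or-nothing
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A caller would collect hits for 30 seconds or a minute and then pass the list to `writeBatch`. Whether this beats per-request inserts on a given database and driver is something to measure rather than assume.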

Maybe there is a completely different approach for this that we are just not aware of. Any input would be great.

Thank you
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41634
    
The first thing that comes to mind is to decouple gathering the stats and saving them. Push the incoming stats into some kind of queue, and have a lower-priority job process that queue by saving it in whichever way you want to save it. That way high traffic, or a slowdown in the DB (or file system), doesn't affect the speed of stats gathering.
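
One way this decoupling might look in plain Java is a bounded `BlockingQueue` drained by a low-priority background thread. This is a sketch, not a definitive implementation: `StatsQueue`, its parameters, and the `sink` callback are hypothetical names, and dropping events when the queue is full is an assumed policy chosen so that the request path never blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Decouples gathering stats (record) from saving them (sink). */
final class StatsQueue<T> implements AutoCloseable {
    private final BlockingQueue<T> queue;
    private final Consumer<List<T>> sink; // e.g. a DB batch insert or a file append
    private final int maxBatch;
    private final Thread worker;
    private volatile boolean running = true;

    private StatsQueue(int capacity, int maxBatch, Consumer<List<T>> sink) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.sink = sink;
        this.maxBatch = maxBatch;
        this.worker = new Thread(this::drainLoop, "stats-writer");
        this.worker.setDaemon(true);
        this.worker.setPriority(Thread.MIN_PRIORITY); // the "lower-priority job"
    }

    static <T> StatsQueue<T> start(int capacity, int maxBatch, Consumer<List<T>> sink) {
        StatsQueue<T> q = new StatsQueue<>(capacity, maxBatch, sink);
        q.worker.start();
        return q;
    }

    /** Called on the request path; never blocks. Returns false if the event was dropped. */
    boolean record(T event) {
        return running && queue.offer(event);
    }

    private void drainLoop() {
        List<T> batch = new ArrayList<>(maxBatch);
        while (running || !queue.isEmpty()) {
            try {
                T first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) continue; // nothing queued yet
                batch.add(first);
                queue.drainTo(batch, maxBatch - 1); // take whatever else is waiting
                sink.accept(new ArrayList<>(batch));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (RuntimeException e) {
                // A failing sink must not kill the worker; a real system would log or retry.
            } finally {
                batch.clear();
            }
        }
    }

    /** Stops accepting events, lets the worker flush what is queued, waits up to 5 s. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this shape, a slow sink only makes the queue grow (or, once it is full, causes drops); it does not slow down the thread that handles the tracking request. A message broker or an append-only log would serve the same role across processes.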


Joshua Silva
Greenhorn

Joined: Aug 17, 2013
Posts: 5
Any other thoughts on this? Ways to pull it off, so it could scale up?

The max we will do is probably around 30 million hits per month, but still. I would like to make it as good as I possibly could. Input from anyone with analytics experience would be much appreciated.

fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11257
    

Joshua Silva wrote: I would like to make it as good as I possibly could.

Just an observation - that isn't really a very good spec. One can always improve things, if one is willing to spend more time/money/resources. The law of diminishing returns certainly applies here.

So, come up with a specific, quantifiable spec, with actual numbers and statistics, not vague 'make it better' rhetoric. That's the only way you'll know if you've hit your target. You can certainly go back and revise the specs if you need to, but you need to have an obtainable goal.
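
To make that concrete, a quantifiable spec could be phrased as "the 95th percentile latency of the tracking request stays at or below N ms at peak load" and then checked against measurements. The sketch below is one possible way to do that check. The `LatencySpec` class and the numbers in the usage note are hypothetical, not figures from this thread, and nearest-rank is just one common percentile definition.

```java
import java.util.Arrays;

/** Checks measured latencies against a numeric target. */
final class LatencySpec {
    /** Nearest-rank percentile of the samples, for p in (0, 100]. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    /** True if the p-th percentile latency is at or below the target. */
    static boolean meets(long[] samplesMillis, double p, long targetMillis) {
        return percentile(samplesMillis, p) <= targetMillis;
    }
}
```

For example, `LatencySpec.meets(samples, 95, 20)` would answer whether 95% of the measured requests finished within a (made-up) 20 ms target, which gives a clear pass or fail instead of "as good as possible".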


 