Web analytics, best performance?

Joshua Silva
Greenhorn

Joined: Aug 17, 2013
Posts: 5
For an internal web analytics platform here, the traffic is around 15 million hits per month. That only works out to around 7 requests per second, say 25 during peak times. We are curious, though, about the best way to make a web analytics platform very fast and scalable.

So, basically similar to Google Analytics, the platform has a snippet of JS that then goes and fires an SQL query. Now the question is: should we update the stats on the fly with that query, or should we just do an insert and let another process *process* the data and update it for the end user (so they can see up-to-date analytics)?

Should a relational DB be used for this insert, or would something else be faster, say appending to a *log file* or whatnot and then parsing that into the DB? Maybe that would be quicker than hitting the database on every request: buffer the hits and do a batch import into the database every 30 seconds or every minute. This follows the theory that opening a connection and running 1k queries is faster than opening a connection, running one query, and closing it, over and over for every request.
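For example (just a rough, untested sketch; the "hits" table, its columns, and the connection URL are only assumptions to illustrate the idea), the batched version might look something like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.List;

public class HitBatchWriter {

    // Write a whole buffer of hits in one connection / one transaction,
    // instead of opening a connection and running a single INSERT per request.
    public void flush(List<Hit> hits) throws SQLException {
        String sql = "INSERT INTO hits (url, referrer, hit_time) VALUES (?, ?, ?)";
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/analytics", "user", "pass");
             PreparedStatement ps = con.prepareStatement(sql)) {
            con.setAutoCommit(false);
            for (Hit hit : hits) {
                ps.setString(1, hit.url);
                ps.setString(2, hit.referrer);
                ps.setTimestamp(3, new Timestamp(hit.time));
                ps.addBatch();
            }
            ps.executeBatch();  // one batch round trip instead of one statement per hit
            con.commit();
        }
    }

    public static class Hit {
        String url;
        String referrer;
        long time;
    }
}

The same writer could just as easily append to a flat file instead; the point is only that the per-request work shrinks to buffering.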

Maybe there is a completely different approach to this that we are just not aware of. Any input would be great.

Thank you
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42906
The first thing that comes to mind is to decouple gathering the stats from saving them. Push the incoming stats into some kind of queue, and have a lower-priority job process that queue by saving it in whichever way you want to save it. That way high traffic, or a slowdown in the DB (or file system), doesn't affect the speed of stats gathering.
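Something along these lines, for example (just a sketch, not production code; save() is a stand-in for whatever persistence you end up choosing):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class StatsCollector {

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Called from the request thread: enqueue the hit and return immediately.
    public void record(String hit) {
        queue.offer(hit);
    }

    // Background consumer: drains the queue and persists hits in batches.
    public void startConsumer() {
        Thread consumer = new Thread(() -> {
            List<String> batch = new ArrayList<>();
            try {
                while (true) {
                    batch.add(queue.take());      // block until at least one hit arrives
                    queue.drainTo(batch, 999);    // grab whatever else is already queued
                    save(batch);                  // DB insert, file append, whatever
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setPriority(Thread.MIN_PRIORITY);
        consumer.setDaemon(true);
        consumer.start();
    }

    private void save(List<String> batch) {
        // placeholder: write the batch wherever you decide to keep the stats
    }
}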
Joshua Silva
Greenhorn

Joined: Aug 17, 2013
Posts: 5
Any other thoughts on this? Ways to pull it off so it could scale up?

The max we will do is probably around 30 million hits per month, but still, I would like to make it as good as I possibly could. Input from anyone with analytics experience would be very much appreciated.

fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11494

Joshua Silva wrote:I would like to make it as good as I possibly could.

Just an observation - that isn't really a very good spec. One can always improve things, if one is willing to spend more time/money/resources. The law of diminishing returns certainly applies here.

So, come up with a specific, quantifiable spec, with actual numbers and statistics, not vague 'make it better' rhetoric. That's the only way you'll know if you've hit your target. You can certainly go back and revise the specs if you need to, but you need to have an attainable goal.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
 