JavaRanch » Java Forums » Databases » JDBC and Relational Databases

Logging Actions with Minimum Impact

Scott Selikoff
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3753

I have a system where certain client actions are logged to a central database for statistical reporting. The problem is that a small number of users can generate tens of thousands of small records a day; within a month we're at over a million records added to a single table.

My question is, what's the best design pattern to handle this kind of usage? There are a few solutions I can think of, but some guidance from those who have experienced this kind of pattern would be nice:
  • Leave it as it is. Eventually backing up/restoring the database becomes a time-consuming operation, even though the disk space of the database isn't growing as fast. As mentioned, the system definitely slows down as the table grows
  • Dump the table to disk daily, then wipe it. A good solution, although statistical reporting then becomes a process of merging text files. The advantage is the table stays small in the long run
  • Normalization/schema changes: I've considered consolidating database records via normalization, although this would create greater database complexity and the number of records would still be the same
  • Disconnect the statistic database from the operations database. Good solution for clarity of data, but if the two databases are on the same machine and the operations are synchronous, it doesn't really solve the problem
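Whichever option is chosen, one way to keep the logging impact on client actions low (a hypothetical sketch, not from the thread: the class and names below are illustrative) is to buffer events in memory and flush them in batches from a background thread, so the client never blocks on the insert:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Sketch: client threads offer events to an in-memory queue; a background
// thread drains them and hands whole batches to a flush callback (in real
// use, a batched JDBC INSERT). All names here are hypothetical.
class BatchingLogger {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
    private final Consumer<List<String>> flush;
    private final int batchSize;
    private volatile boolean running = true;

    BatchingLogger(Consumer<List<String>> flush, int batchSize) {
        this.flush = flush;
        this.batchSize = batchSize;
        Thread t = new Thread(this::drainLoop, "log-drainer");
        t.setDaemon(true);
        t.start();
    }

    /** Called from client code; never blocks on the database. */
    void log(String event) {
        queue.offer(event);   // drops on overflow rather than blocking the client
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>(batchSize);
        while (running || !queue.isEmpty()) {
            queue.drainTo(batch, batchSize);
            if (!batch.isEmpty()) {
                flush.accept(new ArrayList<>(batch));  // e.g. one batched INSERT
                batch.clear();
            } else {
                try { Thread.sleep(50); } catch (InterruptedException e) { return; }
            }
        }
    }

    void shutdown() { running = false; }
}
```

The trade-off is that a crash can lose whatever is buffered but not yet flushed, which may or may not be acceptable for statistical data.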

  • [ April 16, 2008: Message edited by: Scott Selikoff ]

    [OCA 8 Book] [Blog]
    Stevi Deter
    Ranch Hand

    Joined: Mar 22, 2008
    Posts: 265


    The solution we usually implement isn't quite on your list:

    At some interval (daily/weekly) copy the current records to an archive table, then truncate the actively used table.

    If you don't need this audit information on the production database, you can have the archive table in your reporting database to help keep your production database small.
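The copy-then-truncate cycle described above could be sketched in JDBC roughly like this (table names are hypothetical; run it from a scheduled daily/weekly job). A DELETE is used instead of TRUNCATE so the copy and the clear can share one transaction, since TRUNCATE is non-transactional on some databases:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of the archive cycle: copy the live rows into the archive table,
// then clear the live table, as a single transaction. Table names
// (action_log, action_log_archive) are hypothetical.
class LogArchiver {

    static String copySql(String live, String archive) {
        return "INSERT INTO " + archive + " SELECT * FROM " + live;
    }

    static void archiveAndTruncate(Connection conn, String live, String archive)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                 // copy + clear as one unit
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(copySql(live, archive));
            st.executeUpdate("DELETE FROM " + live);
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```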

    There will always be people who are ahead of the curve, and people who are behind the curve. But knowledge moves the curve. --Bill James
    steve souza
    Ranch Hand

    Joined: Jun 26, 2002
    Posts: 862
What types of things are you auditing? Queries by user? Logins? Things other than JDBC? Is this for performance or auditing reasons? Some databases have built-in auditing capabilities.

    In the past I have aggregated older statistics and so was able to delete all but the latest details. If you don't want to aggregate, you could simply remove any data older than a month, or archive it.
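The roll-up idea above could look like this in plain Java (a sketch only; the record layout and "userId|date" key are made up for illustration): collapse per-event detail rows into one count per user per day, after which the detail rows older than the retention window can be deleted:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: collapse per-event detail records into daily counts per user,
// so the detail rows can be deleted once the summary is stored.
class StatAggregator {
    // Each detail row is {userId, date, eventType}; layout is hypothetical.
    static Map<String, Integer> rollUp(List<String[]> details) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : details) {
            String key = row[0] + "|" + row[1];   // one bucket per user per day
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

In practice the same roll-up is usually a single `INSERT ... SELECT user_id, log_date, COUNT(*) ... GROUP BY` statement, but the principle is the same: millions of detail rows shrink to a few summary rows.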
    JavaRanch Performance FAQ
    Scott Selikoff
    Saloon Keeper

    Joined: Oct 23, 2005
    Posts: 3753

    Yeah, the more I think about it, performing daily/weekly aggregate data builds and then wiping the table is probably the best approach. The problem right now is that we're working under a "record everything, we'll sort it out later" mentality, which needs to be fleshed out better since recording everything is too expensive.
    Jeanne Boyarsky
    author & internet detective

    Joined: May 26, 2003
    Posts: 33132

    I'd also go with the archive/historical table. In some cases you can't aggregate, such as when you need to know down the road who made each change and what it was.

    [OCA 8 book] [Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
    Other Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, TOGAF part 1 and part 2
    paul wheaton

    Joined: Dec 14, 1998
    Posts: 20973

    Let's embrace that one person can generate 100,000 events in one day.

    Let's embrace that we need to preserve the data for at least three months (and then it will be archived, or whatever).

    First you have a row that talks about this person on this day and there is an ID for that row. Then, in another table, you have two columns: one column for the matching ID and one column for an integer that represents the time of day and type of event. There are 86400 seconds in a day. This leaves room for hundreds of types of events in a four byte int.
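That packed int could be sketched like this (the factor of 1000 is one arbitrary way to split the value, matching "hundreds of types of events"; 86,399,999 fits comfortably in a 4-byte int):

```java
// Sketch of the packed event value: seconds-since-midnight times 1000,
// plus an event type code 0-999, all in one 4-byte int.
class PackedEvent {
    static int pack(int secondOfDay, int eventType) {
        return secondOfDay * 1000 + eventType;
    }
    static int secondOfDay(int packed) { return packed / 1000; }
    static int eventType(int packed)   { return packed % 1000; }
}
```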

    If you really need to put the squeeze on, you could log this to a file. The file name could represent the unique ID and then there could be a series of 4 byte ints. One day for one user doing 100,000 things would be 400k. Maybe 40k compressed.
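The file variant could be sketched as below (writing to a byte array here for illustration; in real use the stream would go to a file named after the user-day ID, which is an assumption of this sketch):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Sketch: serialize a day's packed event ints as raw 4-byte values,
// gzip-compressed. 100,000 events = 400,000 raw bytes; compression
// shrinks that further because the high bytes are highly repetitive.
class EventFileWriter {
    static byte[] write(int[] packedEvents) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bos))) {
            for (int e : packedEvents) out.writeInt(e);   // 4 bytes each
        }
        return bos.toByteArray();
    }
}
```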
