Logging Actions with Minimum Impact

 
Scott Selikoff
author
Saloon Keeper
I have a system where certain client actions are logged to a central database for statistical reporting. The problem is that a small number of users can generate tens of thousands of small records a day; within a month we're at a million records added to a single table.

My question is, what's the best design pattern to handle this kind of usage? There are a few solutions I can think of, but guidance from those who have experienced this kind of pattern would be nice:
  • Leave it as it is. Eventually backing up/restoring the database becomes a time-consuming operation, even though the disk space of the database isn't growing as fast. The system also definitely slows down as the table grows.
  • Dump the table to disk daily. A good solution, although statistical reporting then becomes a process of merging text files. The advantage is that the table stays small in the long run.
  • Normalization/schema changes: I've considered consolidating records via normalization, although this would add database overhead and the number of records would stay the same.
  • Disconnect the statistics database from the operations database. A good solution for clarity of data, but if the two databases are on the same machine and the writes are synchronous, it doesn't really solve the problem.

     
    Stevi Deter
    Ranch Hand
    Scott,

    The solution we usually implement isn't quite on your list:

    At some interval (daily/weekly) copy the current records to an archive table, then truncate the actively used table.

    If you don't need this audit information on the production database, you can have the archive table in your reporting database to help keep your production database small.
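
    Roughly, the job could look like this minimal sketch, assuming a hot table named action_log and an archive table action_log_archive with identical columns (both names invented for illustration; the JDBC URL and credentials are placeholders too):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        public class ArchiveJob {
            public static void main(String[] args) throws Exception {
                try (Connection con = DriverManager.getConnection(
                        "jdbc:postgresql://localhost/app", "app_user", "secret")) {
                    con.setAutoCommit(false);
                    try (Statement stmt = con.createStatement()) {
                        // Copy everything currently in the hot table into the archive...
                        stmt.executeUpdate(
                                "INSERT INTO action_log_archive SELECT * FROM action_log");
                        // ...then empty the hot table so it stays small. DELETE rather
                        // than TRUNCATE keeps both statements in one transaction on
                        // most databases.
                        stmt.executeUpdate("DELETE FROM action_log");
                    }
                    con.commit();
                }
            }
        }

    TRUNCATE would be faster on a huge table, but on many databases it can't be rolled back, so you'd want to be sure the copy succeeded first.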
     
    steve souza
    Ranch Hand
    What types of things are you auditing? Queries by user? Logins? Things other than JDBC? Is this for performance reasons or auditing reasons? Some databases have built-in auditing capabilities.

    In the past I have aggregated older statistics and so was able to delete all but the latest details. If you don't want to aggregate, you could simply remove any data older than a month, or archive it.
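
    As a rough sketch of the aggregate-then-delete idea (table and column names are invented, and the date arithmetic shown is PostgreSQL-flavored; adjust for your database):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        public class RollupJob {
            public static void main(String[] args) throws Exception {
                try (Connection con = DriverManager.getConnection(
                        "jdbc:postgresql://localhost/app", "app_user", "secret");
                     Statement stmt = con.createStatement()) {
                    con.setAutoCommit(false);
                    // Collapse detail rows older than 30 days into one row per
                    // user, event type, and day...
                    stmt.executeUpdate(
                            "INSERT INTO action_log_daily (user_id, event_type, day, event_count) "
                            + "SELECT user_id, event_type, CAST(event_time AS DATE), COUNT(*) "
                            + "FROM action_log "
                            + "WHERE event_time < CURRENT_DATE - 30 "
                            + "GROUP BY user_id, event_type, CAST(event_time AS DATE)");
                    // ...then drop the details that were just summarized.
                    stmt.executeUpdate(
                            "DELETE FROM action_log WHERE event_time < CURRENT_DATE - 30");
                    con.commit();
                }
            }
        }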
     
    Scott Selikoff
    author
    Saloon Keeper
    Yeah, the more I think about it, performing daily/weekly aggregate data builds and then wiping the table is probably the best approach. The problem right now is that we're working under a "record everything, we'll sort it out later" mentality, which needs to be fleshed out better, since recording everything is too expensive.
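
    The scheduling side can be trivial; a sketch using a ScheduledExecutorService, where the aggregation body is just a placeholder:

        import java.util.concurrent.Executors;
        import java.util.concurrent.ScheduledExecutorService;
        import java.util.concurrent.TimeUnit;

        public class NightlyBuild {
            public static void main(String[] args) {
                ScheduledExecutorService scheduler =
                        Executors.newSingleThreadScheduledExecutor();
                // Fire the rollup every 24 hours. In practice you'd compute the
                // initial delay so the job runs off-peak, say at 2 a.m.
                scheduler.scheduleAtFixedRate(new Runnable() {
                    public void run() {
                        // Placeholder: run the aggregate-and-delete SQL here.
                    }
                }, 0, 24, TimeUnit.HOURS);
            }
        }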
     
    Jeanne Boyarsky
    author & internet detective
    Marshal
    I'd also go with the archive/historical table. In some cases you can't aggregate, such as when you need to know down the road who made each change and what the change was.
     
    paul wheaton
    Trailboss
    Let's embrace that one person can generate 100,000 events in one day.

    Let's embrace that we need to preserve the data for at least three months (and then it will be archived, or whatever).

    First you have a row that talks about this person on this day, and there is an ID for that row. Then, in another table, you have two columns: one for the matching ID and one for an integer that encodes the time of day and the type of event. There are 86,400 seconds in a day, which leaves room for hundreds of event types in a four-byte int.
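
    A sketch of that packing, as I read it (the exact layout is my assumption): multiply the event type by 86,400 and add the second of day, so both values round-trip through a single int.

        public class PackedEvent {
            private static final int SECONDS_PER_DAY = 86_400;

            // A signed 32-bit int tops out at 2,147,483,647, so this layout
            // leaves room for roughly 24,000 distinct event types.
            static int pack(int eventType, int secondOfDay) {
                return eventType * SECONDS_PER_DAY + secondOfDay;
            }

            static int eventType(int packed) {
                return packed / SECONDS_PER_DAY;
            }

            static int secondOfDay(int packed) {
                return packed % SECONDS_PER_DAY;
            }

            public static void main(String[] args) {
                int packed = pack(7, 45_296); // event type 7 at 12:34:56
                System.out.println(eventType(packed) + " @ " + secondOfDay(packed));
            }
        }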

    If you really need to put the squeeze on, you could log this to a file. The file name could represent the unique ID, and the contents could be a series of four-byte ints. One day of one user doing 100,000 things would be 400 KB, maybe 40 KB compressed.
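
    A sketch of the file variant (the per-user, per-day file naming is an assumption): a DataOutputStream writes each event as exactly four bytes, and wrapping it in GZIP gets the compression mentioned above.

        import java.io.DataOutputStream;
        import java.io.FileOutputStream;
        import java.io.IOException;
        import java.util.zip.GZIPOutputStream;

        public class EventFileLogger {
            public static void main(String[] args) throws IOException {
                String fileName = "user-42.2008-04-16.events.gz";
                try (DataOutputStream out = new DataOutputStream(
                        new GZIPOutputStream(new FileOutputStream(fileName)))) {
                    // Each writeInt emits exactly four bytes, so 100,000 events
                    // come to about 400 KB before compression.
                    out.writeInt(7 * 86_400 + 45_296);
                    out.writeInt(12 * 86_400 + 50_000);
                }
            }
        }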
     