Logging Actions with Minimum Impact

 
Scott Selikoff (author):
I have a system where certain client actions are logged to a central database for statistical reporting. The problem is that a small number of users can generate tens of thousands of small records a day; within a month we're into a million records added to a single table.

My question is, what's the best design pattern to handle this kind of usage? There are a few solutions I can think of, but some guidance from those who have experienced this kind of pattern would be nice:
• Leave it as it is. Eventually backing up/restoring the database becomes a time-consuming operation, even though the disk space of the database isn't growing as fast. As mentioned, the system definitely slows down as the table grows
• Wipe the table to disk daily. Good solution, although statistical reporting then becomes a process of merging text files. The advantage is the table stays small in the long run (see the sketch after this list)
• Normalization/schema changes: I've considered combining database records via normalization, although this would create greater database contention and the number of records would still be the same
• Disconnect the statistics database from the operations database. Good solution for clarity of data, but if the two databases are on the same machine and the operations are synchronous, it doesn't really solve the problem
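
A minimal sketch of the wipe-to-disk option above, assuming hypothetical table and column names (action_log, user_id, action_type, created_at) and a placeholder JDBC URL. The export and the delete share one cutoff predicate, so rows logged while the export runs are untouched:

import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.time.LocalDate;

public class DailyLogDump {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/stats"); // placeholder URL
             Statement st = con.createStatement();
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(
                     Path.of("action_log_" + LocalDate.now() + ".csv")))) {

            con.setAutoCommit(false);

            // Stream everything older than today out to a flat file.
            try (ResultSet rs = st.executeQuery(
                    "SELECT user_id, action_type, created_at FROM action_log "
                  + "WHERE created_at < CURRENT_DATE")) {
                while (rs.next()) {
                    out.printf("%d,%s,%s%n",
                            rs.getLong("user_id"),
                            rs.getString("action_type"),
                            rs.getTimestamp("created_at"));
                }
            }
            out.flush();

            // Same predicate: only delete what was just exported.
            st.executeUpdate("DELETE FROM action_log WHERE created_at < CURRENT_DATE");
            con.commit();
        }
    }
}

Reporting then becomes a matter of loading or merging these daily files, as the option notes.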

  • [ April 16, 2008: Message edited by: Scott Selikoff ]
     
Ranch Hand (Posts: 265):
    Scott,

    The solution we usually implement isn't quite on your list:

    At some interval (daily/weekly) copy the current records to an archive table, then truncate the actively used table.

    If you don't need this audit information on the production database, you can have the archive table in your reporting database to help keep your production database small.
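
A rough sketch of that interval job, assuming hypothetical table names (action_log, action_log_archive) and a placeholder JDBC URL:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LogArchiver {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/prod"); // placeholder URL
             Statement st = con.createStatement()) {
            con.setAutoCommit(false);
            // Copy the current rows into the archive table...
            st.executeUpdate("INSERT INTO action_log_archive SELECT * FROM action_log");
            // ...then empty the active table. DELETE keeps both steps in one
            // transaction; TRUNCATE is faster but implicitly commits on many databases.
            st.executeUpdate("DELETE FROM action_log");
            con.commit();
        }
    }
}

If the archive table lives in a separate reporting database, the copy step becomes an export/import rather than a single INSERT ... SELECT.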
     
Ranch Hand (Posts: 862):
What types of things are you auditing? Queries by user? Logins? Things other than JDBC? Is this for performance or auditing reasons? Some databases have auditing capabilities.

In the past I have aggregated older statistics and so was able to delete all but the most recent detail records. If you don't want to aggregate, you could simply remove any data older than a month, or archive it.
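
A minimal sketch of that aggregate-then-delete pass, assuming the same hypothetical action_log table plus a hypothetical action_summary table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.LocalDate;

public class LogAggregator {
    public static void main(String[] args) throws Exception {
        // Keep one month of detail; everything older gets rolled up and deleted.
        Timestamp cutoff = Timestamp.valueOf(LocalDate.now().minusMonths(1).atStartOfDay());

        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/stats")) { // placeholder URL
            con.setAutoCommit(false);

            // Roll old detail rows up into one count per user/action/day.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO action_summary (user_id, action_type, action_date, action_count) "
                  + "SELECT user_id, action_type, CAST(created_at AS DATE), COUNT(*) "
                  + "FROM action_log WHERE created_at < ? "
                  + "GROUP BY user_id, action_type, CAST(created_at AS DATE)")) {
                ps.setTimestamp(1, cutoff);
                ps.executeUpdate();
            }

            // Delete the detail rows that were just summarized.
            try (PreparedStatement ps = con.prepareStatement(
                    "DELETE FROM action_log WHERE created_at < ?")) {
                ps.setTimestamp(1, cutoff);
                ps.executeUpdate();
            }
            con.commit();
        }
    }
}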
     
Scott Selikoff (author):
Yeah, the more I think about it, performing daily/weekly aggregate data builds and then wiping the table is probably the best approach. The problem right now is we're working under a "record everything, we'll sort it out later" mentality, which needs to be fleshed out better since recording everything is too expensive.
     
author & internet detective:
I'd also go with the archive/historical table. In some cases you can't aggregate, such as when you need to know down the road who made each change and what it was.
     
Trailboss:
    Let's embrace that one person can generate 100,000 events in one day.

    Let's embrace that we need to preserve the data for at least three months (and then it will be archived, or whatever).

First you have a row that talks about this person on this day, and there is an ID for that row. Then, in another table, you have two columns: one column for the matching ID and one column for an integer that represents the time of day and the type of event. There are 86,400 seconds in a day. This leaves room for hundreds of types of events in a four-byte int.
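
A rough sketch of that packed encoding in Java; the multiply/mod scheme is one way to do it, and the event type codes are hypothetical:

public class PackedEvent {
    private static final int SECONDS_PER_DAY = 86_400;

    // One int per event: event type in the high part, second-of-day in the low part.
    // Integer.MAX_VALUE / 86_400 is about 24,855, so there is comfortably room
    // for hundreds (even thousands) of event types.
    static int pack(int eventType, int secondOfDay) {
        return eventType * SECONDS_PER_DAY + secondOfDay;
    }

    static int eventType(int packed)   { return packed / SECONDS_PER_DAY; }
    static int secondOfDay(int packed) { return packed % SECONDS_PER_DAY; }

    public static void main(String[] args) {
        int packed = pack(42, 13 * 3600 + 45 * 60);  // event type 42 at 13:45:00
        System.out.println(eventType(packed));        // 42
        System.out.println(secondOfDay(packed));      // 49500
    }
}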

If you really need to put the squeeze on, you could log this to a file. The file name could represent the unique ID, and then there could be a series of 4-byte ints. One day of one user doing 100,000 things would be 400 KB. Maybe 40 KB compressed.
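
A minimal sketch of that file format, using the packed ints above, with a hypothetical per-user-per-day file name and gzip standing in for the compression:

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;

public class EventFileLogger {
    public static void main(String[] args) throws Exception {
        // Hypothetical name: one file per user per day, keyed by the row ID.
        try (DataOutputStream out = new DataOutputStream(
                new GZIPOutputStream(new FileOutputStream("user-12345-2008-04-16.evt.gz")))) {
            // 100,000 events x 4 bytes = ~400 KB raw; gzip shrinks the
            // repetitive event stream considerably.
            for (int i = 0; i < 100_000; i++) {
                int eventType = i % 300;      // demo data standing in for real events
                int secondOfDay = i % 86_400;
                out.writeInt(eventType * 86_400 + secondOfDay);
            }
        }
    }
}

Reading it back is the mirror image: a DataInputStream over a GZIPInputStream, calling readInt until EOF.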
     