
please give me some hint of how to make an efficient algorithm to parse the log file

 
Men Gumani
Ranch Hand
Posts: 31
Hi all,


I am trying to parse a log file that contains four columns, <date> <time> <status> <name>. The <status> column has only two possible values, "run" or "sleep". What is an efficient way to print a list of names for all records whose <status> equals "run" on a given date? Any hints would be appreciated, thank you so much, guys!
 
Men Gumani
Ranch Hand
Posts: 31
Please help ~
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24208
35
Chrome Eclipse IDE Mac OS X
Well, there are only 2 ways that make sense: read each line, and print only the ones you want, or read all lines, store in a database, and then query the database. The first is fine if you process a log only once; the second is better if you need to make many queries against a log. Does this answer your question?
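To make the first approach concrete, here is a minimal sketch of the line-by-line filter in Java. It assumes whitespace-separated fields in <date> <time> <status> <name> order; the date format, sample records, and class name are made up for illustration, so adjust the split and the comparisons to match your actual log.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LogFilter {

    /** Returns the <name> field of every "run" record on the given date. */
    static List<String> runningNames(Reader log, String date) throws IOException {
        List<String> names = new ArrayList<String>();
        BufferedReader in = new BufferedReader(log);
        String line;
        while ((line = in.readLine()) != null) {
            // Assumed layout: <date> <time> <status> <name>, whitespace-separated
            String[] f = line.split("\\s+");
            if (f.length >= 4 && date.equals(f[0]) && "run".equals(f[2])) {
                names.add(f[3]);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a FileReader over the real log file
        String sample = "2011-03-15 09:00:01 run alice\n"
                      + "2011-03-15 09:00:02 sleep bob\n"
                      + "2011-03-16 09:00:03 run carol\n";
        System.out.println(runningNames(new StringReader(sample), "2011-03-15"));
        // prints [alice]
    }
}
```

For a real file you would pass a FileReader instead of the StringReader; either way the work is one pass over the input, which is Ernest's first option.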
 
Men Gumani
Ranch Hand
Posts: 31
I guess I only need to print out a list of URLs for all the blocks on a given date, so I only have to traverse the whole log file once (not too sure). Is there anything interesting about the order in which I process the tokens? E.g., if I check <status> first and then <date>, will it be faster? I am not sure whether a better algorithm would give noticeably better performance. Thank you.
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24208
35
Chrome Eclipse IDE Mac OS X
It doesn't matter at all. You have to visit every record, so the amount of work is proportional to the size of the file, no matter what. It's true that you could possibly minimize the number of field checks, but that's a very small percentage of the task time. The I/O will take orders of magnitude longer, so it's not worth worrying about.
 
Kees Jan Koster
JavaMonitor Support
Rancher
Posts: 251
5
Dear Men,

Instead of doing this in Java, consider using the command-line tools grep or awk for preprocessing. That lets you shrink the file before it enters your Java program. These tools were designed specifically for this task and have been refined for decades, so the chances that you or I could write a faster version are slim.

They are standard on all UNIX machines. Just type "man grep" or "man awk" to learn about them.
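For example, a single awk invocation can do the whole date-plus-status filter. The file name, the date value, and the column layout below are assumptions standing in for your real log, so adjust them to match:

```shell
# Create a small stand-in log; assumed layout: <date> <time> <status> <name>
printf '%s\n' \
  '2011-03-15 09:00:01 run alice' \
  '2011-03-15 09:00:02 sleep bob' \
  '2011-03-16 09:00:03 run carol' > app.log

# Print the <name> of every "run" record on the target date
awk '$1 == "2011-03-15" && $3 == "run" { print $4 }' app.log
# prints: alice
```

You could pipe that output into your Java program, or let awk do the whole job if printing the names is all you need.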
 
Pat Farrell
Rancher
Posts: 4678
7
Linux Mac OS X VI Editor
Why do you care about efficiency in this case? You don't process log files often enough for it to be an issue. Find any algorithm that works, and be done with worrying.
 
Randi Randwa
Greenhorn
Posts: 7

If you are going the scripting or command-line route, there is a good sample script posted at http://www.biterscripting.com/SS_WebLogParser.html.

 
Amit Ghorpade
Bartender
Posts: 2854
10
Fedora Firefox Browser Java
"Randi Randwa " please check your private messages for an important administrative matter. You can check them by clicking the My Private Messages link above.
 