JavaRanch » Java Forums » Java » Performance

please give me some hint of how to make an efficient algorithm to parse the log file

Men Gumani
Ranch Hand

Joined: Apr 01, 2009
Posts: 31
Hi all,


I am trying to parse a log file with 4 columns: <date> <time> <status> <name>. The <status> column has only 2 possible values, "run" or "sleep". What is an efficient way to print a list of names for all records whose <status> is "run" on a given date? Any hints will be appreciated, thank you so much guys!
Men Gumani
Ranch Hand

Joined: Apr 01, 2009
Posts: 31
Please help ~
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24184
    

Well, there are only 2 ways that make sense: read each line, and print only the ones you want, or read all lines, store in a database, and then query the database. The first is fine if you process a log only once; the second is better if you need to make many queries against a log. Does this answer your question?
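To make the first approach concrete, here's a minimal sketch that streams the file and prints matching names. The column layout (<date> <time> <status> <name>, whitespace-separated), the ISO-style date string, and the file name are all assumptions:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LogFilter {

    // Returns the <name> column if the line is a "run" record on the given
    // date, or null otherwise. Assumed layout: <date> <time> <status> <name>
    static String matchName(String line, String date) {
        String[] cols = line.split("\\s+", 4);
        if (cols.length == 4 && cols[0].equals(date) && cols[2].equals("run")) {
            return cols[3];
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java LogFilter <logfile> <date>");
            return;
        }
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                String name = matchName(line, args[1]);
                if (name != null) {
                    System.out.println(name);
                }
            }
        }
    }
}
```

This is O(n) in the file size and holds only one line in memory at a time, which is about as good as a single-pass scan can get.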

[Jess in Action][AskingGoodQuestions]
Men Gumani
Ranch Hand

Joined: Apr 01, 2009
Posts: 31
I guess I only need to print out a list of urls for all the blocks on a given date, so I only have to traverse the whole log file once (not too sure). Is there anything interesting about the order of the tokens I should process? E.g., if I check <status> first and then <date>, will it be faster? I am not sure whether a better algorithm would give better performance, thank you.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24184
    

It doesn't matter at all. You have to visit every record, so the amount of work is proportional to the size of the file, no matter what. It's true that you could possibly minimize the number of field checks, but that's a very small percentage of the task time. The I/O will take orders of magnitude longer, so it's not worth worrying about.
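To illustrate the point: with short-circuit &&, the order of the checks only decides which string comparison runs first. Both of these hypothetical predicates accept exactly the same records; the nanoseconds saved by either ordering disappear next to the cost of reading the line from disk:

```java
public class CheckOrder {

    // Assumed column layout: c[0]=date, c[1]=time, c[2]=status, c[3]=name.
    // Both predicates accept the same records; only the comparison order
    // differs, and short-circuit && skips the second check when the first
    // one fails.
    static boolean dateFirst(String[] c, String date) {
        return c[0].equals(date) && c[2].equals("run");
    }

    static boolean statusFirst(String[] c, String date) {
        return c[2].equals("run") && c[0].equals(date);
    }

    public static void main(String[] args) {
        String[] rec = {"2009-04-01", "10:00", "run", "alice"};
        System.out.println(dateFirst(rec, "2009-04-01"));   // true
        System.out.println(statusFirst(rec, "2009-04-01")); // true
    }
}
```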
Kees Jan Koster
JavaMonitor Support
Rancher

Joined: Mar 31, 2009
Posts: 251
    
Dear Men,

Instead of doing this in Java, consider using the command-line tools grep or awk for preprocessing. This lets you shrink the file before it ever enters your Java program. These tools were designed specifically for this task and have been refined for decades, so the chances that you or I could write a faster version are slim.

They are standard on all UNIX machines. Just type "man grep" or "man awk" to learn about them.
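For example, assuming the same <date> <time> <status> <name> layout as above (the sample records and file name here are made up for illustration), either tool can pre-filter the log in one line:

```shell
# Create a few sample records in the assumed layout (illustration only)
printf '%s\n' \
  '2009-04-01 10:00 run alice' \
  '2009-04-01 10:01 sleep bob' \
  '2009-04-02 09:30 run carol' > app.log

# awk: print the <name> column of every "run" record on the given date
awk -v d='2009-04-01' '$1 == d && $3 == "run" { print $4 }' app.log

# grep: cruder, since it matches the whole line rather than specific columns
grep '^2009-04-01 .* run ' app.log
```

The awk version is usually the better fit here because it compares individual columns instead of pattern-matching the whole line.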


Java-monitor, JVM monitoring made easy <- right here on Java Ranch
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4655
    

Why do you care about efficiency in this case? You don't process log files often enough for it to be an issue. Find any algorithm that works, and stop worrying.
Randi Randwa
Greenhorn

Joined: Feb 21, 2009
Posts: 7

If you are going the scripting or command-line route, there is a good sample script posted at http://www.biterscripting.com/SS_WebLogParser.html .

Amit Ghorpade
Bartender

Joined: Jun 06, 2007
Posts: 2716
    

"Randi Randwa", please check your private messages for an important administrative matter. You can check them by clicking the My Private Messages link above.


SCJP, SCWCD.
|Asking Good Questions|
 