I am trying to parse a log file that contains 4 columns: <date> <time> <status> <name>. The <status> column has only 2 possible values, "run" or "sleep". What is an efficient way to print the names of all entries whose <status> is "run" on a given date? Any hints will be appreciated. Thank you so much, guys!
Well, there are only 2 ways that make sense: read each line, and print only the ones you want, or read all lines, store in a database, and then query the database. The first is fine if you process a log only once; the second is better if you need to make many queries against a log. Does this answer your question?
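To make the second option a bit more concrete, here is a rough sketch that uses an in-memory map as a stand-in for a real database (a swapped-in simplification, not an actual database). The class name RunIndex and its methods are made up for illustration, and the code assumes each line is whitespace-separated in the order <date> <time> <status> <name>.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: load the log once, then answer many per-date queries.
class RunIndex {
    // date -> names whose status was "run" on that date, in file order
    private final Map<String, List<String>> runningByDate = new HashMap<>();

    // Assumed line format: "<date> <time> <status> <name>", whitespace-separated.
    void add(String line) {
        String[] fields = line.trim().split("\\s+", 4);
        if (fields.length == 4 && fields[2].equals("run")) {
            runningByDate.computeIfAbsent(fields[0], d -> new ArrayList<>()).add(fields[3]);
        }
    }

    // Returns an empty list when no "run" entries were seen for the date.
    List<String> namesRunningOn(String date) {
        return runningByDate.getOrDefault(date, Collections.emptyList());
    }
}
```

You would call add() once per line while reading the file, and after that each namesRunningOn() lookup no longer touches the file. That only pays off if you query the same log many times.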
I guess I only need to print out a list of urls for all the blocks on a given date, so I only have to traverse the whole log file once (not too sure). Does the order in which I check the tokens matter? For example, if I check <status> first and then <date>, will it be faster? I am not sure whether a better algorithm would give better performance. Thank you.
It doesn't matter at all. You have to visit every record, so the amount of work is proportional to the size of the file, no matter what. It's true that you could possibly minimize the number of field checks, but that's a very small percentage of the task time. The I/O will take orders of magnitude longer, so it's not worth worrying about.
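For what it's worth, a minimal sketch of the single-pass approach could look like the following. The class and method names are invented for the example, and it assumes whitespace-separated fields in the order <date> <time> <status> <name>, with the date passed in exactly as it is written in the log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical example class: one pass over the log, one line at a time.
class RunNames {

    // Assumed line format: "<date> <time> <status> <name>", whitespace-separated.
    static List<String> namesRunningOn(Stream<String> lines, String date) {
        List<String> names = new ArrayList<>();
        lines.forEach(line -> {
            String[] fields = line.trim().split("\\s+", 4);
            // Swapping the two equals() checks would not change the result,
            // and any speed difference is tiny next to the cost of reading the file.
            if (fields.length == 4 && fields[0].equals(date) && fields[2].equals("run")) {
                names.add(fields[3]);
            }
        });
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = path to the log file, args[1] = date as written in the log
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            namesRunningOn(lines, args[1]).forEach(System.out::println);
        }
    }
}
```

Files.lines reads the file lazily, so only the matching names are kept in memory, not the whole log.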
Instead of doing this in Java, consider using the command line tool grep or awk for preprocessing. This allows you to reduce the file that enters your Java program. These tools were designed specifically for this task and have been around for decades, so the chances that you or I would be able to write a faster version are slim.
They are standard on all UNIX machines. Just type "man grep" or "man awk" to learn about them.