I have a huge log file and want to extract the lines that fall within a particular time frame.
Note: Each log entry starts with a date/time stamp, but not every line in the file has one. When an error occurs, the exception details are written to the file on the lines that follow the entry for that time. I want to extract all the data for a particular day or range of days, including the exception details.
The file is in the following format:
03-29-11:05:25:04 [SAPEngine_Application_Thread[impl:3]_7] ERROR com.company.uii.core.WidgetRefImpl - ##### exception !!!
VEN-PRICING-1237: Expression did not evaluate to true or false [file D:\PMMPRD\builds\web\app\WEB-INF\meta\LineItemsWidget.jsp, line 26].
at java.security.AccessController.doPrivileged(Native Method)
04-01-11:00:08:32 [SAPEngine_Application_Thread[impl:3]_20] INFO com.vendavo.uii.controller.PageController - User [usb08025] disconnected.
In the above case, how do I extract the data from 03-29-11:05:25:04 to 04-01-11:00:08:32?
What should I use to extract this data? Should I use pattern matching / regular expressions to read and match the lines? Please suggest. Thanks.
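For the regular-expression route asked about here, a minimal sketch might look like the following. It assumes the timestamp layout is MM-dd-yy:HH:mm:ss (inferred from the sample lines) and that any line without a leading timestamp belongs to the most recent stamped line, so stack traces travel with their entry. The class and method names (`LogRangeExtractor`, `extract`) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every line whose entry falls inside [from, to].
class LogRangeExtractor {
    // Assumption: a stamped line starts with MM-dd-yy:HH:mm:ss followed by whitespace.
    private static final Pattern STAMP =
            Pattern.compile("^(\\d{2}-\\d{2}-\\d{2}:\\d{2}:\\d{2}:\\d{2})\\s");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM-dd-yy:HH:mm:ss");

    static List<String> extract(BufferedReader in, LocalDateTime from, LocalDateTime to)
            throws IOException {
        List<String> result = new ArrayList<>();
        boolean inRange = false; // state carried over to lines without a timestamp
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = STAMP.matcher(line);
            if (m.find()) {
                // A stamped line decides whether this entry is wanted.
                LocalDateTime stamp = LocalDateTime.parse(m.group(1), FORMAT);
                inRange = !stamp.isBefore(from) && !stamp.isAfter(to);
            }
            // Unstamped lines (exception text, stack frames) inherit the last decision.
            if (inRange) {
                result.add(line);
            }
        }
        return result;
    }
}
```

For a really huge file it would probably be better to write matching lines straight to an output file instead of collecting them in a list. If the log is known to be in chronological order, the loop could also stop at the first timestamp after `to`.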
If this were my problem I might create a Lucene index, since Lucene makes it relatively easy to query for date/time ranges. It also wouldn't matter how many log files there were, or where they were located: each new one would be added to the index, and would then be available for searches. It seems a better "fit" to the problem than storing the data in a DB (which would work just fine, as Campbell said). You could search the actual contents that way, too, which I would imagine might be a handy feature.
I would be calling a function which will return an object (one log entry), say LogEntry, containing the following components: timestamp, thread, priority, and the message part of the log.
If I read line by line, then to get one log entry I have to read from one timestamp to the next. In doing so I read the first line of the next log entry, so the file reader ends up pointing at the second line of that entry, and I lose its first line.
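One common way around this is a one-line lookahead: when the reader hits the header line of the next entry, it keeps that line in a field instead of throwing it away, and starts the next call from it. The sketch below assumes the header layout `timestamp [thread] PRIORITY rest` seen in the sample; `LogEntry` follows the components listed above, while `LogEntryReader`, `next()` and the field names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One log entry with the four components described in the post.
class LogEntry {
    final String timestamp;
    final String thread;
    final String priority;
    final String message;

    LogEntry(String timestamp, String thread, String priority, String message) {
        this.timestamp = timestamp;
        this.thread = thread;
        this.priority = priority;
        this.message = message;
    }
}

// Hypothetical reader that returns one entry per call without losing any line.
class LogEntryReader {
    // Assumption: header lines look like "MM-dd-yy:HH:mm:ss [thread] PRIORITY rest".
    private static final Pattern HEADER = Pattern.compile(
            "^(\\d{2}-\\d{2}-\\d{2}:\\d{2}:\\d{2}:\\d{2}) \\[(.*?)\\] ([A-Z]+) (.*)$");

    private final BufferedReader in;
    private String pending; // header of the next entry, read ahead by the previous call

    LogEntryReader(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next entry, or null at end of file. */
    LogEntry next() throws IOException {
        // Start from the saved header if there is one, otherwise read a fresh line.
        String header = (pending != null) ? pending : in.readLine();
        pending = null;
        // Skip any leading lines that do not start an entry.
        while (header != null && !HEADER.matcher(header).matches()) {
            header = in.readLine();
        }
        if (header == null) {
            return null;
        }
        Matcher m = HEADER.matcher(header);
        m.matches();
        StringBuilder message = new StringBuilder(m.group(4));
        String line;
        while ((line = in.readLine()) != null) {
            if (HEADER.matcher(line).matches()) {
                pending = line; // belongs to the next entry: keep it for the next call
                break;
            }
            message.append('\n').append(line); // continuation line, e.g. a stack frame
        }
        return new LogEntry(m.group(1), m.group(2), m.group(3), message.toString());
    }
}
```

Calling `next()` in a loop until it returns null walks the whole file one entry at a time. `BufferedReader.mark()` and `reset()` could do the same job, but a single saved line avoids having to guess a read-ahead limit.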