You will have to repeatedly open the file just long enough to read from it, then close it as soon as possible - you don't want your reads interfering with possible writes. You will also have to monitor the folder for roll-over files. When a read hits the end of the file, you won't know whether you reached EOF because you caught up to the writes, or because the log rolled over to another file. So when you hit EOF, check for the existence of the next file - if it doesn't exist, try re-reading the current one.
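A minimal sketch of that read cycle, assuming Java (the class and method names here are illustrative, not from any particular library): open, read everything past the offset you last reached, and close immediately.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: open the file just long enough to read the bytes written
// since the last offset we reached, then close it again right away.
public class LogTailer {

    /** Reads everything after 'offset'; returns "" if nothing new was written. */
    public static String readChunk(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            if (len <= offset) {
                return "";               // caught up - or the log rolled over
            }
            raf.seek(offset);
            byte[] buf = new byte[(int) (len - offset)];
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    /** At EOF: if the next file exists the log rolled over, otherwise re-read. */
    public static Path nextFileOrSame(Path current, Path next) {
        return Files.exists(next) ? next : current;
    }
}
```

The caller keeps the running offset between calls; when `readChunk` returns an empty string, it consults `nextFileOrSame` before sleeping and retrying.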
When you read from the file, yield time so the writing task can regain access to it. Use only one thread to actually access the file - unless you have a multi-headed disk, reading from the same disk in multiple threads usually just slows things down. Once you have read the data you can, of course, distribute the processing work across multiple threads...
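One way to sketch that layout, again with illustrative names: a single reader thread calls `publish` for each line it reads, and a worker pool does the actual processing. The `parse` method here is a placeholder for whatever work your application does with a line.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: one thread touches the disk; a pool of workers processes
// the lines it hands off. parse() stands in for your real per-line work.
public class ReaderPipeline {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final List<String> results = new CopyOnWriteArrayList<>();

    // Placeholder "work" so the fan-out is visible.
    void parse(String line) {
        results.add(line.toUpperCase());
    }

    /** Called only from the single reader thread. */
    public void publish(String line) {
        workers.submit(() -> parse(line));
    }

    /** Stops accepting work and waits for in-flight lines to finish. */
    public List<String> drain() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
        return results;
    }
}
```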
You can handle it a few ways:
1) When you come to the end of a file, check for the existence of the next file in sequence - assuming the naming can be predicted
2) When you read from a file, record its lastModified timestamp. When you come to the end of the file, look through the directory for a file with a lastModified time greater than the one you recorded
3) In a separate recurring task, scan the folder and add each new file that appears to a collection. When you come to the end of a file, look at the next file in the collection.
I am sure there are other ways. Implementation details will probably help you decide which to use. My preference would be #1 if you can predict the name of the next file. If you can't, I would try to implement #2. If that is impractical for some reason (for example, lots of files making the search for the next file take too long, lastModified returning nonsense, or some other detail) I would look to #3. #3 might be the best route if you are processing the files after the fact - in which case all the files can be identified and ordered first, then read. But if the files are being written as they are being processed it gets a little hairy, and I would try to avoid it.
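Option #2 might look something like this sketch (names are mine, not from any library): after hitting EOF, pick the oldest file in the directory whose lastModified is newer than the timestamp you recorded for the file you just finished.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Sketch of option #2: find the next log file by modification time.
public class NextFileByTimestamp {

    /**
     * Returns the oldest file in 'dir' modified after 'lastReadModified',
     * or empty if no newer file exists yet (keep re-reading the current one).
     */
    public static Optional<File> nextFile(File dir, long lastReadModified) {
        File[] files = dir.listFiles();
        if (files == null) {
            return Optional.empty();     // dir missing or not a directory
        }
        return Arrays.stream(files)
                .filter(f -> f.lastModified() > lastReadModified)
                .min(Comparator.comparingLong(File::lastModified));
    }
}
```

Note that this inherits the caveat above: on filesystems with coarse timestamp granularity, two files rolled over within the same second may not order reliably.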
If the application that produces the logs uses log4j, and you have write access to its configuration, you could add a SocketAppender that sends the log messages to your application. But your scenario is probably more complicated than this...
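For log4j 1.x that configuration change could look like this (the host and port are made up - point them at wherever your collector listens with a SocketNode/receiver):

```properties
# Hypothetical log4j 1.x config for the producing application:
# ship every event to a remote collector instead of (or as well as) a file.
log4j.rootLogger=INFO, socket

log4j.appender.socket=org.apache.log4j.net.SocketAppender
log4j.appender.socket.RemoteHost=collector.example.com
log4j.appender.socket.Port=4560
log4j.appender.socket.ReconnectionDelay=10000
```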
Question...is it your application making the log output or some other application/process?
Also, I don't know how you can be expected to take into account any and all possible log file formats. Surely you must be working within a known set of formats... I can't imagine how you'd know how to parse them otherwise.
subject: Read log files incrementally (as they are written)