I need to read a very large comma-separated file (.txt or .csv, depending on the requirement) and want to be able to skip records that have already been processed. For example, if for some reason the file is only processed halfway, I need to be able to set the counter to the position from which processing should resume next time.
One option is to read the file using SQL. In one example I saw, strSQL = "select * from " + filepath was executed and the result set was traversed. That way I could move the record counter easily using the ResultSet methods.
I wanted to know if anyone has used this approach and whether it is an efficient way of reading a large comma-separated file.
If anyone has a better idea, or can make me aware of any flaws in the above approach, that would be great.
I have not used a CSV/JDBC driver, but I would expect it to add significant overhead. Working directly at the file level will probably perform better. There are ready-made helper classes, like the Ostermiller CSV class, that help with reading the file.
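To illustrate what working at the file level could look like, here is a minimal sketch that skips a number of already-processed records and then handles the rest. The class name SkipRecords, the method processFrom and the record counter are made up for this example, and the naive split on "," assumes no field contains a quoted comma; a real CSV parser would handle that case.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SkipRecords {
    /**
     * Skips the first alreadyProcessed lines, then handles every remaining line.
     * Returns the number of records handled in this run.
     * (Hypothetical helper, not part of any library.)
     */
    static long processFrom(Path file, long alreadyProcessed) throws IOException {
        long handled = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            // Read and discard the records that were handled in an earlier run.
            for (long i = 0; i < alreadyProcessed; i++) {
                if (reader.readLine() == null) {
                    return 0; // file is shorter than the saved counter
                }
            }
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: plain commas only, no quoted fields.
                String[] fields = line.split(",", -1);
                // ... do the real work with fields here ...
                handled++;
            }
        }
        return handled;
    }
}
```

Only one line is held in memory at a time, so memory use should not grow with the file size. The skipped lines still have to be read, though, so resuming near the end of a huge file costs almost a full pass.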
Sounds like you want random access to the file (e.g., to skip the first half), so my inclination would be to use a RandomAccessFile. The memory-mapped file facilities in Java are also excellent.
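As a rough sketch of the RandomAccessFile idea: instead of a record counter you would save a byte offset and seek straight to it on the next run. The names ResumableReader and processChunk are invented for this example. Note that RandomAccessFile.readLine() is unbuffered and turns each byte into one char, so this assumes single-byte text and will be slow on very large files unless you add your own buffering.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class ResumableReader {
    /**
     * Handles up to maxRecords lines starting at startOffset (a byte position).
     * Returns the byte offset at which the next run should resume.
     * (Hypothetical helper for illustration only.)
     */
    static long processChunk(String path, long startOffset, int maxRecords) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(startOffset); // jump directly, nothing before this is read
            int count = 0;
            String line;
            while (count < maxRecords && (line = raf.readLine()) != null) {
                // ... handle the record in line here ...
                count++;
            }
            // Position just after the last line that was fully read.
            return raf.getFilePointer();
        }
    }
}
```

The returned offset would need to be stored somewhere durable (a small side file, a database row) after each batch, so that an interrupted run can pick up from the last saved position.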
I have done some trial and error on this. I tested the Ostermiller utility, a while loop with StringTokenizer, and a while loop with my own token separator.
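For anyone comparing the two hand-written variants, the following sketch shows one difference that matters for CSV data: StringTokenizer treats consecutive delimiters as one, so empty fields disappear, while a manual indexOf loop keeps them. The class and method names are invented for this example, and neither variant handles quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class FieldSplit {
    /** Splits with StringTokenizer; empty fields between commas are dropped. */
    static List<String> withTokenizer(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    /** Splits by scanning for commas; empty fields are preserved. */
    static List<String> withIndexOf(String line) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int comma;
        while ((comma = line.indexOf(',', start)) >= 0) {
            out.add(line.substring(start, comma));
            start = comma + 1;
        }
        out.add(line.substring(start)); // last field, possibly empty
        return out;
    }
}
```

For the input "a,,c" the tokenizer version returns two fields and the indexOf version returns three, so column positions can shift if the tokenizer is used on records with empty values.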
I checked the time and free memory for each approach.
I measured the time by taking the system time before and after processing and computing the difference. I measured free memory by calling Runtime.getRuntime().freeMemory() before and after processing.
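The measurement described above could be sketched roughly as follows. The class Measure and its methods are made up for this example. It uses totalMemory() minus freeMemory() rather than freeMemory() alone, because the free figure also changes when the JVM grows the heap, and it calls System.gc() first, which is only a hint to the JVM. The numbers should be read as a rough indication, not an exact count of what the parser allocated.

```java
class Measure {
    /** Approximate heap currently in use, in bytes. */
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Runs the task and returns { elapsed milliseconds, change in used heap bytes }.
     * The memory delta can even be negative if a garbage collection runs
     * while the task is executing.
     */
    static long[] measure(Runnable task) {
        System.gc(); // a request only; the JVM may ignore it
        long memBefore = usedMemory();
        long start = System.nanoTime();
        task.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long memDelta = usedMemory() - memBefore;
        return new long[] { elapsedMs, memDelta };
    }
}
```

Running each candidate several times and comparing the results is usually more telling than a single before-and-after pair, since garbage collection timing varies from run to run.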
My comma-separated files are going to be huge, so I don't want logic that takes up a lot of memory. Is this free-memory difference a proper measure of whether a lot of memory is being used? Is it foolproof in all scenarios?
I feel the utility actually requires a lot of memory. A lot of memory gets allocated, and that results in a bigger difference when the free-memory difference is calculated.
Does anyone have good experience with the utility from a performance point of view?