
Processing a large comma-separated file

Leena Diwan
Ranch Hand

Joined: Jun 18, 2001
Posts: 351
Hello Friends,

I need to read a very large comma-separated file (.txt or .csv, depending on the requirement) and want
to be able to skip records that have already been processed. For example, if for some reason the file
is processed only halfway, I need to be able to set a counter to the position from which to begin processing
the next time.

In one example of reading the file using SQL, I saw
strSQL = "select * from " + filepath
being executed and the resultset traversed. That way I would be able to move the record counter easily
using the resultset methods.

I wanted to know if anyone has used this approach and whether it is an efficient way of reading
a large comma-separated file.

If anyone has a better idea, or can make me aware of any flaws in the above approach, that would be nice.
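
For context, here is a minimal sketch of the SQL-on-CSV approach described above, assuming the open-source CsvJdbc driver (org.relique.jdbc.csv.CsvDriver). The driver class, connection URL, file name, and checkpoint value are all assumptions, and whether rs.absolute() works depends on the driver supporting scrollable result sets.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CsvViaJdbc {
    public static void main(String[] args) throws Exception {
        // Assumption: the CsvJdbc driver jar is on the classpath.
        Class.forName("org.relique.jdbc.csv.CsvDriver");
        // The URL points at the directory containing the CSV files.
        Connection conn = DriverManager.getConnection("jdbc:relique:csv:/data");
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
        // Table name = file name without the .csv extension.
        ResultSet rs = stmt.executeQuery("SELECT * FROM records");

        int alreadyProcessed = 1000; // hypothetical checkpoint from the last run
        rs.absolute(alreadyProcessed); // only works if the driver supports scrolling
        while (rs.next()) {
            String firstField = rs.getString(1);
            // process the remaining records here
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}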


Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
I have not used a CSV/JDBC driver, but I would expect it to add significant overhead. Working directly at the file level is likely to perform better. There are ready-made helper classes, such as the Ostermiller CSV utilities, that help with reading the file.
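
A bare-bones version of that file-level approach might look like the following; the file name, checkpoint value, and record handling are placeholders. Note that the naive split() below breaks on quoted fields containing commas, which is exactly what a CSV library such as Ostermiller's handles for you.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PlainCsvReader {
    public static void main(String[] args) throws IOException {
        long alreadyProcessed = 1000L; // hypothetical checkpoint from a previous run
        long lineNumber = 0L;
        BufferedReader in = new BufferedReader(new FileReader("records.csv"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lineNumber++;
                if (lineNumber <= alreadyProcessed) {
                    continue; // skip records handled in an earlier run
                }
                String[] fields = line.split(",", -1); // naive: no quoted commas
                // process the record here
            }
        } finally {
            in.close();
        }
    }
}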
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
It sounds like you want random access to the file (e.g., skip the first half), so my inclination would be to use a RandomAccessFile. Java's memory-mapped file facilities are also excellent.
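
A sketch of that idea: checkpoint the byte offset rather than a record count, so a restart can seek straight to where processing stopped instead of re-reading the skipped lines. The file name and offset handling are assumptions, and RandomAccessFile.readLine() treats the data as single-byte characters, so this suits ASCII/Latin-1 files.

import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekableCsvReader {
    public static void main(String[] args) throws IOException {
        long checkpoint = 0L; // hypothetical: byte offset saved by the previous run
        RandomAccessFile file = new RandomAccessFile("records.csv", "r");
        try {
            file.seek(checkpoint); // jump straight past the processed portion
            String line;
            while ((line = file.readLine()) != null) {
                // process the record here
                checkpoint = file.getFilePointer(); // offset of the next record;
                                                    // persist it for the next run
            }
        } finally {
            file.close();
        }
    }
}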
Leena Diwan
Ranch Hand

Joined: Jun 18, 2001
Posts: 351
Hello All,

I have done some trial and error on this. I tested the Ostermiller utility, a while loop with StringTokenizer,
and a while loop with my own token separator.

I checked the time and free memory for each.

Time taken, by recording the system time before and after processing and taking the difference.
And free memory, by taking Runtime.getRuntime().freeMemory() before and after processing.
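
For concreteness, that measurement looks roughly like this (the processFile() method stands in for the code under test). One caveat worth knowing: freeMemory() is only a snapshot of the current heap, and garbage collection can change it at any moment, so the difference is a rough estimate of allocation, not an exact cost.

public class MeasureRun {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long startTime = System.currentTimeMillis();
        long startFree = rt.freeMemory(); // snapshot; GC can change it at any time

        processFile(); // hypothetical: the CSV-processing code being measured

        long elapsedMillis = System.currentTimeMillis() - startTime;
        long freeDelta = startFree - rt.freeMemory(); // rough allocation estimate
        System.out.println(elapsedMillis + " ms, memory delta: " + freeDelta + " bytes");
    }

    private static void processFile() {
        // the code under test goes here
    }
}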

My comma-separated files are going to be huge, so I don't want logic that takes up a lot of memory.
Is this free-memory difference a proper measure of whether a lot of memory is being used? Is it foolproof
in all scenarios?

I feel the utility actually requires a lot of memory. A lot of memory gets allocated, which results
in a bigger difference when the free-memory delta is calculated.

Does anyone have good experience with the utility from a performance point of view?
