JavaRanch » Java Forums » Java » Java in General
Processing a large comma-separated file

Leena Diwan
Ranch Hand

Joined: Jun 18, 2001
Posts: 351
Hello Friends,

I need to read a very large comma-separated file (.txt or .csv, depending on the requirement) and want
to be able to skip records that have already been processed. For example, if for some reason the file
is processed halfway, I need to be able to set a counter to the location from which to begin processing
the next time.

In one example of reading the file using SQL, I saw
strSQL = "select * from " + filepath executed and the ResultSet traversed. That way I would be able
to move the record counter easily using the ResultSet methods.

I wanted to know if anyone has used this approach, and whether it is an efficient way of reading
a large comma-separated file.

If anyone has a better idea, or can make me aware of any flaws in the above approach, that would be nice.
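[A minimal sketch of the resume-by-counter idea at the plain file level, for comparison with the SQL approach. The class and method names here are made up for illustration; the idea is just to skip lines up to a persisted counter and report how many records were handled so the counter can be updated:]

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ResumableCsvReader {
    // Reads records starting at startRecord (0-based) and returns the number
    // processed. The caller persists the counter between runs, so a rerun can
    // resume where the previous run left off.
    public static long process(String path, long startRecord) throws IOException {
        long processed = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            long lineNo = 0;
            while ((line = in.readLine()) != null) {
                if (lineNo++ < startRecord) {
                    continue; // already handled in an earlier run
                }
                String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
                handle(fields);
                processed++;
            }
        }
        return processed;
    }

    private static void handle(String[] fields) {
        // placeholder for the real per-record work
    }
}
```

[Note that the naive split(",") is only safe if no field contains an embedded comma.]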

Regards,
Leena


[SCJP2, SCWCD1.3, SCBCD]
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42601
I have not used a CSV/JDBC driver, but I would think there is significant overhead. Working directly at the file level might be more performant. There are ready-made helper classes, like the Ostermiller CSV class, that help with reading the file.
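[One reason to reach for a ready-made CSV class rather than a plain split(",") is quoted fields. This is not Ostermiller's code, just a minimal illustration of what such a parser has to handle for one line:]

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleCsvLine {
    // Minimal parser for a single CSV line that honours double-quoted fields,
    // which a plain String.split(",") would break apart.
    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }
}
```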


Ping & DNS - my free Android networking tools app
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
Sounds like you want random access to the file (e.g., skip first half) and my inclination would be to use a RandomAccessFile. Memory mapped file facilities in Java are excellent.
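[A sketch of the RandomAccessFile idea, with names of my own choosing: instead of counting records, persist the byte offset after the last fully processed line and seek straight to it on the next run. Note that RandomAccessFile.readLine() decodes bytes as Latin-1, so this sketch assumes a single-byte-encoded file:]

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class OffsetResume {
    // Processes lines starting from a saved byte offset and returns the offset
    // after the last fully processed line, so the caller can persist it and
    // resume there if the run is interrupted.
    public static long processFrom(String path, long startOffset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(startOffset); // jump straight to the resume point
            String line;
            while ((line = raf.readLine()) != null) {
                handle(line.split(",", -1));
            }
            return raf.getFilePointer();
        }
    }

    private static void handle(String[] fields) {
        // real per-record work goes here
    }
}
```

[Seeking to a byte offset avoids re-reading the first half of a huge file just to skip it, which the line-counting approach cannot do.]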
Leena Diwan
Ranch Hand

Joined: Jun 18, 2001
Posts: 351
Hello All,

I have done some trial and error on this. I tested the Ostermiller utility, a while loop with StringTokenizer,
and a while loop with my own token separator.

I checked the time and free memory.

I measured time by taking the system time before and after processing and computing the difference,
and free memory by calling Runtime.getRuntime().freeMemory() before and after processing.

My comma-separated files are gonna be huge, so I don't want logic that takes up a lot of memory.
Is this free-memory difference a proper measure of whether a lot of memory is being used? Is it foolproof
in all scenarios?

I feel the utility actually requires a lot of memory. A lot of memory gets allocated, which results
in a bigger difference when the free-memory delta is calculated.

Does anyone have good experience with the utility from a performance point of view?

Regards,
Leena