
Processing a large comma-separated file

 
Leena Diwan
Ranch Hand
Posts: 351
Hello Friends,

I need to read a very large comma-separated file (.txt or .csv, depending on the requirement) and want
to be able to skip records that have already been processed. For example, if for some reason the file
is processed halfway, I need to be able to set the counter to the location from which to begin processing
the next time.

If I read the file using SQL, in one example I saw
strSQL = "select * from " + filepath being executed and the result set traversed. That way I would be able
to move the record counter easily using the ResultSet methods.

I wanted to know if anyone has used this approach and whether it is an efficient way of reading
a large comma-separated file.

If anyone has a better idea, or can make me aware of any flaws in the above approach, that would be nice.

Regards,
Leena
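A minimal sketch of what resuming by record count could look like, assuming each record is one line and the count of processed records is persisted between runs (the class and method names here are illustrative, not from any particular library):

```java
import java.io.BufferedReader;
import java.io.IOException;

public class ResumableCsvReader {

    // Reads records from the reader, skipping the first `alreadyProcessed`
    // lines (handled in a previous run), and returns how many new records
    // were processed this time.
    static int process(BufferedReader in, int alreadyProcessed) throws IOException {
        int lineNo = 0;
        int handled = 0;
        String line;
        while ((line = in.readLine()) != null) {
            lineNo++;
            if (lineNo <= alreadyProcessed) {
                continue; // skip records processed in a previous run
            }
            String[] fields = line.split(",");
            // ... process fields ...
            handled++;
        }
        return handled;
    }
}
```

The drawback of skipping by line count is that resuming still reads (and discards) every skipped line from the start of the file; for very large files, remembering a byte offset avoids that.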
 
Ulf Dittmer
Rancher
Posts: 42966
I have not used a CSV/JDBC driver, but I would expect it to add significant overhead. Working directly at the file level is likely to be more performant. There are ready-made helper classes, such as the Ostermiller CSV utilities, that help with reading the file.
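To illustrate the trade-off: working at the file level can be as simple as splitting each line on commas, but a naive splitter does not handle quoted fields containing commas, which is exactly what a CSV helper like the Ostermiller utilities takes care of. A deliberately simplified sketch:

```java
public class SimpleCsv {

    // Naive field splitter: splits on every comma, keeping trailing empty
    // fields. Unlike a real CSV parser, this does NOT handle quoted fields
    // such as "Smith, John" -- a deliberate simplification.
    static String[] fields(String line) {
        return line.split(",", -1);
    }
}
```

If the data can ever contain quoted or escaped commas, a proper CSV library is the safer choice; if the format is guaranteed to be plain comma-delimited, the simple approach avoids the library overhead entirely.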
 
Rick O'Shay
Ranch Hand
Posts: 531
Sounds like you want random access to the file (e.g., skip the first half), so my inclination would be to use a RandomAccessFile. The memory-mapped file facilities in Java are also excellent.
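A sketch of the RandomAccessFile idea: persist the byte offset returned by getFilePointer() after each run, then seek() straight to it next time instead of re-reading skipped records. (Note that RandomAccessFile.readLine() works byte-by-byte and is only suitable for single-byte encodings such as ASCII or Latin-1; the method names below are from the standard API, the surrounding class is illustrative.)

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ResumeByOffset {

    // Processes records starting at the saved byte offset and returns the
    // new offset, which the caller should persist for the next run.
    static long processFrom(File csv, long savedOffset) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(csv, "r");
        try {
            raf.seek(savedOffset);          // jump past already-processed data
            String line;
            while ((line = raf.readLine()) != null) {
                String[] fields = line.split(",");
                // ... process fields ...
            }
            return raf.getFilePointer();    // resume point for the next run
        } finally {
            raf.close();
        }
    }
}
```

Unlike counting lines, this resumes in constant time regardless of how much of the file was already processed.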
 
Leena Diwan
Ranch Hand
Posts: 351
Hello All,

I have done some trial and error on this. I tested the Ostermiller utility, a while loop with StringTokenizer,
and a while loop with my own token separator.

I checked the time taken and the free memory.

I measured the time by taking the system time before and after processing and then taking the difference,
and the free memory by calling Runtime.getRuntime().freeMemory() before and after processing.

My comma-separated files are going to be huge, so I don't want logic that takes up a lot of memory.
Is this free-memory difference a proper measure of how much memory is being used? Is it foolproof
in all scenarios?

I feel the utility actually requires a lot of memory. A lot of memory gets allocated, and that results
in a bigger difference when the free-memory difference is calculated.

Does anyone have good experience with the utility from a performance point of view?

Regards,
Leena
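On the measurement question: freeMemory() is not foolproof, because the garbage collector can run (or not run) at any point between the two readings, so the difference mixes real allocation with GC timing. Requesting a GC before each reading makes the numbers somewhat steadier, but it is still an approximation. A sketch of both measurements (the class is illustrative):

```java
public class Measure {

    // Wall-clock timing: straightforward and reliable for coarse comparisons.
    static long timeMillis(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    // Rough estimate of memory currently in use. System.gc() is only a
    // *request* to collect, so this is a hint, not an exact figure; run the
    // same test several times and compare trends rather than single numbers.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }
}
```

For the memory concern itself, what matters most with huge files is reading them record by record (as all three approaches tested above do) rather than loading the whole file into memory at once.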
 