Sorting a large CSV file

Tom Roffe
Greenhorn
Posts: 4
Hello all,
Just a quick one. This is my first post to the forum and I like what I'm seeing ;-). Anyway, I have a bit of a problem. I have a 180 MB CSV file and I need to sort it in name and post code (ZIP) order. I have a small app to sort it, and it works fine on small files, but when I try to sort the beast of a CSV it gives me

I also have a smaller CSV, around 20 MB, and that gives the same error. Does anyone have any suggestions? If so, they would be greatly appreciated.
CODE BELOW
Thanks in advance.
Tom Roffe
Dmitry Melnik
Ranch Hand
Posts: 328
Hi Tom. Here are a few things to try:
1. Run the virtual machine with the parameter "-Xmx1500m". It sets the maximum heap size for the VM; the default is too small for your task.
2. Optimize the memory usage of your task. For instance, instead of inserting the raw strings read from the input file into the list, insert arrays of strings (the results of split), and update your comparator and your result file output code accordingly.
3. You might want to sort your data by inserting helper objects that represent your input lines into a sorted collection (like TreeSet). You will need to take care of duplicate keys, though, since a TreeSet silently drops elements that compare as equal.
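Suggestions 1 and 2 could look roughly like the sketch below. The original code isn't shown in the thread, so everything here is an assumption: the class name, the column positions (name in column 0, post code in column 1), and the naive comma split, which would break on quoted fields that contain commas. The maxHeapMb() helper is only a quick way to confirm that the -Xmx setting took effect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CsvSortSketch {
    // Hypothetical column positions; adjust to the real file layout.
    static final int NAME = 0;
    static final int POSTCODE = 1;

    // Reports the maximum heap the JVM will try to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Splits each line once, then sorts the rows by name and then post code.
    static List<String[]> parseAndSort(List<String> lines) {
        List<String[]> rows = new ArrayList<>(lines.size());
        for (String line : lines) {
            // Naive split: assumes no quoted fields containing commas.
            rows.add(line.split(",", -1));
        }
        rows.sort(Comparator.comparing((String[] r) -> r[NAME])
                            .thenComparing(r -> r[POSTCODE]));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        List<String> lines = Arrays.asList(
                "Smith,AB1 2CD", "Adams,ZZ9 9ZZ", "Adams,AA1 1AA");
        for (String[] row : parseAndSort(lines)) {
            // Re-join the fields when writing the result file.
            System.out.println(String.join(",", row));
        }
    }
}
```

Run it with something like "java -Xmx1500m CsvSortSketch" and check the reported heap size. Whether storing split arrays actually uses less memory than raw lines depends on the JVM and the data, but it does mean the comparator never has to split a line again during the sort.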
 
Tom Roffe
Greenhorn
Posts: 4
BIG thanks, Dmitry Melnik, that worked a treat.
One other problem has just presented its ugly self. The input file has some entries that are in CAPS and others that aren't. When the program sorts the list, the capitalized entries end up sorted at the top of the list, and the non-caps entries are sorted but at the end of the file.
Question: how can I make the sort process ignore the case of the file entries?
 
Dmitry Melnik
Ranch Hand
Posts: 328
Before you start sorting, convert the strings you compare to the same case (with toLowerCase() or toUpperCase()).
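A variation on this advice, not quite the same thing, is to leave the data alone and ignore case only while comparing, so the output file keeps its original capitalization. The sketch below uses the standard String.CASE_INSENSITIVE_ORDER comparator; the class name and the column positions are hypothetical, since the original code isn't shown in the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IgnoreCaseSortSketch {
    // Hypothetical column positions; adjust to the real file layout.
    static final int NAME = 0;
    static final int POSTCODE = 1;

    // Compares rows by name, then post code, ignoring case in both fields.
    static final Comparator<String[]> BY_NAME_THEN_POSTCODE =
            Comparator.comparing((String[] r) -> r[NAME], String.CASE_INSENSITIVE_ORDER)
                      .thenComparing(r -> r[POSTCODE], String.CASE_INSENSITIVE_ORDER);

    // Returns a sorted copy; the input list and its strings are not modified.
    static List<String[]> sortIgnoringCase(List<String[]> rows) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(BY_NAME_THEN_POSTCODE);
        return sorted;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"smith", "ab1 2cd"});
        rows.add(new String[] {"ADAMS", "ZZ9 9ZZ"});
        rows.add(new String[] {"Adams", "aa1 1aa"});
        for (String[] row : sortIgnoringCase(rows)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Note that String.CASE_INSENSITIVE_ORDER does not take locale into account, which is probably fine for plain English names and post codes but may not be for other data.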
 