Hello all, just a quick one. This is my first post to the forum and I like what I'm seeing ;-). Anyway, I have a bit of a problem. I have a 180 MB CSV file and I need to sort it in name and post code (ZIP) order. I have a small app to sort it, and it works fine on small files, but when I try to sort the beast of a CSV it gives me
I also have a smaller CSV of around 20 MB, and that gives the same error. Does anyone have any suggestions? If so, they would be greatly appreciated. Code below. Thanks in advance, Tom Roffe
Hi Tom. Here are a few things to try:
1. Run the virtual machine with the parameter "-Xmx1500m". It sets the maximum heap size for the VM; the default is too small for your task.
2. Optimize the memory usage of your task. For instance, instead of inserting the raw strings read from the input file into the list, insert arrays of strings (the results of split), and update your comparator and result-file output code accordingly.
3. You might want to sort your data by inserting helper objects representing your input strings into a sorted collection (like TreeSet). You will need to take care of duplicate keys, though.
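A minimal sketch of points 1 and 2, assuming the name is in the first column and the post code in the second, that there is no header row, that the file is UTF-8, and that no field contains a quoted comma. The class name CsvSort, its methods and the column indices are hypothetical. Each line is split once as it is read, so the comparator works on ready-made arrays instead of re-splitting strings on every comparison.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CsvSort {
    // Hypothetical column positions: adjust to match the real file layout.
    static final int NAME_COL = 0;
    static final int POSTCODE_COL = 1;

    // Orders split rows by name, then by post code (case-sensitive).
    static final Comparator<String[]> BY_NAME_THEN_POSTCODE =
            Comparator.comparing((String[] row) -> row[NAME_COL])
                      .thenComparing(row -> row[POSTCODE_COL]);

    // Naive split: assumes no quoted fields containing commas.
    static String[] parseLine(String line) {
        return line.split(",", -1); // -1 keeps trailing empty fields
    }

    static void sortRows(List<String[]> rows) {
        rows.sort(BY_NAME_THEN_POSTCODE); // stable sort, duplicates are kept
    }

    public static void main(String[] args) throws IOException {
        // Shows the heap ceiling the JVM actually got (e.g. after -Xmx1500m).
        System.err.println("Max heap (MB): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));

        List<String[]> rows = new ArrayList<>();
        // Assumes UTF-8 input; args[0] is the input file.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(parseLine(line)); // split once, at read time
            }
        }

        sortRows(rows);

        // args[1] is the output file; rows are re-joined with commas.
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            for (String[] row : rows) {
                out.write(String.join(",", row));
                out.newLine();
            }
        }
    }
}
```

It would be run as `java -Xmx1500m CsvSort input.csv sorted.csv`. A TreeSet built with the same comparator would treat two rows with equal name and post code as the same element and silently drop one of them, which is the duplicate-key issue from point 3; sorting a list avoids that.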
Big thanks, Dmitry Melnik, that worked a treat. One other problem has just presented its ugly self. The input file has some entries that are in caps and others that aren't. When the program sorts the list, the capitalized entries are sorted at the top, and the non-caps entries are sorted too but end up at the end of the file. Question: how can I make the sort ignore the case of the file entries?
Before you start sorting, convert the strings you compare to the same case (with toLowerCase() or toUpperCase()).
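The capitalized entries come first because String.compareTo compares character codes, and 'A' to 'Z' come before 'a' to 'z'. A minimal sketch of a case-insensitive comparator, under the same hypothetical column layout as before (name first, post code second; the class and method names are also hypothetical). It uses the JDK's String.CASE_INSENSITIVE_ORDER instead of calling toLowerCase() inside the comparator, so the original text is written out unchanged and no new string is allocated per comparison.

```java
import java.util.Comparator;
import java.util.List;

public class CaseInsensitiveSort {
    // Hypothetical column positions for a name / post code CSV.
    static final int NAME_COL = 0;
    static final int POSTCODE_COL = 1;

    // CASE_INSENSITIVE_ORDER folds case character by character, so
    // "SMITH" and "smith" compare equal and end up side by side.
    static final Comparator<String[]> BY_NAME_THEN_POSTCODE_IGNORE_CASE =
            Comparator.comparing((String[] row) -> row[NAME_COL], String.CASE_INSENSITIVE_ORDER)
                      .thenComparing(row -> row[POSTCODE_COL], String.CASE_INSENSITIVE_ORDER);

    static void sortRows(List<String[]> rows) {
        rows.sort(BY_NAME_THEN_POSTCODE_IGNORE_CASE);
    }
}
```

CASE_INSENSITIVE_ORDER is not locale-aware; if accented names need dictionary-style ordering, java.text.Collator is the usual alternative.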