sorting lines of strings in a file

jin sun
Ranch Hand

Joined: Feb 16, 2005
Posts: 30
Hi, here's my problem:

I have a file that can be 200 MB or more, and it contains lines that look like this:

ENG|glucopyranoside|C0644705|L1129264|S1355824|

So, what I want to do is sort by the 2nd field (glucopyranoside) alphanumerically. I've been trying to add each line to a list and sort it using Collections.sort(), but I get out-of-memory errors. I think my problem is trying to put all the lines in a list and sort them, haha. Is there a way around this?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
    
If you are sure that the character set is ASCII, and you can assign enough memory to the JVM, do this:
1. Read the whole file as one big byte[] (half the size of the char[] used for Strings).
2. Locate all the line starts by scanning through the byte[], keeping a List of the line starts, possibly as Integer objects or maybe as a custom object.
3. Create a class implementing Comparator that can find the field to sort on and return the correct compare and equals results.
4. Sort the List by providing the Comparator to Collections.sort().
That's not the fastest sort, but it will be simple to code.
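A rough sketch of steps 1 to 4, under a few assumptions that are not in the post: lines end in '\n', fields are separated by '|', the sort key is the second field, and plain byte order (case-sensitive ASCII) is an acceptable ordering. The class and method names (ByteLineSorter, sortedLineStarts, and so on) are made up for illustration, and a lambda stands in for the Comparator class from step 3.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ByteLineSorter {

    /** Step 2: offset of the first byte of every line ('\n' line ends assumed). */
    static List<Integer> lineStarts(byte[] data) {
        List<Integer> starts = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            starts.add(pos);
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step over the newline
        }
        return starts;
    }

    /** Offset of the first byte of the second field, or the line end if there is none. */
    static int fieldStart(byte[] data, int lineStart) {
        int i = lineStart;
        while (i < data.length && data[i] != '|' && data[i] != '\n') i++;
        return (i < data.length && data[i] == '|') ? i + 1 : i;
    }

    static boolean isFieldEnd(byte[] data, int i) {
        return i >= data.length || data[i] == '|' || data[i] == '\n' || data[i] == '\r';
    }

    /** Step 3: compares the second fields of two lines byte by byte, without building Strings. */
    static int compareSecondField(byte[] data, int lineA, int lineB) {
        int a = fieldStart(data, lineA);
        int b = fieldStart(data, lineB);
        while (!isFieldEnd(data, a) && !isFieldEnd(data, b)) {
            int diff = data[a] - data[b]; // ASCII bytes are never negative
            if (diff != 0) return diff;
            a++;
            b++;
        }
        boolean endA = isFieldEnd(data, a);
        boolean endB = isFieldEnd(data, b);
        return endA == endB ? 0 : (endA ? -1 : 1); // the shorter field sorts first
    }

    /** Step 4: the line starts, ordered by the second field of the line they point at. */
    static List<Integer> sortedLineStarts(byte[] data) {
        List<Integer> starts = lineStarts(data);
        Collections.sort(starts, (x, y) -> compareSecondField(data, x, y));
        return starts;
    }

    /** Usage: java ByteLineSorter input.txt output.txt */
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0])); // step 1
        try (OutputStream out = new BufferedOutputStream(
                Files.newOutputStream(Paths.get(args[1])))) {
            for (int start : sortedLineStarts(data)) {
                int end = start;
                while (end < data.length && data[end] != '\n') end++;
                out.write(data, start, end - start);
                out.write('\n');
            }
        }
    }
}
```

The List of Integer objects still costs memory per line, so for a file with many short lines a plain int[] of offsets may be worth the extra code.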

If you can't get it all in memory, you will have to read and sort chunks, then merge the sorted chunks afterwards.
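One way the chunk-and-merge idea could look, as a sketch only: read a fixed number of lines at a time, sort each chunk in memory, write it to a temporary file, then merge all chunk files with a priority queue. The class name ChunkedFileSorter, the maxLinesPerChunk parameter, the '|' delimiter handling, and the ASCII charset are assumptions for illustration, not something specified in this thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ChunkedFileSorter {

    /** Second '|'-separated field of a line, or "" if the line has none. */
    static String secondField(String line) {
        String[] parts = line.split("\\|", 3);
        return parts.length > 1 ? parts[1] : "";
    }

    static final Comparator<String> BY_SECOND_FIELD =
            Comparator.comparing(ChunkedFileSorter::secondField);

    /** Sorts one in-memory chunk and writes it to a temporary file. */
    static Path writeChunk(List<String> lines) throws IOException {
        lines.sort(BY_SECOND_FIELD);
        Path chunk = Files.createTempFile("chunk", ".txt");
        Files.write(chunk, lines, StandardCharsets.US_ASCII);
        return chunk;
    }

    /** Splits the input into sorted chunk files of at most maxLinesPerChunk lines. */
    static List<Path> writeSortedChunks(Path input, int maxLinesPerChunk) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.US_ASCII)) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
                if (lines.size() >= maxLinesPerChunk) {
                    chunks.add(writeChunk(lines));
                    lines.clear();
                }
            }
            if (!lines.isEmpty()) chunks.add(writeChunk(lines));
        }
        return chunks;
    }

    /** One open chunk file plus its smallest line that has not been written yet. */
    static final class Cursor {
        final BufferedReader reader;
        String line;
        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }
    }

    /** Merges the sorted chunks, always writing the smallest pending line next. */
    static void mergeChunks(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                Comparator.comparing((Cursor c) -> c.line, BY_SECOND_FIELD));
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.US_ASCII)) {
            for (Path chunk : chunks) {
                Cursor c = new Cursor(Files.newBufferedReader(chunk, StandardCharsets.US_ASCII));
                if (c.line != null) heap.add(c); else c.reader.close();
            }
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                out.write(c.line);
                out.newLine();
                c.line = c.reader.readLine();
                if (c.line != null) heap.add(c); else c.reader.close();
            }
        }
    }

    static void sort(Path input, Path output, int maxLinesPerChunk) throws IOException {
        List<Path> chunks = writeSortedChunks(input, maxLinesPerChunk);
        try {
            mergeChunks(chunks, output);
        } finally {
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }
}
```

The chunk size is the knob to tune: it should be as large as the heap comfortably allows, since fewer chunks means fewer files open during the merge. A sketch like this does not close chunk readers if the merge fails halfway, which real code would want to handle.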

Bill
jin sun
Ranch Hand

Joined: Feb 16, 2005
Posts: 30
Sorry, I think I need to rephrase my problem. The file is already sorted and I want to insert new line(s) (in the same format as my example above) into the right place. For example:

I want to insert the following into the file:
ENG|horse|C0644705|L1129264|S1355824|

And in the file it would go between the following based on the second field:
ENG|giant|C0644727|L1129215|S1355816|
ENG|house|C0644732|L1129211|S1355819|

I'm stumped. This is probably simple, but I'm really rusty. Any suggestions would be a help.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
    
Your restatement of the problem does not resemble the original at all - anyway -

The only way to do this is by creating a new file, reading lines from the old and writing to the new until you hit the right spot to insert. If you know the old file is sorted, the right spot is when you hit a line that comes after the line to be inserted.
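A minimal sketch of that copy-and-insert pass, assuming '|' separated fields, the second field as the key, and that the file is sorted by plain String.compareTo order on that field. SortedFileInserter and insertLine are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SortedFileInserter {

    /** Second '|'-separated field of a line, or "" if the line has none. */
    static String secondField(String line) {
        String[] parts = line.split("\\|", 3);
        return parts.length > 1 ? parts[1] : "";
    }

    /**
     * Copies oldFile to newFile, writing newLine just before the first
     * existing line whose second field sorts after the new line's field.
     */
    static void insertLine(Path oldFile, Path newFile, String newLine) throws IOException {
        String key = secondField(newLine);
        boolean inserted = false;
        try (BufferedReader in = Files.newBufferedReader(oldFile, StandardCharsets.US_ASCII);
             BufferedWriter out = Files.newBufferedWriter(newFile, StandardCharsets.US_ASCII)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!inserted && secondField(line).compareTo(key) > 0) {
                    out.write(newLine);
                    out.newLine();
                    inserted = true;
                }
                out.write(line);
                out.newLine();
            }
            if (!inserted) { // the new line sorts after everything in the old file
                out.write(newLine);
                out.newLine();
            }
        }
    }
}
```

With several lines to insert, it is probably cheaper to sort the new lines first and merge them into the old file in a single pass than to rewrite the whole file once per line.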
Bill
jin sun
Ranch Hand

Joined: Feb 16, 2005
Posts: 30
^yea, my restatement is completely different from my original one, sorry.

William Brogden wrote: "The only way to do this is by creating a new file, reading lines from the old and writing to the new until you hit the right spot to insert. If you know the old file is sorted, the right spot is when you hit a line that comes after the line to be inserted."


Thanks, I thought there was another way of doing it, but I guess I have to do it that way.
 