sort different file length records

Helen Thomas
Ranch Hand

Joined: Jan 13, 2004
Posts: 1759
I have a file with records of different lengths.
How do you sort the records using Java?
NIO FileChannel?
An Abstract Factory sort implementation ?
sort utility ?


Le Cafe Mouse - Helen's musings on the web - Java Skills and Thrills
"God who creates and is nature is very difficult to understand, but he is not arbitrary or malicious." OR "God does not play dice." - Einstein
Helen Thomas
Ranch Hand

Joined: Jan 13, 2004
Posts: 1759
Sun doesn't promote the NIO option much, perhaps because it is non-trivial.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
Are you concerned about sorting the records or reading the records from the file?


Author of Test Driven (2007) and Effective Unit Testing (2013) [Blog] [HowToAskQuestionsOnJavaRanch]
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Originally posted by Helen Thomas:
I have a file with records of different lengths.
How do you sort the records using Java?
NIO FileChannel?
An Abstract Factory sort implementation ?
sort utility ?

Hi Helen,
Is this a class assignment? Also, are you trying to sort the records by length, or by some other criterion?


Java Regular Expressions
Helen Thomas
Ranch Hand

Joined: Jan 13, 2004
Posts: 1759
It's just a question I was asked, comparing the power of using Java vs., say, reading the files into a temporary SQL table and sorting them there (on a powerful machine, of course). And no, it's not an assignment, Max.
The question would cover both reading and sorting the records using Java, Lasse.
Length could be an option, or the value in certain key positions of each record.
Legacy systems usually store the record type somewhere at the start of the record, e.g. 01, 02, 03.
Thanks.
[ April 14, 2004: Message edited by: Helen Thomas ]
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Try the following: no promises
Maulin Vasavada
Ranch Hand

Joined: Nov 04, 2001
Posts: 1871
Hi Helen
In the end I would go with the SQL approach, I guess, because it is the more scalable option in terms of performance if we can't predict the size of the file or the nature of the records in it.
My 2 cents.
Maulin
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
Well, the sorting part can be implemented by relying on Collections#sort() and the Comparator interface.
The whole program would look something like

The most interesting part here is the class "MyRecordComparator", which might look like this:
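A minimal sketch of this approach, not the listing originally posted. It assumes a modern JDK (Java 8 or later rather than the JDK 1.3 mentioned later in the thread), one record per line, ISO-8859-1 text, and a two-character record type at the start of each record. MyRecordComparator is the only name taken from the post; RecordSortSketch, recordType, sortRecords and sortFile are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: reads the whole file into memory, so it only suits
// files that fit comfortably in the heap.
class RecordSortSketch {

    // Assumption: the first two characters hold the record type ("01", "02", ...).
    static String recordType(String record) {
        return record.length() >= 2 ? record.substring(0, 2) : record;
    }

    // Orders records by type, then by length, then by full text as a tie-breaker.
    static class MyRecordComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            int byType = recordType(a).compareTo(recordType(b));
            if (byType != 0) {
                return byType;
            }
            int byLength = Integer.compare(a.length(), b.length());
            if (byLength != 0) {
                return byLength;
            }
            return a.compareTo(b);
        }
    }

    // Sorts the list in place using the comparator above.
    static void sortRecords(List<String> records) {
        Collections.sort(records, new MyRecordComparator());
    }

    // Reads every line of 'in', sorts the lines, and writes them to 'out'.
    static void sortFile(Path in, Path out) throws IOException {
        List<String> records =
                new ArrayList<>(Files.readAllLines(in, StandardCharsets.ISO_8859_1));
        sortRecords(records);
        Files.write(out, records, StandardCharsets.ISO_8859_1);
    }
}
```

Changing the sort criterion (length only, or a key at some other position) only means changing the comparator; the reading and writing code stays the same.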
Helen Thomas
Ranch Hand

Joined: Jan 13, 2004
Posts: 1759
Thanks all.
Max, that looks really elegant and could be proven on a fast machine.
Maulin, scalability is very good advice.
It may be a good idea to convert the legacy files into XML documents at the same time, so the following seems another attractive option if it proves a fast enough solution.
Sorting in XSLT
At least now we know how to do it in Java.
Wow! A second Java solution.
Thanks all again.
[ April 14, 2004: Message edited by: Helen Thomas ]
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Happy to help, Helen.
As for XML: be aware that it's very possible that your XML API is using regex under the covers, so if speed is your issue, you might find that you're taking the long way around the block.
All best,
M
Helen Thomas
Ranch Hand

Joined: Jan 13, 2004
Posts: 1759
Thanks Max.
Helen Thomas
Ranch Hand

Joined: Jan 13, 2004
Posts: 1759
Now the file is extremely large, on the order of more than 100GB, and they insist it should be done in Java.
Would it be faster to split the file into several sections and read and sort it in several passes, using temporary workfiles in between?
For example: on the first pass, read the file and store the keys in a Collection ordered by key value.
Derive the number of manageable workfiles required.
Read the file in a second pass and use the Collection to figure out which workfile to write each record to.
Does anyone have links to benchmark data on what sizes Collections.sort can handle efficiently in WebSphere and Oracle 9i?
Sort the workfiles using either of the two methods given above and concatenate the files.
I am sure this springs mainly from a desire to be in control and to be able to watch the process (yes, it has gone through steps 1 and 2 and is now in step 3) rather than waiting for what seems an endless time for a single huge process to complete.
Actually, you could have the workfiles by market. No, best to stick to a sort algorithm.
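The passes described above could be sketched roughly as follows. This is only an illustration under stated assumptions: a modern JDK, one record per line, ISO-8859-1 text, a two-character key at the start of each record, and enough memory to hold all keys in pass 1 and one workfile at a time afterwards. WorkfileSortSketch and every method name in it are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WorkfileSortSketch {
    static final Charset CS = StandardCharsets.ISO_8859_1; // assumed encoding

    // Assumption: the key is the first two characters of the record.
    static String key(String record) {
        return record.length() >= 2 ? record.substring(0, 2) : record;
    }

    // Pass 1: collect and sort the keys, then pick evenly spaced split points.
    static List<String> chooseBoundaries(Path input, int workfiles) throws IOException {
        List<String> keys = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(input, CS)) {
            for (String line; (line = r.readLine()) != null; ) {
                keys.add(key(line));
            }
        }
        Collections.sort(keys);
        List<String> boundaries = new ArrayList<>();
        for (int i = 1; i < workfiles && !keys.isEmpty(); i++) {
            boundaries.add(keys.get((int) ((long) i * keys.size() / workfiles)));
        }
        return boundaries;
    }

    // Keys below boundaries[0] go to workfile 0, and so on upwards.
    static int workfileFor(String key, List<String> boundaries) {
        int pos = Collections.binarySearch(boundaries, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    static void sort(Path input, Path output, Path workDir, int workfiles) throws IOException {
        List<String> boundaries = chooseBoundaries(input, workfiles);
        List<Path> parts = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try {
            for (int i = 0; i <= boundaries.size(); i++) {
                Path part = workDir.resolve("work" + i + ".tmp");
                parts.add(part);
                writers.add(Files.newBufferedWriter(part, CS));
            }
            // Pass 2: route every record to the workfile for its key range.
            try (BufferedReader r = Files.newBufferedReader(input, CS)) {
                for (String line; (line = r.readLine()) != null; ) {
                    BufferedWriter w = writers.get(workfileFor(key(line), boundaries));
                    w.write(line);
                    w.newLine();
                }
            }
        } finally {
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        // Pass 3: sort each workfile in memory and append it to the output.
        try (BufferedWriter out = Files.newBufferedWriter(output, CS)) {
            for (Path part : parts) {
                List<String> records = new ArrayList<>(Files.readAllLines(part, CS));
                Collections.sort(records); // the key is a prefix, so this orders by key first
                for (String record : records) {
                    out.write(record);
                    out.newLine();
                }
                Files.delete(part);
            }
        }
    }
}
```

Because the workfiles cover disjoint, ascending key ranges, concatenating them after sorting gives a fully sorted file without a merge step. With only a handful of distinct keys (01, 02, 03) some workfiles may end up empty or very uneven, so a longer key would be needed to spread 100GB evenly.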
[ April 22, 2004: Message edited by: Helen Thomas ]
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Originally posted by Helen Thomas:
Now the file is extremely large, on the order of more than 100GB, and they insist it should be done in Java.


Wow! I don't think I've ever worked with a file that big.
In my opinion, you're right on the money. Break the file down into smaller files which contain only the keys. Sort those, then recombine them in an orderly fashion. Then create a list of which keys can be found where.
You are, in effect, creating a db index here. Next, create an index to your key index. Thus, keys starting with the letters A-D might be in file1, and keys starting with the letters E-H might be in file2. You need a file that will track such information.
Next, extract the records keyed in file1, file2, etc., from the original file. Or just keep the index if you only need the ability to achieve sorted order.
As a follow through, make sure that future records are inserted in an orderly way, and integrated into your key index. Thus, you'll only have the 'big work' once.
Nevertheless, 100GB is just going to take time.
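A rough sketch of the first-level key index, under several assumptions: a modern JDK, one record per line, single-byte text, a two-character key at the start of each record, and an index small enough to sort in memory (at 100GB it would itself have to be split into files, as described above). KeyIndexSketch, Entry, buildIndex and readInKeyOrder are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class KeyIndexSketch {

    // One index entry: a record's key and the byte offset where the record starts.
    static final class Entry {
        final String key;
        final long offset;

        Entry(String key, long offset) {
            this.key = key;
            this.offset = offset;
        }
    }

    // Assumption: the key is the first two characters of the record.
    static String key(String record) {
        return record.length() >= 2 ? record.substring(0, 2) : record;
    }

    // Scans the file once, noting where each record starts, then sorts by key.
    // RandomAccessFile.readLine is unbuffered and slow; it keeps the sketch short.
    static List<Entry> buildIndex(RandomAccessFile file) throws IOException {
        List<Entry> index = new ArrayList<>();
        file.seek(0);
        long offset = file.getFilePointer();
        String line;
        while ((line = file.readLine()) != null) {
            index.add(new Entry(key(line), offset));
            offset = file.getFilePointer();
        }
        // Collections.sort is stable, so records with equal keys keep file order.
        Collections.sort(index, Comparator.comparing((Entry e) -> e.key));
        return index;
    }

    // Visits the original file in key order by seeking to each indexed offset.
    static List<String> readInKeyOrder(RandomAccessFile file, List<Entry> index)
            throws IOException {
        List<String> records = new ArrayList<>();
        for (Entry e : index) {
            file.seek(e.offset);
            records.add(file.readLine());
        }
        return records;
    }
}
```

The index holds only keys and offsets, so it is far smaller than the data; the original file is never rewritten unless a physically sorted copy is needed.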
Let me know if this makes sense.
M
Helen Thomas
Ranch Hand

Joined: Jan 13, 2004
Posts: 1759
Originally posted by Max Habibi:

question, are you using JDK 1.4 here?
M

At the moment JDK 1.3.
JDK 1.4 would be better for the task though.
Helen Thomas
Ranch Hand

Joined: Jan 13, 2004
Posts: 1759
How is a radix sort different from a merge sort?
 