
sort different file length records

 
Helen Thomas
Ranch Hand
Posts: 1759
I have a file with records of different lengths.
How do you sort the records using Java?
An NIO FileChannel?
An Abstract Factory sort implementation?
A sort utility?
 
Helen Thomas
Ranch Hand
Posts: 1759
Sun doesn't promote the NIO option much, perhaps because it is non-trivial.
 
Lasse Koskela
author
Sheriff
Posts: 11962
5
Are you concerned about sorting the records or reading the records from the file?
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
Originally posted by Helen Thomas:
A file with different length records.
How do you sort the records using java.
NIO File Channel ?
An Abstract Factory sort implementation ?
sort utility ?

Hi Helen,
Is this a class assignment? Also, are you trying to sort the records by length, or by some other criterion?
 
Helen Thomas
Ranch Hand
Posts: 1759
It's just a question I was asked, comparing the power of using Java vs., say, reading the files into a temporary SQL table and sorting them there (on a powerful machine, of course). And no, it's not an assignment, Max.
The question would cover both reading and sorting the records using Java, Lasse.
Length could be an option, or the value in certain key positions in each record.
Legacy systems usually have the record type stored somewhere at the start of the record: 01, 02, 03.
thanks
[ April 14, 2004: Message edited by: Helen Thomas ]
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
Try the following: no promises
 
Maulin Vasavada
Ranch Hand
Posts: 1873
Hi Helen
In the end I would go with the SQL approach, I guess, because it is going to be the more scalable option in terms of performance if we can't predict the size of the file or the nature of the records in it.
My 2 cents.
Maulin
 
Lasse Koskela
author
Sheriff
Posts: 11962
5
Well, the sorting part can be implemented by relying on Collections#sort() and the Comparator interface.
The whole program would look something like

The most interesting part here is the class "MyRecordComparator", which might look like this:
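A minimal sketch of such a comparator, assuming a modern JDK (Java 8 or later), one record per line, and a two-character record type at the start of each record. The names `RecordSortSketch`, `typeOf`, `sortRecords`, and `sortFile` are illustrative and not taken from the original post, and the tie-break on record length is just one possible choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class RecordSortSketch {

    /** Orders records by their record type, then by record length. */
    static class MyRecordComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            int byType = typeOf(a).compareTo(typeOf(b));
            return byType != 0 ? byType : Integer.compare(a.length(), b.length());
        }

        // Assumption: the record type occupies the first two characters.
        private static String typeOf(String record) {
            return record.length() >= 2 ? record.substring(0, 2) : record;
        }
    }

    /** Returns a sorted copy; the input list is left untouched. */
    static List<String> sortRecords(List<String> records) {
        List<String> sorted = new ArrayList<>(records);
        Collections.sort(sorted, new MyRecordComparator());
        return sorted;
    }

    /** Reads the whole file into memory, so it only suits files that fit in the heap. */
    static void sortFile(Path in, Path out) throws IOException {
        Files.write(out, sortRecords(Files.readAllLines(in)));
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("03CCC", "01A", "02BBBB", "01AAAA");
        System.out.println(sortRecords(records)); // [01A, 01AAAA, 02BBBB, 03CCC]
    }
}
```

Only `compare` needs to change if the key sits at a different position or the records should be ordered by length alone.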
 
Helen Thomas
Ranch Hand
Posts: 1759
Thanks all.
Max, that looks really elegant and could be proven on a fast machine.
Maulin, scalability is very good advice.
It may be a good idea to convert the legacy files into XML documents at the same time, so the following seems another attractive option if it proves a fast enough solution.
Sorting in XSLT
At least now we know how to do it in Java.
Wow! A second Java solution.
Thanks all again.
[ April 14, 2004: Message edited by: Helen Thomas ]
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
Happy to help, Helen.
As for XML: be aware that it's very possible that your XML API is using regex under the covers, so if speed is your issue, you might find that you're taking the long way around the block.
All best,
M
 
Helen Thomas
Ranch Hand
Posts: 1759
Thanks Max.
 
Helen Thomas
Ranch Hand
Posts: 1759
Now the file is extremely large (on the order of 100GB or more) and they insist it should be done in Java.
Would it be faster to split the file into several sections, then read and sort it in several passes using temporary workfiles in between?
For example:
On the first pass, read the file and store the keys in a Collection ordered by key value.
Derive the number of manageable workfiles required.
Read the file in a second pass and, from the Collection, figure out which workfile to write each record to.
Sort the workfiles using either of the two methods given above and concatenate the files.
Does anyone have any links to benchmark data on what sizes would be manageable for Collections.sort to perform efficiently in WebSphere and Oracle 9i?
I am sure that this springs mainly from a desire to be in control and to be able to watch the process (yes, it has gone through steps 1 and 2 and is now in step 3) rather than waiting for what seems an endless time for a single huge process to complete.
Actually, you could have the workfiles by market. No, best to stick to a sort algorithm.
[ April 22, 2004: Message edited by: Helen Thomas ]
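The workfile plan above can be sketched roughly as follows. This is a sketch under assumptions, not a tuned implementation: it assumes Java 8 or later, one record per line, a two-character key at the start of each record, and that the key boundaries between workfiles have already been derived by the first pass (here they are simply passed in). `WorkfileSortSketch`, `keyOf`, `bucketFor`, and the `work<N>.txt` file names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class WorkfileSortSketch {

    // Assumption: the sort key is the first two characters of each record.
    static String keyOf(String record) {
        return record.length() >= 2 ? record.substring(0, 2) : record;
    }

    // Index of the workfile for a key: the number of boundaries that are <= key.
    // The boundaries must be in ascending order.
    static int bucketFor(String key, List<String> boundaries) {
        int bucket = 0;
        while (bucket < boundaries.size() && key.compareTo(boundaries.get(bucket)) >= 0) {
            bucket++;
        }
        return bucket;
    }

    static void sort(Path input, Path output, List<String> boundaries, Path workDir)
            throws IOException {
        List<Path> workFiles = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try {
            // One workfile per key range: boundaries.size() + 1 in total.
            for (int i = 0; i <= boundaries.size(); i++) {
                Path workFile = workDir.resolve("work" + i + ".txt");
                workFiles.add(workFile);
                writers.add(Files.newBufferedWriter(workFile));
            }
            // Distribution pass: stream the big file and route each record by key.
            try (BufferedReader in = Files.newBufferedReader(input)) {
                String line;
                while ((line = in.readLine()) != null) {
                    BufferedWriter w = writers.get(bucketFor(keyOf(line), boundaries));
                    w.write(line);
                    w.newLine();
                }
            }
        } finally {
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        // Sort pass: each workfile must fit in memory; sorting is stable and by key only.
        // Workfiles are visited in key-range order, so appending them concatenates the result.
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path workFile : workFiles) {
                List<String> chunk = Files.readAllLines(workFile);
                chunk.sort(Comparator.comparing(WorkfileSortSketch::keyOf));
                for (String record : chunk) {
                    out.write(record);
                    out.newLine();
                }
                Files.delete(workFile);
            }
        }
    }
}
```

Because every key in workfile N sorts before every key in workfile N+1, no merge step is needed. The cost is that the boundaries must split the data evenly enough for each workfile to fit in memory.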
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
Originally posted by Helen Thomas:
Now the file is extremely large : order of magnitude > 100GB and they insist it should be done in Java.[ April 22, 2004: Message edited by: Helen Thomas ]


Wow! I don't think I've ever worked with a file that big.
In my opinion, you're right on the money. Break the file down into smaller files which contain only the keys. Sort those, then recombine them in an orderly fashion. Then create a list of which keys can be found where.
You are, in effect, creating a db index here. Next, create an index to your key index. Thus, keys starting with the letters A-D might be in file1, and keys starting with the letters E-H might be in file2. You need a file that will track such information.
Next, extract the records keyed on file1, file2, etc., from the original file. Or just keep the index if you only need the ability to achieve sorted order.
As a follow through, make sure that future records are inserted in an orderly way, and integrated into your key index. Thus, you'll only have the 'big work' once.
Nevertheless, 100GB is just going to take time.
Let me know if this makes sense.
M
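The "sort the smaller files, then recombine them in an orderly fashion" step can be sketched as a k-way merge. This is a minimal sketch rather than the approach spelled out above in full: it assumes Java 8 or later, one entry per line, and that plain string order of a line is the desired key order (for example, because the key comes first on the line). `ChunkMergeSketch`, `Cursor`, and the `chunk<N>.txt` names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class ChunkMergeSketch {

    // One open chunk file plus the line currently at its head.
    static final class Cursor implements Comparable<Cursor> {
        final BufferedReader reader;
        String head;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        // Moves to the next line; returns false once the chunk is exhausted.
        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Cursor other) {
            return head.compareTo(other.head);
        }
    }

    // Splits the input into sorted chunk files of at most maxLines lines each.
    static List<Path> splitIntoSortedChunks(Path input, Path workDir, int maxLines)
            throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == maxLines) {
                    chunks.add(flush(buffer, workDir, chunks.size()));
                }
            }
            if (!buffer.isEmpty()) {
                chunks.add(flush(buffer, workDir, chunks.size()));
            }
        }
        return chunks;
    }

    // Sorts the buffered lines, writes them to a new chunk file, and empties the buffer.
    static Path flush(List<String> buffer, Path workDir, int index) throws IOException {
        Collections.sort(buffer);
        Path chunk = workDir.resolve("chunk" + index + ".txt");
        Files.write(chunk, buffer);
        buffer.clear();
        return chunk;
    }

    // Merges the sorted chunks, holding only one line per chunk in memory.
    static void merge(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path chunk : chunks) {
                Cursor c = new Cursor(Files.newBufferedReader(chunk));
                if (c.head != null) queue.add(c); else c.reader.close();
            }
            while (!queue.isEmpty()) {
                Cursor c = queue.poll();          // chunk with the smallest head line
                out.write(c.head);
                out.newLine();
                if (c.advance()) queue.add(c); else c.reader.close();
            }
        }
    }
}
```

The merge keeps one open reader per chunk, so the number of chunks is bounded by the process's open-file limit; with very many chunks, merging in several rounds is the usual workaround.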
 
Helen Thomas
Ranch Hand
Posts: 1759
Originally posted by Max Habibi:

question, are you using JDK 1.4 here?
M

At the moment JDK 1.3.
JDK 1.4 would be better for the task though.
 
Helen Thomas
Ranch Hand
Posts: 1759
How does a radix sort differ from a merge sort?
 