Sort very large data without using database

shalini raj
Greenhorn

Joined: Sep 09, 2006
Posts: 1
How do you sort very large data (say, 100 million integers) on a machine with 64 MB of RAM, without using a database?
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24187
    

Hi,

Welcome to JavaRanch!

The classic, most general solution is an N-way merge sort. Divide the data into chunks that fit in memory, sort each chunk in-core, and write each out as its own sorted file. Now open all the files and read one number from each. Write the smallest one to the final output file, and replace it with the next number from the same file. Repeat until all the files are exhausted.
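
As a rough sketch, here is what the merge phase might look like in Java, assuming the sorted chunk files already exist with one integer per line (the file layout and the Entry helper are illustrative, not part of the original post):

import java.io.*;
import java.util.*;

public class ExternalMerge {

    // One cursor per chunk file: the current value plus the reader it came from.
    private static class Entry implements Comparable<Entry> {
        final int value;
        final BufferedReader reader;
        Entry(int value, BufferedReader reader) { this.value = value; this.reader = reader; }
        public int compareTo(Entry other) {
            return value < other.value ? -1 : (value == other.value ? 0 : 1);
        }
    }

    // Merge pre-sorted chunk files into one ascending output file.
    public static void merge(List<File> chunks, File out) throws IOException {
        // The heap always exposes the smallest current value across all chunks.
        PriorityQueue<Entry> heap = new PriorityQueue<Entry>();
        for (File chunk : chunks) {
            BufferedReader r = new BufferedReader(new FileReader(chunk));
            String line = r.readLine();
            if (line != null) heap.add(new Entry(Integer.parseInt(line.trim()), r));
            else r.close();
        }
        PrintWriter w = new PrintWriter(new BufferedWriter(new FileWriter(out)));
        try {
            while (!heap.isEmpty()) {
                Entry e = heap.poll();                // smallest value among all open chunks
                w.println(e.value);
                String next = e.reader.readLine();    // refill from the same chunk file
                if (next != null) heap.add(new Entry(Integer.parseInt(next.trim()), e.reader));
                else e.reader.close();                // that chunk is exhausted
            }
        } finally {
            w.close();
        }
    }
}

With, say, 100 chunks of a million integers each, only 100 numbers (plus I/O buffers) need to be in memory at once, so the 64 MB limit is easy to respect.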

But there are other things you can do if you know more about the data. For example, what if the numbers are all unique and have a known maximum number of digits (for example, a large collection of US phone numbers)? Then you can create a bit array that contains just one bit for each possible number, run through the 100 million numbers on disk, and set the bit corresponding to each number you find. Then you run through the bit array, writing the full number for each "on" bit to a final file. This algorithm is linear in the size of the data -- quite a feat! The idea comes from Jon Bentley's "Programming Pearls", by the way -- a marvelous little book.
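
A minimal sketch of that bit-array idea using java.util.BitSet, assuming the values are unique, non-negative, and below a known bound (the 8-digit bound and the file names here are illustrative):

import java.io.*;
import java.util.BitSet;

public class BitmapSort {
    public static void main(String[] args) throws IOException {
        final int MAX = 100000000;      // assumed known upper bound (8-digit numbers)
        BitSet seen = new BitSet(MAX);  // one bit per possible value: 10^8 bits, about 12 MB

        // Pass 1: read the numbers from disk and set the matching bit for each.
        BufferedReader in = new BufferedReader(new FileReader("numbers.txt"));
        String line;
        while ((line = in.readLine()) != null) {
            seen.set(Integer.parseInt(line.trim()));
        }
        in.close();

        // Pass 2: walk the bit array in order; each "on" bit's index is the number itself.
        PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("sorted.txt")));
        for (int i = seen.nextSetBit(0); i >= 0; i = seen.nextSetBit(i + 1)) {
            out.println(i);
        }
        out.close();
    }
}

At roughly 12 MB the bit array fits comfortably in 64 MB of RAM, and the whole sort is just two sequential passes over the data.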


[Jess in Action][AskingGoodQuestions]
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
Originally posted by Ernest Friedman-Hill:
The idea comes from Jon Bentley's "Programming Pearls", by the way -- a marvelous little book.


Thanks for suggesting the book.


My blood is tested +ve for Java.
Srinivas Kalvala
Ranch Hand

Joined: Oct 20, 2005
Posts: 257

Hello,

There will always be a trade-off between memory and performance. Would we really face this kind of situation in the real world? And if we did, would we actually limit ourselves to 64 MB and hunt for a clever algorithm? Probably not; we would accept the trade-offs instead.

More inputs are welcome.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Originally posted by Ernest Friedman-Hill:
The classic, most general solution is an N-way merge sort. Divide the data into chunks that fit in memory, sort each chunk in-core, and write each out as its own sorted file. Now open all the files and read one number from each. Write the smallest one to the final output file, and replace it with the next number from the same file. Repeat until all the files are exhausted.


Brings back memories ... on '70s-vintage mainframes we could watch the tape drives write a few records to each tape, spin them all back, merge them together, and, as you say, repeat more or less forever.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
 