JavaRanch » Java Forums » Java » Java in General
Deleting duplicate numbers from a .csv file.

Shreedhar Naik
Greenhorn

Joined: Aug 23, 2007
Posts: 7
Hi All,

I have generated 1 crore (10 million) random numbers using Java's SecureRandom and stored them all in a .csv file.
Next, I noticed that the .csv file contains lots of duplicate numbers.

So now I need to delete all the duplicate numbers from that file. Can anyone please help me?

Or, if you know of any other way to generate 1 crore unique random numbers, please let me know.


Shree
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42600
Lots of duplicates doesn't necessarily mean it's not random, although it certainly sounds suspicious. How, exactly, are you generating the numbers?

How are you storing them in a file? That is, what makes the output a CSV file? If there's a single number on each line, then it's not really a CSV.


Shreedhar Naik
Greenhorn

Joined: Aug 23, 2007
Posts: 7
Hi,
Thanks for your response. I generate the random numbers and store them in a CSV file: each line of the file holds 254 random numbers separated by commas (there is no comma after the last number on a line). I need to open the file in Microsoft Excel, which only supports 256 columns, and that is why I write it this way. The code I have written for this is below:



-Thanks
Shree
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39791
I tried to work out the chances of your never getting any duplicates, and I may have got it wrong, but the result was too small for my calculator to display; it simply showed "0". That was assuming 2^32 possible values from SecureRandom#nextInt(), which returns an int, not a long.
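Campbell's "0" is consistent with the standard birthday-problem approximation. Assuming n = 10^7 values (1 crore) drawn uniformly and independently from N = 2^32 possible `int` values:

```latex
P(\text{no duplicates}) = \prod_{k=0}^{n-1}\left(1 - \frac{k}{N}\right)
  \approx \exp\!\left(-\frac{n(n-1)}{2N}\right)

\frac{n(n-1)}{2N} = \frac{10^{7}\,(10^{7}-1)}{2 \cdot 2^{32}} \approx 1.16 \times 10^{4}
\quad\Rightarrow\quad
P(\text{no duplicates}) \approx e^{-11642} \approx 10^{-5056}
```

The exponent n(n-1)/(2N), roughly 11,600, is also the expected number of duplicated pairs, so even a bug-free generator using `nextInt()` should expect thousands of repeats at this size.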
Leander Kirstein-Heine
Greenhorn

Joined: Mar 13, 2009
Posts: 4
I won't check whether there are any duplicates. ;) But try storing your numbers in a Set, and leave the loop once the size of the set tells you that you have as many numbers as you want.

Just my 2 cents.
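Leander's suggestion can be sketched in Java roughly as follows. This is a minimal sketch, assuming values come from `SecureRandom#nextInt()` (the 2^32-value range assumed earlier in the thread); `UniqueRandoms` and `generate` are hypothetical names. Because `Set.add` stores nothing when the value is already present, the loop simply draws again until it holds enough distinct values:

```java
import java.security.SecureRandom;
import java.util.LinkedHashSet;
import java.util.Set;

public class UniqueRandoms {
    // Keeps drawing until the set holds `count` distinct values.
    // Set.add() stores nothing and returns false when the value is already present,
    // so a repeated draw just costs one more trip around the loop.
    static Set<Integer> generate(int count, SecureRandom rnd) {
        Set<Integer> unique = new LinkedHashSet<>(); // remembers the order values were drawn in
        while (unique.size() < count) {
            unique.add(rnd.nextInt());
        }
        return unique;
    }

    public static void main(String[] args) {
        // Small demo count; for 1 crore pass 10_000_000 and give the JVM a larger heap.
        Set<Integer> numbers = generate(100_000, new SecureRandom());
        System.out.println(numbers.size() + " distinct values");
    }
}
```

A `LinkedHashSet` is used instead of a plain `HashSet` so the values come back in the order they were drawn. Holding 1 crore boxed `Integer`s this way takes on the order of several hundred megabytes, so a larger heap (for example `-Xmx2g`) may be needed.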
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39791
Leander Kirstein-Heine wrote: I won't check whether there are any duplicates. ;) But try storing your numbers in a Set, and leave the loop once the size of the set tells you that you have as many numbers as you want.

Just my 2 cents.
Agreed. That has already been suggested here.
Leander Kirstein-Heine
Greenhorn

Joined: Mar 13, 2009
Posts: 4
Campbell Ritchie wrote: Agreed. That has already been suggested here.


Right, and sorry; I'm really new here and haven't read all the threads...
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
I replied in Shree's previous thread, because that thread seemed to have more information about what I believe is the main difficulty here: ensuring that the numbers are unique. Deleting duplicates after the fact still requires some way of detecting them, and for ten million numbers that may be nontrivial. The problem is comparable to the one in the original post, so I figured that, as long as it has to be solved anyway, it's better to eliminate duplicates before they are written to the file.
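Detecting duplicates after the fact can be done with the same Set idea: read every value from the existing file into a `LinkedHashSet`, which drops repeats while keeping first-occurrence order, then write the survivors back 254 per row. This is a minimal sketch, assuming the file contains only comma-separated integers; `DedupeCsv` and the default file names are hypothetical, and it needs enough heap to hold every distinct value at once.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

public class DedupeCsv {
    // Collects every comma-separated number in the file; the LinkedHashSet keeps
    // only the first occurrence of each value and preserves the original order.
    static Set<Long> readUnique(Path in) throws IOException {
        Set<Long> unique = new LinkedHashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String field : line.split(",")) {
                    String trimmed = field.trim();
                    if (!trimmed.isEmpty()) { // tolerate blank lines or a stray trailing comma
                        unique.add(Long.parseLong(trimmed));
                    }
                }
            }
        }
        return unique;
    }

    // Writes the values back, `perLine` per row, with commas only between values.
    static void writeUnique(Set<Long> values, Path out, int perLine) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
            int column = 0;
            for (long value : values) {
                if (column > 0) {
                    writer.write(',');
                }
                writer.write(Long.toString(value));
                if (++column == perLine) { // row is full: end it and start a new one
                    writer.newLine();
                    column = 0;
                }
            }
            if (column > 0) {
                writer.newLine(); // finish a partially filled last row
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default names; pass real paths as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "random.csv");
        Path out = Paths.get(args.length > 1 ? args[1] : "random-unique.csv");
        if (!Files.exists(in)) {
            System.out.println("Input file not found: " + in);
            return;
        }
        Set<Long> unique = readUnique(in);
        writeUnique(unique, out, 254);
        System.out.println(unique.size() + " distinct values written to " + out);
    }
}
```

Parsing to `Long` rather than comparing raw strings means that spellings like `007` and `7` are treated as the same number.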

Having said that, I note that the code above has a simple bug that causes the number at the end of each line to be repeated at the beginning of the next line. Fixing that bug may be enough to generate files that look, to the casual eye, like they have no duplicates. If you need to guarantee uniqueness, see the other thread for more discussion.
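One way to structure a row writer so that no value can carry over from one line to the next is to draw each number exactly once, inside the per-column loop, and write a comma only before non-first values. This is a minimal sketch, not the original code: it assumes `SecureRandom#nextInt()` values and 254 values per row as described earlier, `CsvRows` and `writeUniqueRows` are hypothetical names, and it also folds in the set-based duplicate check suggested above.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

public class CsvRows {
    // Writes `rows` lines of `perLine` values. Each value is drawn inside the
    // column loop, so nothing from the end of one row is reused at the start of the next.
    // The `seen` set rejects repeats, so every value in the output is distinct.
    static void writeUniqueRows(Writer out, int rows, int perLine, SecureRandom rnd) throws IOException {
        Set<Integer> seen = new HashSet<>();
        for (int row = 0; row < rows; row++) {
            for (int col = 0; col < perLine; col++) {
                int value;
                do {
                    value = rnd.nextInt();
                } while (!seen.add(value)); // add() returns false for a duplicate: draw again
                if (col > 0) {
                    out.write(','); // separator only between values, none after the last
                }
                out.write(Integer.toString(value));
            }
            out.write(System.lineSeparator());
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        writeUniqueRows(sw, 3, 254, new SecureRandom());
        System.out.print(sw);
    }
}
```

To write to a file instead, pass a `Writer` such as one from `Files.newBufferedWriter(...)` in place of the `StringWriter`.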
 