
Deleting duplicate numbers from a .csv file.

 
Shreedhar Naik
Greenhorn
Posts: 7
Hi All,

I have generated 1 crore (10 million) random numbers using Java's SecureRandom and stored them all in a .csv file.
As my next step I noticed that the .csv file contains lots of duplicate numbers.

So now I need to delete all the duplicate numbers from that file. Can anyone help me?

Or, let me know if you are aware of any other method for generating 1 crore unique random numbers.
 
Ulf Dittmer
Rancher
Posts: 42967
Lots of duplicates doesn't necessarily mean it's not random, although it certainly sounds suspicious. How, exactly, are you generating the numbers?

How are you storing them in a file, meaning, what makes the output a CSV file? If there's a single number on each line then it's not really a CSV.
 
Shreedhar Naik
Greenhorn
Posts: 7
Hi,
Thanks for your response. First I generate the random numbers and store them in a CSV file; each line of the file holds 254 random numbers separated by ',' (with no comma after the last number on a line). I need to open that file in Microsoft Excel, and Excel supports only 256 columns, which is why I am doing it this way. The code I have written for this is below:



-Thanks
Shree
 
Campbell Ritchie
Sheriff
Posts: 48363
I tried to work out the chances of your never having any duplicates, and I may have got it wrong, but it was too small a number to display on my calculator. It simply showed "0". That was assuming 2^32 possibilities for SecureRandom#next() which returns an int, not a long.
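That calculation can be sketched with the standard birthday-problem approximation, P(no collision) ≈ exp(-n(n-1)/(2N)); this is my own illustrative estimate, not the poster's actual calculation, and it assumes n = 10,000,000 draws from N = 2^32 equally likely ints:

```java
// Sketch: estimate the probability of drawing 10 million values from
// 2^32 possibilities with no collision, via the birthday approximation
// P ≈ exp(-n(n-1)/(2N)). The number is far too small to represent as a
// double, so we work with its natural logarithm instead.
public class CollisionOdds {
    static double logNoCollisionProbability(double n, double N) {
        // ln of exp(-n(n-1)/(2N)) is simply the exponent itself
        return -n * (n - 1) / (2.0 * N);
    }

    public static void main(String[] args) {
        double n = 10_000_000.0;     // one crore numbers
        double N = Math.pow(2, 32);  // possible int values
        double ln = logNoCollisionProbability(n, N);
        // express as a base-10 exponent for readability
        System.out.printf("P(no duplicates) is roughly 10^%.0f%n",
                ln / Math.log(10));
    }
}
```

The log-probability comes out in the thousands of (negative) decimal digits, which is consistent with a calculator simply displaying "0".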
 
Leander Kirstein-Heine
Greenhorn
Posts: 4
I won't verify whether there are duplicates. ;) But try storing your numbers in a Set, and leave the loop once the size of the set tells you it holds as many numbers as you want.

Just my 2 cents ..
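A minimal sketch of that idea, assuming plain ints from SecureRandom (class and method names here are my own illustration, not code from this thread):

```java
import java.security.SecureRandom;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: a Set silently rejects duplicate adds, so looping until the
// set reaches the target size yields exactly that many unique numbers.
public class UniqueRandoms {
    static Set<Integer> generate(int count) {
        SecureRandom random = new SecureRandom();
        // LinkedHashSet keeps insertion order, handy for later CSV output
        Set<Integer> numbers = new LinkedHashSet<>();
        while (numbers.size() < count) {
            numbers.add(random.nextInt()); // duplicate adds are no-ops
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(generate(1_000).size()); // prints 1000
    }
}
```

For 10 million ints the set itself needs a few hundred megabytes of heap, so the JVM may need a larger -Xmx setting.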
 
Campbell Ritchie
Sheriff
Posts: 48363
Leander Kirstein-Heine wrote:I won't verify whether there are duplicates. ;) But try storing your numbers in a Set, and leave the loop once the size of the set tells you it holds as many numbers as you want.

Just my 2 cents ..
Agree. That has already been suggested here.
 
Leander Kirstein-Heine
Greenhorn
Posts: 4
Campbell Ritchie wrote:Agree. That has already been suggested here.


Right, and sorry; I'm really new here and haven't read all the threads ...
 
Mike Simmons
Ranch Hand
Posts: 3028
I replied in Shree's previous thread, because that thread seemed to have more info about what I believe is the main difficulty here: ensuring that the numbers are unique. Trying to delete duplicates after the fact still requires some way of detecting duplicates, and for ten million numbers that may be nontrivial. The problem is comparable to the one in the original post, so I figured that as long as it has to be solved, it's better to eliminate duplicates before they are written to the file.

Having said that though, I note that the code above has a simple bug which ensures that the number at the end of each line is duplicated at the beginning of the next line. Removing that bug may be enough to generate files that look, to the casual eye, like they have no duplicates. If you need to ensure this, well, see the other thread for more discussion.
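Since the posted code isn't shown here, the following is only a hypothetical sketch of the writing step being described: emit 254 values per line, comma-separated, consuming each value exactly once so nothing repeats across a line break. The names are my own, not from the thread:

```java
import java.util.Collection;
import java.util.List;

// Sketch: build CSV text with a fixed number of values per line. Each
// value is taken from the collection exactly once, so the last number
// on a line cannot reappear at the start of the next one.
public class CsvLayout {
    static String toCsv(Collection<Integer> numbers, int perLine) {
        StringBuilder sb = new StringBuilder();
        int written = 0;
        for (int n : numbers) {
            sb.append(n);
            written++;
            if (written % perLine == 0) {
                sb.append('\n');            // row is full: end the line
            } else if (written < numbers.size()) {
                sb.append(',');             // separator, never trailing
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsv(List.of(1, 2, 3, 4, 5), 2));
    }
}
```

Writing row by row like this, instead of re-reading the previous row's last value, sidesteps the duplication bug described above.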
 