
Best performance?

 
Edward Chen
Ranch Hand
Posts: 798
Assume I have a text file containing ten million phone numbers. The records are unsorted and contain duplicates. Now I want to:

1. list the top 20 most duplicated phone numbers;
2. sort the list;
3. list each number's duplicate frequency, e.g. one phone number has 200 duplicates.

Which approach has the best performance? Using a database is not an option.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056
1. Devise a way to turn the text of a phone number into a Java primitive, probably a long.
2. Scan the list, adding the derived longs to a long[] array.
3. Sort the array.
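A minimal Java sketch of those three steps, assuming one phone number per line and that leading zeros are not significant: only the digits are kept, so "(555) 0101" and "555-0101" become the same long. The class name PhoneSort and its methods are hypothetical, not something from the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class PhoneSort {

    // Step 1 (hypothetical encoding): keep only the digits of a line, so
    // "(555) 0101" and "555-0101" map to the same long. Leading zeros are
    // lost, so this assumes they are not significant in the data.
    // Returns -1 for lines with no digits or more than 18 digits.
    static long toLong(String line) {
        long value = 0;
        int digits = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c >= '0' && c <= '9') {
                if (++digits > 18) {
                    return -1; // too long to be sure it fits in a long
                }
                value = value * 10 + (c - '0');
            }
        }
        return digits == 0 ? -1 : value;
    }

    // Steps 2 and 3: read one number per line into a growable long[],
    // then sort the primitives (no boxing into Long objects).
    static long[] readSorted(BufferedReader in) throws IOException {
        long[] numbers = new long[1 << 20];
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            long n = toLong(line);
            if (n < 0) {
                continue; // skip blank or malformed lines
            }
            if (count == numbers.length) {
                numbers = Arrays.copyOf(numbers, count * 2); // double capacity
            }
            numbers[count++] = n;
        }
        long[] result = Arrays.copyOf(numbers, count); // trim to the real size
        Arrays.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the text file, one phone number per line (assumed layout)
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            long[] sorted = readSorted(in);
            System.out.println(sorted.length + " numbers read and sorted");
        }
    }
}
```

At ten million entries the final long[] takes about 80 MB. The doubling growth briefly needs more than that, so if the record count is known up front, allocating the array at that size avoids the copies.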

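The remainder should be obvious: after sorting, duplicates sit next to each other, so a single pass over the array gives each number's count.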
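The "remainder" could look something like the sketch below. It assumes the input is an already sorted long[] as produced by the steps above, and it takes "duplicate" to mean a number that occurs at least twice. The class DuplicateCounts and its methods printFrequencies and topK are hypothetical names chosen for illustration.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class DuplicateCounts {

    // Prints "number<TAB>count" for every number that occurs more than once.
    // Works because equal values are adjacent in a sorted array.
    static void printFrequencies(long[] sorted, PrintStream out) {
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            while (j < sorted.length && sorted[j] == sorted[i]) {
                j++;
            }
            if (j - i > 1) {
                out.printf("%d\t%d%n", sorted[i], j - i);
            }
            i = j; // jump to the next distinct value
        }
    }

    // Returns {number, count} pairs for the k most duplicated numbers,
    // most frequent first, using a size-k min-heap keyed on count.
    static long[][] topK(long[] sorted, int k) {
        if (k <= 0) {
            return new long[0][];
        }
        PriorityQueue<long[]> heap =
                new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[1]));
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            while (j < sorted.length && sorted[j] == sorted[i]) {
                j++;
            }
            long count = j - i;
            if (count > 1) { // only numbers that actually repeat
                if (heap.size() < k) {
                    heap.add(new long[] {sorted[i], count});
                } else if (count > heap.peek()[1]) {
                    heap.poll(); // evict the weakest of the current top k
                    heap.add(new long[] {sorted[i], count});
                }
            }
            i = j;
        }
        long[][] result = heap.toArray(new long[0][]);
        // Highest count first; ties ordered by number for repeatable output.
        Arrays.sort(result, (a, b) -> a[1] != b[1]
                ? Long.compare(b[1], a[1])
                : Long.compare(a[0], b[0]));
        return result;
    }

    public static void main(String[] args) {
        long[] sorted = {5550101L, 5550102L, 5550102L, 5550103L, 5550103L, 5550103L};
        printFrequencies(sorted, System.out);
        for (long[] e : topK(sorted, 20)) {
            System.out.println(e[0] + " appears " + e[1] + " times");
        }
    }
}
```

The heap keeps only 20 candidates while scanning, so finding the top 20 costs one pass plus O(log 20) per distinct number, instead of a second full sort over all distinct numbers.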

Bill
 
steve souza
Ranch Hand
Posts: 862
If the data is already in a file, you may not need Java at all. You could also use Unix utilities to sort it and check for duplicates.
 