• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Big Collection, integer key

 
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello,

I have a very big unordered collection (10,000-50,000) and I need a very fast search method using their int id. Is a hashtable (with Integer key) the best method to use? Or there is a better alternative?

Thanks, Mac
[ November 06, 2008: Message edited by: Siamak Saarmann ]
 
Marshal
Posts: 79177
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Why not use a linear search? It shouldn't take more than a few milliseconds. 5 digits is by no means a large collection.
 
Greenhorn
Posts: 15
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Ok, the question is "I need a very fast search method" and the first answer is "linear search"? That's... interesting.

Yes, a hash lookup is one of the fastest ways to do this (without knowing more about the specific problem). There are also some faster (and more memory efficient) implementations out there; for example, Colt: http://acs.lbl.gov/~hoschek/colt/api/cern/colt/map/package-summary.html#package_description
 
Author
Posts: 3473
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
This is not really a big collection and you can experiment it with


-- int pos = Collections.binarySearch(list, key); check http://java.sun.com/j2se/1.5.0/docs/api/java/util/Collections.html#binarySearch(java.util.List,%20T)

-- HashMap or HashTable (key is id Object, value is an object)

-- Hashset (value is object, but your equals and hashCode method should only use id attribute. It will break if you use additional attributes). This is a hack way and not recommended.
 
Campbell Ritchie
Marshal
Posts: 79177
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
But linear search at O(n) will be faster than sorting at O(nlogn).
Ar Arulk Pilai says, experiment.
 
lowercase baba
Posts: 13089
67
Chrome Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
is the collection going to grow at some point? will it eventually become a million or a billion or more elements? How often are elements added or deleted?

Also, how often do you need to do this? if you are going to search for an element millions of times, you may do better to take the performance hit and sort it once (high initial cost) so that all the later searches are quick (low subsequent cost).

However, if you are only going to search for a dozen or so items, it's not worth sorting.

How is the collection built? is it rebuilt on every run? can you build it once, sort it, and save that for subsequent use?

How you answer each of these questions can make a difference in what you do. There is seldom a 'best' way to do something in software design. You have to weigh all the variables and decide what is most important... and then go from there.
 
Siamak Saarmann
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello again,

Actually the collection is the list of agents in a simulation software. In each time step each agent will perform its actions, however it needs to look at several other agents (at least 5) to be able to decide. So I need 50,000 * 5 searches for each time step.

Considering that I need to simulate hundred thousands of time steps I will need 100,000 * 50,000 * 5 searches in each simulation.

@Dmitri : I already use colt for random numbers, thank you for mentioning the capability of Colt. I never looked at other features of it.

@campbell and arulk: thanks for suggestions. I guess experimenting will give me the best choice.

@fred : Agents come and go (added and removed) and the size of the collection is not fixed (a very dynamic system is being simulated).

Regards, Mac
 
Siamak Saarmann
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Just wanted to give a feedback on this.

For the simplicity I used HashTable with String keys (integer agent id's are converted to string and used as key) and even though this is not a professional work (I should rather use Integer keys) I was able to get from 8 to 80 times (depending on the number of agents in my collection) speedup.

It means a simulation which takes 80 seconds with linear search will only need 10 seconds or less now.

I was thinking if I work on the performance (in every section of the 20,000-30,000 lines of code software) I can make huge difference.
 
Rancher
Posts: 13459
Android Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I wrote this on HashMap behaviour a while back and should contain the data you want. It uses String keys rather than Integers, but given immutable keys with cached hash values, the insert, rehash and read times are pretty darn good.
 
Don't get me started about those stupid light bulbs.
reply
    Bookmark Topic Watch Topic
  • New Topic