File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
The moose likes Java in General and the fly likes Big Collection, integer key Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "Big Collection, integer key" Watch "Big Collection, integer key" New topic

Big Collection, integer key

Siamak Saarmann
Ranch Hand

Joined: Aug 21, 2004
Posts: 78

I have a very big unordered collection (10,000-50,000) and I need a very fast search method using their int id. Is a hashtable (with Integer key) the best method to use? Or there is a better alternative?

Thanks, Mac
[ November 06, 2008: Message edited by: Siamak Saarmann ]

PhD Artificial Intelligence, OCJP1.6
Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46337
Why not use a linear search? It shouldn't take more than a few milliseconds. 5 digits is by no means a large collection.
Dmitri Bichko

Joined: Jun 16, 2007
Posts: 15
Ok, the question is "I need a very fast search method" and the first answer is "linear search"? That's... interesting.

Yes, a hash lookup is one of the fastest ways to do this (without knowing more about the specific problem). There are also some faster (and more memory efficient) implementations out there; for example, Colt:
arulk pillai
Ranch Hand

Joined: May 31, 2007
Posts: 3371
This is not really a big collection and you can experiment it with

-- int pos = Collections.binarySearch(list, key); check,%20T)

-- HashMap or HashTable (key is id Object, value is an object)

-- Hashset (value is object, but your equals and hashCode method should only use id attribute. It will break if you use additional attributes). This is a hack way and not recommended.

[] | [Java job hunting know how] | [Java resumes]
Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46337
But linear search at O(n) will be faster than sorting at O(nlogn).
Ar Arulk Pilai says, experiment.
fred rosenberger
lowercase baba

Joined: Oct 02, 2003
Posts: 11955

is the collection going to grow at some point? will it eventually become a million or a billion or more elements? How often are elements added or deleted?

Also, how often do you need to do this? if you are going to search for an element millions of times, you may do better to take the performance hit and sort it once (high initial cost) so that all the later searches are quick (low subsequent cost).

However, if you are only going to search for a dozen or so items, it's not worth sorting.

How is the collection built? is it rebuilt on every run? can you build it once, sort it, and save that for subsequent use?

How you answer each of these questions can make a difference in what you do. There is seldom a 'best' way to do something in software design. You have to weigh all the variables and decide what is most important... and then go from there.

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Siamak Saarmann
Ranch Hand

Joined: Aug 21, 2004
Posts: 78
Hello again,

Actually the collection is the list of agents in a simulation software. In each time step each agent will perform its actions, however it needs to look at several other agents (at least 5) to be able to decide. So I need 50,000 * 5 searches for each time step.

Considering that I need to simulate hundred thousands of time steps I will need 100,000 * 50,000 * 5 searches in each simulation.

@Dmitri : I already use colt for random numbers, thank you for mentioning the capability of Colt. I never looked at other features of it.

@campbell and arulk: thanks for suggestions. I guess experimenting will give me the best choice.

@fred : Agents come and go (added and removed) and the size of the collection is not fixed (a very dynamic system is being simulated).

Regards, Mac
Siamak Saarmann
Ranch Hand

Joined: Aug 21, 2004
Posts: 78
Just wanted to give a feedback on this.

For the simplicity I used HashTable with String keys (integer agent id's are converted to string and used as key) and even though this is not a professional work (I should rather use Integer keys) I was able to get from 8 to 80 times (depending on the number of agents in my collection) speedup.

It means a simulation which takes 80 seconds with linear search will only need 10 seconds or less now.

I was thinking if I work on the performance (in every section of the 20,000-30,000 lines of code software) I can make huge difference.
David O'Meara

Joined: Mar 06, 2001
Posts: 13459

I wrote this on HashMap behaviour a while back and should contain the data you want. It uses String keys rather than Integers, but given immutable keys with cached hash values, the insert, rehash and read times are pretty darn good.
I agree. Here's the link:
subject: Big Collection, integer key
It's not a secret anymore!