File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes Java in General and the fly likes HashSet Performance Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Murach's Java Servlets and JSP this week in the Servlets forum!
JavaRanch » Java Forums » Java » Java in General
Bookmark "HashSet Performance" Watch "HashSet Performance" New topic
Author

HashSet Performance

Drew Lane
Ranch Hand

Joined: May 13, 2001
Posts: 296
Is HashSet generally known to be the fastest collection available in J2SE?
I've setup a HashSet without about 1000 different numbers in it which I am just using to do a comparison to see if the number is in the collection.
This works pretty well, but is it going to slow down significantly as the set gets larger?
Is there a better way to do this?
Also, I'm storing the numbers in the HashSet as Strings since this collection requires Objects. Would another object type be better for performance?
Thanks,
Drew
Jeff Langr
author
Ranch Hand

Joined: May 14, 2003
Posts: 762
You might try numeric wrappers, e.g. Integer. Your experiments will tell you whether it's faster or not.
If your number distribution is "tight," using an array of booleans would be much faster.
If using hash tables, the best thing you can do is pay careful attention to the capacity and load. The hashCode itself is significant, but since you're using either String or a numeric wrapper, it's already taken care of and is probably fine (but might not be depending on your data distribution).
You're best off making the capacity a prime number, since a slot is determined by hashCode % capacity. Otherwise, if you use a number with a lot of factors, you have the potential for a high number of collisions (when two objects hash to the same slot in the table). Collisions are the most significant cause of degraded hash table performance. See this page for more information.
If you know how many elements you'll need to hold, pre-allocate the hash table so that you have about a 75% load (the default load factor, the point at which the table must grow). This will avoid the need to grow the table. Making it too large can adversely impact iteration.
-Jeff-
[ March 02, 2004: Message edited by: Jeff Langr ]

Books: Agile Java, Modern C++ Programming with TDD, Essential Java Style, Agile in a Flash. Contributor, Clean Code.
Drew Lane
Ranch Hand

Joined: May 13, 2001
Posts: 296
Looks like I'm in a bit of a tight spot if I want to follow the rules.
Since I have 1000 numbers, according the the link the closest primes are:
1031
2053
I think 1031 would be too small, and maybe 2053 too big (wasteful?)
Should I just use the no arg constructor and let the API do it's thing?
Drew
Jeff Langr
author
Ranch Hand

Joined: May 14, 2003
Posts: 762
Try something from here, perhaps around 1300: http://www.utm.edu/research/primes/lists/small/1000.txt
In any case, experiment. Run a timing test using the no-arg constructor and see if it makes a difference.
In fact, it probably doesn't matter; 1000 numbers is a pretty small amount to insert. Don't sweat it until it's a problem, and be confident knowing that you have the knowledge to quickly improve upon the performance if necessary.
-Jeff-
 
Don't get me started about those stupid light bulbs.
 
subject: HashSet Performance
 
Similar Threads
Reading from a Collection
How can I read all objects in a file with ObjectInputStream?
Buckets and Hashes
Passed With 141/155 (~91%)
Fast removal from lists