Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Cloud/Virtualization forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Initial Capacity of Collections

 
Ricardo Cortes
Ranch Hand
Posts: 140
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm running the Analyze Code feature of IntelliJ on my classes and one of the suggestions it gave me was to add initial capacities to my collections (ArrayLists, HashMaps, etc). Page 101 of "Java Performance Tuning" states the following:

Vector grows by creating a new larger internal array object, copying all the elements from the old array, and discarding it...presizing a collection to its largest potential size reduces the number of objects discarded

So, what happens if I don't know what the size of the collection will be? Perhaps I'm creating an ArrayList from a datbase call. Should I initialize the collection to some arbitraryily large number or should I simply use an empty ArrayList. Any input on this subject would be great as I'm attempting to come up with a performance methodology for my company.
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I presize Collections and StringBuffers in two cases. The first is the obvious one: when you know exactly how many elements it will hold. The second is when I expect a "large" number of elements.

Resizing an ArrayList once or twice isn't going to cost you that much, but if you're executing a query that you know for certain (by analyzing the data or knowing the domain well) will return at least 300 elements, on average 500, and at most 1000, you may as well presize it at 500 at least to avoid ten resizes. ArrayList grows by half each resize rather than Vector's doubling.

The key to remember is that wasting 500 empty reference slots is only 2k. On today's hardware that doesn't even count as lint. I would prefer that to 10 extra allocations and 10 + 15 + 23 + 35 + 53 + 80 + ... + ~500 unnecessary copying of references.

As my programs rarely require Maps larger than 10 or 20 elements, I've never bothered to presize those. Be sure that you really understand how HashMap works, enough to be able to choose values that will provide good distribution. If not, you might do more harm than good.
 
Ricardo Cortes
Ranch Hand
Posts: 140
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Very good information...thanks!
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I didn't yet care about it, and as far as I can tell wasn't hurt by it yet - the performance bottlenecks always proved to be elsewhere.
 
steve souza
Ranch Hand
Posts: 862
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I agree with Ilja. With few exceptions perfromance bottlenecks in my apps tend to be IO (database, network, file). I read a recent posting where someone put 1,000,000 entries into a HashSet in about .25 seconds.

Monitoring the actual performance of your code is the way to go. Don't spend much time tuning until you have measurements. People that tune first usually tune code that makes no substantive difference to application performance. For example you could write a few lines of code that tested the impact of various initial sizes of ArrayList and see how/if they affect performance.
[ May 01, 2005: Message edited by: steve souza ]
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic