wood burning stoves 2.0*
The moose likes Performance and the fly likes iterating over unchanging collections Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Performance
Bookmark "iterating over unchanging collections" Watch "iterating over unchanging collections" New topic
Author

iterating over unchanging collections

Susan Hoover
Ranch Hand

Joined: Jan 04, 2001
Posts: 64
While profiling some critical code, after profiling and optimizing it a couple times already, I noticed that we were spending an awful lot of time in two methods: Iterator.hasNext and Iterator.next. A quick look-see at the code showed that we were doing the following in a lot of places:

Since Iterator.hasNext and Iterator.next both took an awful lot of time, it was desirable to see about getting rid of one or the other or both. Since the contents of the collection didn't change after we filled it, we figured we could eliminate the hasNext and use the collection size instead. The next attempt looked like this:

This was a lot faster, and should work with all collections. If the collection is a List, you can use this format:

Peter Tran
Bartender

Joined: Jan 02, 2001
Posts: 783
Susan,
Better yet, I would do the following:

Converting a collection to an array lookup speeds up access tremendously, because you get rid of the cast and the method invocation.
-Peter

[This message has been edited by Peter Tran (edited January 18, 2001).]
[This message has been edited by Peter Tran (edited January 18, 2001).]
Susan Hoover
Ranch Hand

Joined: Jan 04, 2001
Posts: 64
Peter T says:

Okay, I tried that on a quick piece of code, but the first time I did it, my program crashed on the toArray line. The only difference is that instead of
<pre>
MyObj[] arrayObj = (MyObj[])collection.toArray(new MyObj[0]);
</pre>
I was using the other toArray method:
<pre>
MyObj[] arrayObj = (MyObj[])collection.toArray();
</pre>
Both of them allocate new arrays. The difference is that toArray() allocates an array of Object references, whereas toArray(new MyObj[0]) allocates an array of MyObj references. Looking at that, I can see where the toArray() method would cause a problem, but I would have thought that I would get a ClassCastException instead of just blowing the whole VM up.
That is, if you start with a MyObj[] and pass it to a method that takes an Object[] and returns the same Object[], you can cast the return back to a MyObj[] because the references really are MyObjs. But if you start with a MyObj[] and pass it to a method that takes an Object[] and returns a new Object[], you can't cast the return to a MyObj[] because the references really aren't MyObjs.
Whew!
Anyway, thanks for the tip; toArray looks useful. How expensive is it? That is, if you're only going through the loop once, does it buy you any performance gains over Iterator.next() or List.get(i)? True, you aren't casting and you don't have the method invocation in each loop iteration, but on the other hand you're allocating this array and copying all of your references into it.
BTW, I would slightly amend your example above as follows (note the "for" termination condition), because I think, reading some of Peter Haggar's posts, that the compiler doesn't collapse the arrayObj.length and it will probably get executed every time:

--
Susan
Peter Tran
Bartender

Joined: Jan 02, 2001
Posts: 783
You are right about the the "Whew!" You said a mouth full, and the frightening thing is I actually understood what you were saying.
You asked a good question. Converting from a collection to an array does impose some overhead. As to how much overhead is not an easy question to answer.
These are the questions I ask myself before I convert a collection to an array implementation.
1. How many times will I iterate through the collection?
2. How big is the collection? You're really keeping two copies of the data in memory.
3. Does the collection contains other collection? Is it a list of list?
Peter's rule of thumb: If the collection isn't excessively large and complex, and if I iterate over the collection more than 2x than I convert to array; otherwise, I use some other method like your earlier example with an iterator().
The above rule is just what I go by, and shouldn't be followed blindly (even by me!) The best approach is to write the code and get it working first, and then do some profiling. Remember to profile your code after the change to see if you get any performance improvement.
As to your slight modification, (IMHO) it will help but not much. In Peter Haggar's earlier discussion, he was talking about the List.size(), which invokes a method to get the size of the list. With an array, the length is an instance variable of the array. Access to this public instance variable should be faster than invoking the method.
-Peter
[This message has been edited by Peter Tran (edited January 18, 2001).]
Susan Hoover
Ranch Hand

Joined: Jan 04, 2001
Posts: 64
Whoops, d'oh! Right, array.length is different from list.size().
Okay, one more question. Here's a real-life situation from a piece of code I have recently written (but which I am not averse to refactoring). I've altered the problem domain to protect the innocent.
I have two kinds of things, call them Nodes and Patterns, in my database. Patterns and Nodes are composed of several descriptive fields, each of which can have one or more aggregation levels (e.g. New York City - USA - World). For the sake of the example, say the fields are eye color, gender, and city. A node is a specific thing, like "blue-eyed woman in New York City" or "brown-eyed man in Atlanta". Patterns can be wildcarded in zero or more fields, such as "any eye color women in New York" or "blue-eyed people of either gender in the USA".
The object is, given a collection of nodes and a collection of patterns, return the collection of patterns that match each node. The patterns have integer IDs for easier reference. Something like this (although this is not really a method in itself--it falls within a larger method):
<pre>
int[][] getMatchingPatterns(java.util.List listNodes, java.util.List listPatterns);
</pre>
Now, it looks like I will have to do listNodes.size() * listPatterns.size() comparisons. So, here's what I've done:

So what you're saying is that I'm probably getting a lot of benefit out of the listPatterns.toArray, because I'm iterating through listPatterns numNodes times. But I'm probably not getting much benefit out of listNodes.toArray because I'm only iterating over listNodes once.
Is there a better way to do this?
--
Susan
Ajith Kallambella
Sheriff

Joined: Mar 17, 2000
Posts: 5782
How about using the List.get(int index) to extract just the one element that you need at the specific index?
ie., replacing

with

This way you dont have to create an array of Node using the toArray just to access one element.
Sounds good??
------------------
Ajith Kallambella M.
Sun Certified Programmer for the Java2 Platform.


Open Group Certified Distinguished IT Architect. Open Group Certified Master IT Architect. Sun Certified Architect (SCEA).
Ipsita Naravane
Greenhorn

Joined: Dec 27, 2000
Posts: 29
Wow! thanks for the iterator() tips. I have used it in my code. I will take a look at it too.
Thanks

Can anyone tell me if there is a similar issue with enumerations?
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Shilpa- yes, in general Enumerations are subject to the same problems as Iterators.
Re: array.length - not only is it an instance field, it's a final instance field. A halfway intelligent compiler can make use of this info to optimize this for you. JDK 1.3 does not, of course, but that's nothing new.
As for Susan's problem - I agree that get(i) is probably better for the outer loop, since you're only going through it once.
Susan - are you looking for patterns that match nodes, or nodes that match patterns? I suspect it's the latter, in which case
<code><pre> listMatches.add(pattern.getId());</pre></code>
should probably be
<code><pre> listMatches.add(node.getId());</pre></code>
(Of course, this was probably just a typo from the conversion from the real, confidential problem to an alternate problem.)
Also, I note that it looks like your code will list all nodes that match any one of the patterns - is that the intent? If what you're really looking for is nodes that match all of the patterns (since you can always wildcard them to be less restrictive), then you can speed things up substantially by breaking out of the inner loop (or continuing the outer loop) whenever a node does not match a pattern, rather than going through the rest of the patterns.


"I'm not back." - Bill Harding, Twister
Susan Hoover
Ranch Hand

Joined: Jan 04, 2001
Posts: 64
Jim,
Thanks for the tip re using get(i) on the outer loop. I'll try it and compare, once I get a large enough data set to profile.
That said, the problem is as I have stated it, finding all patterns that match the nodes passed in, not the other way around. What happens is that during configuration time, the users define a set of wildcarded patterns. Then, during run time, the caller passes in a specific node (or a bunch of them), and I have to figure out which pattern or patterns apply to each node. So, what I need to return is the pattern ID.
Ajith Kallambella
Sheriff

Joined: Mar 17, 2000
Posts: 5782
I just realized, the performance of the get(index) method depends on the index value, since many List implementations internally use iterators(?) to get to the element. So get(index) will take more time for larger index values.
I am also wondering, those (List implementations) that donot use internal iterators to get to the specific index might just be using an internal toArray() do access the element using an array subscript. Though this might improve performance in terms of latency, it is no better than your original code since the array is being created anyway.
Performance gurus- any thoughts on this one?
------------------
Ajith Kallambella M.
Sun Certified Programmer for the Java2 Platform.
Jack Shirazi
Author
Ranch Hand

Joined: Oct 26, 2000
Posts: 96
The main tip you all seem to have missed is reimplmenting the collections themselves to run the query inside the collection object. This gives you the fastest access to the elements.
I wrote a couple of articles on querying collections which illustrate this if that's any help, Optimizing a Query on a Collection and Optimize a Query on a Map.
--Jack http://www.JavaPerformanceTuning.com/
Susan Hoover
Ranch Hand

Joined: Jan 04, 2001
Posts: 64
Originally posted by Jack Shirazi:
The main tip you all seem to have missed is reimplmenting the collections themselves to run the query inside the collection object. This gives you the fastest access to the elements.
I wrote a couple of articles on querying collections which illustrate this if that's any help, Optimizing a Query on a Collection and Optimize a Query on a Map.
--Jack http://www.JavaPerformanceTuning.com/

Ooh, hey, definitely everybody should read Jack's articles, linked above. My apologies for not crediting Jack for his ideas. I read the JavaWorld article a couple months ago and remembered some bits (like the hasNext/next thing) but forgot others, and when it came time to use the tips I couldn't remember where I had seen them. (Now it's bookmarked )
Once again, my apologies for not crediting the originator of these ideas!
Now I'll go back and reread the articles and see whether there's a way I can customize the collections as Jack suggests...
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: iterating over unchanging collections
 
Similar Threads
FilteredIterator
Collections vs Collections
for(int i:objList) i don understand how this works?
Synchronized Collection Decorators
doubt with for-each loop