When to use Iterator?

Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 114

Hello!

I have a question that is simple to state, though the answer may be more complex. Whenever I need to return a set of values from a method, I consider whether they can be returned one by one, without random access, while they're being generated or obtained (I think some people call this a "streamed" return), or whether I must return them as a whole. In the first case I always prefer to return an Iterator, whereas in the second I return an array or a List. Basically, I base my decision on the likely needs of the calling code.

For example, to read lines from a text file or to find the matches of one string inside another, I use the first solution. But if I need to return a set of column names that will be used to look up the index of each one, I use the second.

Why do I use Iterator if possible?


  • Because only one loop is needed to obtain the values and use them. Otherwise I would usually need at least two: one to build the returned set of values and another to use it in the calling code.
  • Because I may re-utilize the instance returned by the method "next ()" without making a new one. For example I can re-fill an array with new values without making a new instance on every iteration.
  • Because it saves memory, since you don't need to return a bunch of data.


I'm afraid of taking this approach too far and making my code too complex. I'd love to know whether you use this kind of approach (or a similar one), and whether you agree with it or not, and why. Thank you very much.
    Martin Vajsar
    Sheriff

    Joined: Aug 22, 2010
    Posts: 3611

    In general, I don't see any problem with returning an iterator from a method if that is all the caller needs. Indeed, if you wanted, e.g., to process a large file line by line, reading the whole file into memory might be prohibitive, and an iterator might be an elegant solution.
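
    As a rough illustration of the line-by-line case, here is a minimal sketch assuming Java 8 or later. LineIterator and countNonEmpty are hypothetical names made up for this example, and a StringReader stands in for a real file so the snippet is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: hands out one line at a time instead of loading everything.
final class LineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String pending; // line read ahead, or null once the input is exhausted

    LineIterator(Reader source) {
        this.reader = new BufferedReader(source);
        advance();
    }

    // Reads the next line ahead of time so hasNext() can answer without side effects.
    private void advance() {
        try {
            pending = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public String next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String current = pending;
        advance();
        return current; // a fresh String each time, never a re-used object
    }

    // Counts non-empty lines; only the current and read-ahead lines are referenced at any moment.
    static int countNonEmpty(Iterator<String> lines) {
        int count = 0;
        while (lines.hasNext()) {
            if (!lines.next().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Iterator<String> lines = new LineIterator(new StringReader("a\n\nb\nc"));
        System.out.println(countNonEmpty(lines)); // prints 3
    }
}
```

    Note that this sketch never closes the reader, which is a real concern for files.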

    However, I see a potential problem with your second point:

    Because I may re-utilize the instance returned by the method "next ()" without making a new one. For example I can re-fill an array with new values without making a new instance on every iteration.

    I generally would not expect an object I got from an iterator to change as a result of a call to next(). The contract for Iterator does not mention such a possibility, and unless this is very thoroughly documented, someone might one day get serious headaches from it.

    Moreover, modern garbage collectors are pretty efficient at collecting short-lived instances, so the cost of creating and immediately dropping an object in every iteration should be relatively small. You've already done your share of optimization by not hoarding all the instances in memory at once.

    The first and third points seem perfectly valid to me. I'm not used to this approach, but I don't see a problem with it. If only the for-each loop accepted an Iterator as neatly as it accepts an Iterable or an array.
    Avor Nadal
    Ranch Hand

    Joined: Sep 15, 2010
    Posts: 114

    Martin Vajsar: Thank you for your opinion. I also have to disagree with the second point, in spite of it being "mine", he he he. I implemented that mechanism not long ago, but with some scepticism, for the reasons you cite. Someone could store the returned objects in an array, for example, and later discover that they are all the same object rather than different ones. I myself have fallen into my own "trap" a few times. I decided to warn about this in the API documentation, but I have to admit that re-using an already returned internal object is very weird (bad) behaviour.

    I'm sure this obsession with re-using objects comes from having read about using StringBuilder instead of String inside for/while loops. Apart from that case, as a general rule, I've been told many times to avoid creating many instances inside a loop (although I'm sure, as you also state, that modern JVMs handle that well). That's why I took this approach to such an extreme. But, obviously, re-using a variable created in the same block of code (right) is not the same as re-using one that is (supposed to be) created in another function (wrong).

    So I'm going to take your advice right now, fix that behaviour in my Iterators and remove the comments from the documentation. Thank you again for your help.
    Amit Ghorpade
    Bartender

    Joined: Jun 06, 2007
    Posts: 2716

    Please check your private messages


    SCJP, SCWCD.
    |Asking Good Questions|
    Martin Vajsar
    Sheriff

    Joined: Aug 22, 2010
    Posts: 3611

    Igor Nadal wrote:I'm sure that this obsession to re-use objects comes from having read about the use of StringBuilder instead of String into for/while loops.

    I must admit I too tend to overoptimize String concatenation operations a little bit.

    However, there is a difference between the Iterator example and String/StringBuilder. If you allocate (and let go of) 100 bytes in each loop, it will still be 100 bytes after thousands of iterations. But if you append 100 bytes' worth of text to a String in every loop, after a thousand iterations you'll be copying about 100 KB of data at every step. StringBuilder does copy the data too, but it allocates new memory in progressively larger chunks, so the relative overhead diminishes as the number of iterations grows.
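
    To make the difference concrete, here is a small sketch. ConcatDemo and its method names are made up for this example, and charsCopiedByConcat is only a rough model that assumes each += copies the whole accumulated result, not a measurement.

```java
// Hypothetical demo class; the names are illustrative only.
final class ConcatDemo {
    // Repeated String concatenation: each += builds a new String from everything so far.
    static String withString(String chunk, int times) {
        String result = "";
        for (int i = 0; i < times; i++) {
            result += chunk;
        }
        return result;
    }

    // StringBuilder appends into a growing buffer and copies only when it must enlarge it.
    static String withBuilder(String chunk, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    // Rough model: iteration i produces a String of i * chunkLength characters,
    // so that many characters are copied in that step.
    static long charsCopiedByConcat(int chunkLength, int times) {
        long total = 0;
        for (int i = 1; i <= times; i++) {
            total += (long) i * chunkLength;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(withString("ab", 3).equals(withBuilder("ab", 3))); // prints true
        System.out.println(charsCopiedByConcat(100, 1000)); // prints 50050000
    }
}
```

    Under that model, a thousand appends of 100 characters copy about 50 million characters in total, even though the final text is only 100,000 characters long.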

    Additionally, current guidance for modern JVMs explicitly advises against pooling objects, as it is detrimental to the normal work of the garbage collector. As everywhere, we need to keep pace with changes in technology.
    Wouter Oet
    Saloon Keeper

    Joined: Oct 25, 2008
    Posts: 2700

    Igor Nadal wrote:[...]
  • Because it saves memory, since you don't need to return a bunch of data.

  • [...]
    Why do you think that is? I think that it doesn't save memory. The data needs to be stored somewhere in the memory and whether you're using an Iterator or a reference to it won't make that much of a difference (maybe there is even a small overhead for the Iterator instance).

    And I see another advantage: you're hiding your data structure if you return an Iterator. That way you can change the data structure without having to modify the rest of your code.


    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
    Please correct my English.
    Mike Simmons
    Ranch Hand

    Joined: Mar 05, 2008
    Posts: 3018

    Wouter Oet wrote:
    Igor Nadal wrote:[...]
  • Because it saves memory, since you don't need to return a bunch of data.

  • [...]
    Why do you think that is? I think that it doesn't save memory. The data needs to be stored somewhere in the memory and whether you're using an Iterator or a reference to it won't make that much of a difference (maybe there is even a small overhead for the Iterator instance).

    I disagree - I think it certainly can save memory, though the effect isn't always significant. In particular, while the total amount of memory allocated to objects might be the same in each case, using Iterator allows you to avoid having everything in memory at the same time. Martin already gave the example of reading a large file. Using a List or other Collection means you need to have all the data from that file in memory at once. Using Iterator means you can just return one row at a time, or one record at a time. There's a good chance that much of the data may be available for GC shortly after it's read. (This depends on what you're doing with the data, of course, but at least it's possible.) And modern GC will often be more effective with short-lived objects than with long-lived ones.

    There is another advantage to using an Iterator: it allows client code to start processing results as soon as any data is available, without necessarily waiting for all the results to be available. This technique is commonly employed when processing a JDBC ResultSet for example, though the API is a bit different (it predates Iterator) and we don't normally think about whether the database has actually finished delivering all our results - we can just start working with them.

    However, I do disagree with Igor and Martin on one point: I would almost never return an Iterator from a method, but rather an Iterable. This allows it to be easily used by client code in a modern for loop, which is by far the most common use case I see for any collection-like thing I encounter. It also generally makes it easier to swap between other Collection classes in your code - a List or Set can often be directly replaced with an Iterable, while replacing with an Iterator can require more changes. To your for loops, if nothing else.
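
    A short sketch of the difference for client code. ColumnSource and its methods are hypothetical names for this example; the point is only that an Iterable works directly in a for-each loop while an Iterator needs an explicit loop.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical class exposing the same data both ways.
final class ColumnSource {
    private final List<String> names = Arrays.asList("id", "name", "price");

    // Option 1: hand out an Iterator.
    Iterator<String> nameIterator() {
        return names.iterator();
    }

    // Option 2: hand out an Iterable (a List already is one).
    Iterable<String> names() {
        return names;
    }

    // With an Iterator the caller has to write the loop mechanics by hand.
    static String joinWithIterator(ColumnSource source) {
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = source.nameIterator(); it.hasNext(); ) {
            sb.append(it.next()).append(' ');
        }
        return sb.toString().trim();
    }

    // With an Iterable the for-each loop does the same work.
    static String joinWithIterable(ColumnSource source) {
        StringBuilder sb = new StringBuilder();
        for (String name : source.names()) {
            sb.append(name).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        ColumnSource source = new ColumnSource();
        System.out.println(joinWithIterator(source)); // prints id name price
        System.out.println(joinWithIterable(source)); // prints id name price
    }
}
```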
    Pat Farrell
    Rancher

    Joined: Aug 11, 2007
    Posts: 4659

    If you look at the very good "collections" package in Google Guava (free, open source, etc.), you will see that it often returns iterators from the functions that deal with filtering, transforming, and otherwise manipulating its various Set, List, Map, etc. collections. Guava also strongly encourages the use of immutable collections, with nice ways to build an "ImmutableList" or "ImmutableSet".

    They also include a number of partition methods that operate on their collections. When you partition an Immutable collection, then you can safely process it in parallel, which is a big win on modern multi-core systems.
    Mike Simmons
    Ranch Hand

    Joined: Mar 05, 2008
    Posts: 3018

    I was a bit surprised to see that Guava's IO code does not appear to use Iterators or Iterable in many places I would have expected them to. For example com.google.common.io.Files has a readLines() that returns a List<String> rather than Iterable<String>. However they have other methods that achieve many of the same benefits Iterable would provide, using the LineProcessor interface. Too bad this is a bit more cumbersome to use than an Iterable would be. And since the interface has two methods, it won't benefit from SAM conversion once (if?) Java finally gets lambda expressions in Java 8. Oh well. Seems like it's destined to remain a bit cumbersome to use, even if it is pretty efficient.
    Pat Farrell
    Rancher

    Joined: Aug 11, 2007
    Posts: 4659

    Mike Simmons wrote:I was a bit surprised to see that Guava's IO code does not appear to use Iterators or Iterable in many places I would have expected them to. .... Seems like it's destined to remain a bit cumbersome to use, even if it is pretty efficient.

    It's clear to me that Guava is a bundle of different Java libraries that were developed by different teams within Google. If you look at the Collections package, its classes are all designed from a common playbook. But those in other packages, such as the IO package you point out, are clearly from some other group. I expect the re-engineering effort was too much even for Google to afford.
    Avor Nadal
    Ranch Hand

    Joined: Sep 15, 2010
    Posts: 114

    Martin Vajsar: Oops, you're right. I shouldn't have used that example, because the two cases are not equivalent.

    About your last piece of advice: I was warned about this many times in the old official Sun forums too (maybe because my questions so often involve re-using objects, he he he). However, I'm not sure whether what I'm doing is dangerous object pooling, or whether it can even be considered pooling. I only keep objects for re-use in two very basic cases. First, when I create classes that need certain attributes (the typical case). And second, in while/for loops, to avoid instantiating certain kinds of classes too often, while restricting the scope of the variable as much as I can. How? By dividing the code into several private methods (I love doing this to isolate parts) or by using code blocks (a pair of braces {}). I'd like to know whether you consider this bad practice.

    Thank you again for all the time you're dedicating to resolving my doubts ;) .

    Wouter Oet: Mike Simmons explained it better than I would have, he he he.

    Mike Simmons: I thought about using the Iterable interface too, but in the end I gave up on it. However, I'll take another look at it and see whether it's worth it for my cases ;) .

    Pat Farrell: Interesting, interesting. Thanks for the information.
    Martin Vajsar
    Sheriff

    Joined: Aug 22, 2010
    Posts: 3611

    Igor,

    I'd say it pretty much depends. I've found garbage collectors to be so effective that I stopped worrying about creating a new instance here and there. If in doubt, use a profiler to see whether there are problems.

    I can offer just one related experience. In our project we sometimes read lots of data from a database and store it in an array of doubles (the primitive type). There can be several hundred thousand of them. By mistake I used a JDBC function that returned Double instead of double. By switching to the proper function I avoided creating all those Doubles, and the loading time dropped significantly. However, there was no further processing associated with the data, just storing it in an array. Of course, this is an extreme case. The more processing is associated with an instance in a single loop, the lower the percentage of resources consumed by allocating and freeing the instance itself. Anything more complicated than simple storage will probably make the allocation/collection overhead negligible.
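
    The shape of that mistake can be sketched without a database. BoxingDemo is a made-up class for illustration; in real JDBC code the boxed value would come from the driver, and the actual cost depends on the JVM and should be confirmed with a profiler.

```java
// Hypothetical demo: the same array filled via a boxed Double and via a primitive double.
final class BoxingDemo {
    // Mimics an API that hands back Double: a wrapper object is typically allocated
    // for each value, only to be unboxed and discarded immediately.
    static double[] fillBoxed(int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            Double boxed = Double.valueOf(i * 0.5);
            out[i] = boxed; // auto-unboxing
        }
        return out;
    }

    // Mimics an API that hands back double: no wrapper objects at all.
    static double[] fillPrimitive(int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = i * 0.5;
        }
        return out;
    }

    public static void main(String[] args) {
        // Both approaches produce the same data; only the garbage created on the way differs.
        System.out.println(java.util.Arrays.equals(fillBoxed(5), fillPrimitive(5))); // prints true
    }
}
```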

    About limiting variable scope: I'm not at all sure that an instance referenced by a variable that has fallen out of scope inside a method (outside its curly braces) is eligible for garbage collection at that point. I'd say it will be collected only after the method ends (this is just an academic question if you keep your methods small). I limit the scope of variables for better readability of the code, not to help the garbage collector; I'd consider the latter to be a premature optimization.

    As for returning Iterable: it's a nice idea; until now I was usually returning a Collection. Using it for "streamed" data has one minor disadvantage, though: I'd expect to be able to iterate over an Iterable instance more than once. Again, this limitation would have to be thoroughly documented, especially if the instance you return might eventually be used a few layers away from your method.
    Mike Simmons
    Ranch Hand

    Joined: Mar 05, 2008
    Posts: 3018

    Martin Vajsar wrote:As for returning Iterable: it's nice idea, until now I was usually returning a Collection. Using it for "streamed" data has one minor disadvantage though: I'd expect that I can iterate over an Iterable instance more than once. Again, this limitation would have to be thoroughly documented, especially if the instance you'd return might be eventually used a few layers away from your method.

    I would simply make sure that the iterator() method takes care of re-initializing the source of the data. If we're reading from a file, create a new FileReader each time iterator() is called, and use that for the new iteration.
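
    A minimal sketch of that idea, assuming Java 8 or later. ReiterableLines is a hypothetical name, and a Supplier<Reader> stands in for "create a new FileReader" so the example runs without a file. It deliberately leaves out closing the reader, which is the problem discussed in the next paragraph.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical Iterable that re-opens its source for every iteration.
final class ReiterableLines implements Iterable<String> {
    private final Supplier<Reader> opener; // called once per iterator() call

    ReiterableLines(Supplier<Reader> opener) {
        this.opener = opener;
    }

    @Override
    public Iterator<String> iterator() {
        final BufferedReader reader = new BufferedReader(opener.get());
        return new Iterator<String>() {
            private String pending = read(); // read-ahead line, null at end of input

            private String read() {
                try {
                    return reader.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public String next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String current = pending;
                pending = read();
                return current;
            }
        };
    }

    static int countLines(Iterable<String> lines) {
        int count = 0;
        for (String ignored : lines) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ReiterableLines lines = new ReiterableLines(() -> new StringReader("x\ny"));
        System.out.println(countLines(lines)); // prints 2
        System.out.println(countLines(lines)); // prints 2 again: a fresh reader each time
    }
}
```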

    Actually this raises an issue with both Iterable and Iterator, which may be why Guava's IO classes don't return them: there's no reliable way to close the resource. You can close a FileReader when the Iterator gets to the end, when hasNext() returns false or next() returns null. But what if the user has errored out or otherwise exited the loop before it's done? Normally we'd want to put reader.close() in a finally block - but with the Iterator/Iterable approaches, we don't have access to the reader. What do you call close() on?

    With Guava IO's choice to either return a List<String> or pass in a LineProcessor, they can completely control both opening and closing the FileReader within the readLines() method, ensuring that it does get closed. The user can't forget. This sort of approach can also work for any language that accepts closures (or even Java 8's wimpy proposed lambdas), but not with Iterator or Iterable. Yeah, the more I think about it, the more I'm convinced this is why Guava chose the path it did, and I think it's a pretty good reason.
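
    The general shape of that callback approach can be sketched as follows. This is not Guava's code: LineCallbacks and LineHandler are hypothetical names, and the try-with-resources statement (Java 7 and later) plays the role of the finally block.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: the method that opens the resource is also the one that closes it.
final class LineCallbacks {
    // Hypothetical single-method callback, invented for this example.
    interface LineHandler {
        // Return false to stop reading early.
        boolean handle(String line);
    }

    // Feeds lines to the handler and returns how many were handled.
    // The reader is closed on normal completion, early exit, or exception.
    static int forEachLine(Reader source, LineHandler handler) {
        int handled = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handled++;
                if (!handler.handle(line)) {
                    break; // early exit still goes through the implicit close
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handled;
    }

    public static void main(String[] args) {
        // Stops at the second line; the caller never sees or closes the reader.
        int handled = forEachLine(new StringReader("a\nb\nc"), line -> !line.equals("b"));
        System.out.println(handled); // prints 2
    }
}
```

    The caller only supplies what to do with each line, so there is no close() call to forget.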
    Mike Simmons
    Ranch Hand

    Joined: Mar 05, 2008
    Posts: 3018

    On the subject of re-using the object in an Iterator: I don't like it much. It seems error-prone and easily misunderstood by the next programmer who comes along. All it takes is one person deciding they'd rather put everything in a List for convenient random access (a reasonable thing to do if the list isn't too long), and then they get confused wondering why all the elements in the list are identical to the last entry.
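
    Here is a small sketch of exactly that trap. ReuseTrap and its methods are hypothetical names invented for this example; a one-element int array stands in for the re-used record object.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo of an iterator that re-uses its returned object versus one that does not.
final class ReuseTrap {
    // Returns the SAME array from every next() call, overwriting its content each time.
    static Iterator<int[]> reusingRows(final int rows) {
        return new Iterator<int[]>() {
            private final int[] shared = new int[1];
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < rows;
            }

            @Override
            public int[] next() {
                if (index >= rows) {
                    throw new NoSuchElementException();
                }
                shared[0] = index++;
                return shared;
            }
        };
    }

    // Returns a new array from every next() call.
    static Iterator<int[]> freshRows(final int rows) {
        return new Iterator<int[]>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < rows;
            }

            @Override
            public int[] next() {
                if (index >= rows) {
                    throw new NoSuchElementException();
                }
                return new int[] { index++ };
            }
        };
    }

    // Stores every row first, then reads the stored rows back, as a caller wanting random access would.
    static List<Integer> collectFirstCells(Iterator<int[]> rows) {
        List<int[]> kept = new ArrayList<>();
        while (rows.hasNext()) {
            kept.add(rows.next());
        }
        List<Integer> cells = new ArrayList<>();
        for (int[] row : kept) {
            cells.add(row[0]);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(collectFirstCells(reusingRows(3))); // prints [2, 2, 2]
        System.out.println(collectFirstCells(freshRows(3)));   // prints [0, 1, 2]
    }
}
```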

    And when you swap the data inside the object, it's easy to get bugs when you add a new field and forget to transfer it, or forget to clear it out properly. One element might get data from the previous element. To me that sort of bug is more pernicious than accidentally leaving the field null (which is what would probably happen if you were creating a new object each time) because (a) it's less obvious, as the field looks like it's been populated (with the wrong data), and (b) you risk sending private information to the wrong party. If I'm iterating through a collection of a hospital's patient records, I want each one to get a completely new instance, to minimize the chance that patient B's medical history might be sent to patient A's address just because some programmer forgot to overwrite A's address properly when switching records.

    And yes, the overhead for object creation and collection can be pretty small these days. Especially if we don't keep objects around for an unnecessarily long time.
    Avor Nadal
    Ranch Hand

    Joined: Sep 15, 2010
    Posts: 114

    Martin Vajsar: I'll note down all your advice. Thank you very much for your time. It's much appreciated.

    Mike Simmons: I did run into the problem you describe several weeks ago. I made my own interface for those cases... But while re-doing my personal libraries I forgot why I had needed that interface and decided to replace the returned class with one that implements Iterator, to make it more portable. So now I'm hitting my head on the desk, ha ha ha, and I'm going back to the old approach.
    PS: I'm talking about closing an input stream.
    Avor Nadal
    Ranch Hand

    Joined: Sep 15, 2010
    Posts: 114

    Mike Simmons: My previous message was about the problem of closing an input stream. Just to make sure it's not misunderstood.

    About your last piece of advice: I definitely must reconsider my strategy of re-using objects. I really hadn't thought about such an extreme situation, but it could absolutely happen if I miss something.
    Mike Simmons
    Ranch Hand

    Joined: Mar 05, 2008
    Posts: 3018

    Igor Nadal wrote:Mike Simmons: I faced that problem which you comment several weeks ago indeed. I made my own interface for these cases... But re-doing my personal libraries I forgot why I had needed such interface and decided to replace the returned class by one which implemented Iterator, in order to make it more portable. So now, I'm hitting my head on the desk, ha ha ha. So I'm turning back to the old approach.

    Yeah, but even with a custom interface that extends Iterator and adds a close() method: will the user of the code remember to call close(), and will they use a finally block as they should? Guava IO's approach ensures it's not something they need to worry about, ever.
    Mike Simmons
    Ranch Hand

    Joined: Mar 05, 2008
    Posts: 3018

    Igor Nadal wrote:but it's possible to happen if I miss something, absolutely.

    And if you have any co-workers, it's much more likely to happen when someone else modifies or uses the code you wrote without fully understanding it.

    Sadly for me, "someone else" can easily include myself, a month from now. I can easily return to my old code and not remember all the details I need to make it work just right. Especially if those details are subtle and different from what people "normally" do. So it's worth the effort to make my code as idiot-proof as possible, because I may well be one of those idiots I'm guarding against.
    Avor Nadal
    Ranch Hand

    Joined: Sep 15, 2010
    Posts: 114

    Mike Simmons: I absolutely agree with your last sentence, because every day I prove to myself that I have the memory of a fish (sad but true XD). Tomorrow I'll continue applying your latest advice to my most recent code. Thank you too.
     