aspose file tools*
The moose likes Java in General and the fly likes Implementation of addAll in ArrayList and LinkedList. Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "Implementation of addAll in ArrayList and LinkedList." Watch "Implementation of addAll in ArrayList and LinkedList." New topic
Author

Implementation of addAll in ArrayList and LinkedList.

Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

Hello:

Maybe it's because it's at night, I'm tired and I'm missing something, but checking the source code of the method addAll in the classes ArrayList and LinkedList I've seen something curious. I'm speaking about the implementation in JDK 7 specifically.

The first step that both classes do is creating an array from the passed Collection, calling its corresponding method toArray, regardless of the object's class. And this is just what I don't understand. Why don't these classes first check if the passed Collection is of the same class to try to take advantage of the equality of the internal structure?

For example, ArrayList could take the field "elementData" from the Collection (once it's upcasted to ArrayList) and copy the elements using System.arraycopy, without making an extra copy with toArray, Right?

And LinkedList could traverse the nodes of the Collection (upcasted to LinkedList) directly, without the need of creating a temporary array, Right?

Then, Why not? What am I missing? I discard that it's to improve the concurrency, because these classes are not thread-safe.

Thank you.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19696
    
  20

Avor Nadal wrote:Why don't these classes first check if the passed Collection is of the same class to try to take advantage of the equality of the internal structure?

Because the argument to addAll is limited; it's addAll(Collection<? extends E> c), so unless you explicitly try to fool the compiler you can't even pass anything that isn't compatible.


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7814
    
  21

Avor Nadal wrote:The first step that both classes do is creating an array from the passed Collection, calling its corresponding method toArray, regardless of the object's class. And this is just what I don't understand. Why don't these classes first check if the passed Collection is of the same class to try to take advantage of the equality of the internal structure?

Because that sort of logic would require "dispatch" code, which you generally want to avoid in an 'objective' environment.

As Rob says, the definition takes care of the type of Collection being passed, so you can be pretty darn sure that all objects in any resulting array will be castable to the correct type.

In addition, a few other possibilities for you:
1. It's generally pretty trivial to convert a Collection to an array; and don't forget that the toArray() method will be tailored to the implementation, and so will probably be the fastest possible way to get a copy of its contents.
2. The resulting array is not coupled to the original Collection, which minimizes the possibility of disruption by any concurrent update. What you basically get is a "snapshot".
3. Arrays are basically the fastest "bulk structure" around. And also the most compact.
4. In the case of a bulk constructor, once you have an array, you may well be able to use it directly to create your new object (for example, if it's an ArrayList).

HIH

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

First of all, thank you both for your answers.

I agree that multi-dispatch methods are ugly. But does not the JDK use this strategy in other methods of other classes too? StringBuilder and DecimalFormat among other classes do it. However, in this case a multi-dispatch would be even cleaner than in that ones, because the class only would need to detect itself, not other classes. So there would not be any dependency.

There is also the possibility to overload the method addAll in these classes. They could do something like this in ArrayList:




Or in LinkedList:




Of course this would require that the declaration of the source's variable were done using the same class that the destination's or a subclass of it, not a superclass or an interface. Or upcast it later (if possible). Otherwise the call would be done to the other version of the method addAll.

I don't understand the need of creating a copy or snapshot of the source for concurrency either. ArrayList and LinkedList are not thread-safe by definition. So why should they worry about concurrency? In my opinion it would mean compromising simplicity (and efficiency, very probably) because of a too specific assumption, Right?

Salutes.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    
    8

Avor Nadal wrote:There is also the possibility to overload the method addAll in these classes. They could do something like this in ArrayList:




I don't see what difference that would make. The documentation promises that the entries from the Collection (in this case, the ArrayList) will be added in order of the ArrayList's iterator. Were you thinking of just copying the data from the underlying array of the ArrayList and bypassing the iterator? That wouldn't be right because I could pass some subclass of ArrayList whose iterator returned the entries in a nonstandard order.
Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

Good point. I included the subclasses in my analysis later, because I didn't think about the consequences. Then let's go back to my original idea, consisting in checking for the same, exact, class only (at runtime). Why not to take advantage of such strategy? As I mentioned multi-dispatch is used in other parts of the JDK and these aren't thread-safe classes.

PS: I meant something like this, inside of LinkedList:



In the case of ArrayList, it would do something adequate to its array-like nature. In my opinion, copying the elementData field directly... Maybe am I ignoring something that doesn't allow it?
Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

I'm sorry, but I didn't illustrate the code properly. In ArrayList the original method looks like this:



My suggested implementation is this:



Because ArrayList is not thread-safe, the resulting array created with the method toArray would be as much inconsistent as copying the foreign array "elementData" to the local array "elementData". Am I missing something? In fact, the method toArray also uses System.arraycopy to copy the data.

In the case of LinkedList my change would consist in running through the nodes directly, instead of creating a temporary array and looping through its elements. Looping over an array may be faster, of course, but the construction of such array already implies running through the nodes of the LinkedList first!

My point is that both classes can detect if the passed Collection is of the same type, and take advantage of knowing the internal structure and, more important, accessing it without restrictions. And all this can be done without creating nasty dependencies towards other classes.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7814
    
  21

Avor Nadal wrote:My suggested implementation is this...

And it includes dispatch code, which is a huge red flag to an OO programmer. There may be cases where it's warranted, but I don't think this is one of them. You're (maybe) talking about the difference between O(n) and O(2n) time which, in 'O' notation, is non-existent (they're both O(n)). If you were talking about a quadratic difference then I might see the point, but even then, there are more generic things like RandomAccess you could probably check for.

The fact is that objects and classes are meant to prevent the need for dispatch code, so if you find yourself writing it, chances are you're doing it wrong.

And one last thing: dispatch code, no matter how wonderful, can only deal with types that exist. What if someone decides to create a class that you'd just love to cater for, but you can't because you fell under a bus before it was written?

Dispatch code is naff. It just is. It may seem great, and even be great in a language like C; but in Java.... if you need it, you didn't do your job properly.

Winston
Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

Winston Gutkowski: Thank you for your answer. My concerns were more related to the memory consumption than to the computation cost. One example is when you've to fill an ArrayList with millions of elements, but you have to do it calling several private methods. If every method creates a temporary ArrayList as result, it must be added to the main ArrayList later, using the method addAll (). Couldn't it be an issue to duplicate such big arrays? Or is it one of that extreme cases in which is OK to pass the ArrayList to the methods and mutate it inside of them?

Aside of all this, you all say that multi-dispatch is bad practice. And I agree, absolutely, although it doesn't look like that, he he he. But it disconcerts me to see this strategy in several places of the JDK, which in my opinion should be an example of good practices.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7814
    
  21

Avor Nadal wrote:Winston Gutkowski: Thank you for your answer. My concerns were more related to the memory consumption than to the computation cost. One example is when you've to fill an ArrayList with millions of elements, but you have to do it calling several private methods.

Why would there be several? If the method is private, then there can only be one - the one defined by the class you're using. If you're referring to calling that method several times, then any array (or whatever) that it creates to do the job will be eligible for garbage collection as soon as it exits.

TBH, I think you're obsessing about something that doesn't matter. If you're creating ArrayLists with millions of elements, you should probably be asking yourself why, rather than worrying about the efficiency of methods you have no control over. Programs are not databases.

Looking at the source code of Java objects can sometimes be useful to find out how other people solve problems, but you shouldn't assume that that's how it always looked - or will look in the next release. If using toArray() was a major problem, believe me, someone would have ripped out the guts and re-written it long before now. That's the whole point of keeping things hidden.

Also: Modern computers have VAST amounts of memory. My 8-year old clunker Dell has 3 Gigabytes, so it's unlikely to be troubled by a few extra copies of a million-element List.

Aside of all this, you all say that multi-dispatch is bad practice. And I agree, absolutely, although it doesn't look like that, he he he. But it disconcerts me to see this strategy in several places of the JDK, which in my opinion should be an example of good practices.

It'd disconcert me much more to see tailored code that uses dispatch logic.

Winston
Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

Winston Gutkowski wrote:TBH, I think you're obsessing about something that doesn't matter. If you're creating ArrayLists with millions of elements, you should probably be asking yourself why, rather than worrying about the efficiency of methods you have no control over. Programs are not databases.


My client wants a module (in the broadest sense of the word) to retrieve lots of data from a database, which requires to perform 3 different queries (hence the sentence "several private methods"), do some calculations with such data, and then write all this to a file. Because such calculations are not performed on a per-element basis and require analysing previous and following elements too, I was using a temporary Collection to store everything in memory. But I'm going to study this case again and try to divide the task in small steps (as small as the logic of the calculation allows me), pushing the results to the file each time. This should reduce the ArrayList size that I need.

Winston Gutkowski wrote:It'd disconcert me much more to see tailored code that uses dispatch logic.


What does "tailored code" means in this context?
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7814
    
  21

Avor Nadal wrote:But I'm going to study this case again and try to divide the task in small steps (as small as the logic of the calculation allows me), pushing the results to the file each time. This should reduce the ArrayList size that I need.

TBH, the file I/O is likely to dwarf any "efficiency" that you might save when dealing with ArrayLists (or copies/arrays thereof).

My advice: Do it the way that seems simplest until you know (and can PROVE) that the way you're doing it is "inefficient".

What does "tailored code" means in this context?

Winston
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Implementation of addAll in ArrayList and LinkedList.