Big performance hog with (Collection).removeAll ?

Cyril Gambis
Greenhorn

Joined: Feb 29, 2008
Posts: 3
Hi!

We have a strange performance hog in our application, in a thread which wakes up every minute.

The culprit seems to be removeAll on a collection, which takes forever. This collection has about 16000 entries, and we remove nearly every entry in the operation.


I demonstrate the problem with the code below. removeAll takes 15 seconds, whereas a simple loop calling "remove" for each entry takes only 30 ms.

Can "removeAll" really be this broken in our case? (The list of entries we remove and the list we remove them from are nearly the same.)

What do you think?

Thanks!
Cyril


[ February 29, 2008: Message edited by: Cyril Grao ]
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

Hi Cyril,
Welcome to Javaranch.
(FYI: Code tags have square brackets and not angle brackets. More information here)

Yeah, your analysis is correct. The time taken by removeAll() depends directly on the size of the collection passed to it. This is the code that it uses:



The code in bold is what removeAll() does in addition to what your simple removal does.
Since the passed collection is huge, each contains() call on it takes a long time, and that cost is multiplied by the number of elements in the list being modified.
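The behaviour described above can be sketched roughly as follows. This is not the JDK source, only a minimal illustration of a contains-based removal; the class name RemoveAllSketch and the helper removeAllSketch are hypothetical. The point is that contains() is called once per element of the collection being modified, and on a List argument each of those calls is a linear scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

class RemoveAllSketch {
    // Hypothetical helper (not the JDK implementation): removes from target
    // every element that toRemove contains, and reports whether target changed.
    static <T> boolean removeAllSketch(Collection<T> target, Collection<?> toRemove) {
        boolean modified = false;
        Iterator<T> it = target.iterator();
        while (it.hasNext()) {
            // One contains() call per element of target. If toRemove is a List,
            // each call scans it linearly, so the total work grows with
            // target.size() * toRemove.size().
            if (toRemove.contains(it.next())) {
                it.remove();
                modified = true;
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<String>(Arrays.asList("a", "b", "c", "d"));
        List<String> toRemove = Arrays.asList("a", "b", "c");
        boolean changed = removeAllSketch(target, toRemove);
        System.out.println(changed + " " + target); // prints "true [d]"
    }
}
```

With two lists of about 16000 entries each, that product is on the order of hundreds of millions of comparisons in the worst case, which is consistent with the multi-second timings reported in this thread.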

I suggest a few solutions (may not be the best possible):
  • If you are removing all the elements from the collection then clear() will be the best thing to do.
  • If you do not need the return value of the removeAll() operation, you can use the code that you have written, i.e. simply iterate over the items to be removed and call remove() for each of them.
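The second suggestion could look something like the sketch below. The class RemoveLoopSketch and the helper removeEach are hypothetical names, and the original poster's actual loop is not shown in the thread, so this is only an assumption about its general shape. One difference worth keeping in mind: remove(Object) deletes at most one matching element per call, whereas removeAll() deletes every occurrence, so the two are only equivalent when the target holds no duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class RemoveLoopSketch {
    // Hypothetical helper: removes each item individually and ignores the
    // boolean result, as suggested when the return value is not needed.
    static <T> void removeEach(Collection<T> target, Collection<? extends T> toRemove) {
        for (T item : toRemove) {
            target.remove(item); // removes at most one matching element
        }
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<String>(Arrays.asList("a", "b", "c", "d"));
        removeEach(target, Arrays.asList("a", "b", "c"));
        System.out.println(target); // prints "[d]"

        // With duplicates, only the first occurrence goes away.
        List<String> dupes = new ArrayList<String>(Arrays.asList("a", "a", "b"));
        removeEach(dupes, Arrays.asList("a"));
        System.out.println(dupes); // prints "[a, b]"
    }
}
```

On a List, each remove(Object) call stops at the first match, so when both lists are in nearly the same order the scans tend to be short, which may be why this loop was so much faster in the measurements above.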


    Cyril Gambis
    Greenhorn

    Joined: Feb 29, 2008
    Posts: 3
    Thanks for your answer!

    Actually, I'm not really a newcomer to JavaRanch, but I rarely post, and each time I forget my login and the mail account I used to register...

    Your analysis is correct, and c.contains is probably the culprit. I can't use clear(), since in the real scenario 2 or 3 items may differ between the two lists, but I'll go with the loop and the simple "remove".

    Have a good day,

    Cyril
    Ernest Friedman-Hill
    author and iconoclast
    Marshal

    Joined: Jul 08, 2003
    Posts: 24183
        

    Rather than the messy handcoded loop, you could use something like

    secondList.removeAll(new HashSet(firstList));

    List.contains() is a slow operation (it takes time proportional to the size of the list), but HashSet.contains() is very fast (its runtime is insensitive to the set size.)
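A small, self-contained sketch of this suggestion follows. The names HashSetRemoveAllSketch, buildList and removeAllViaHashSet are made up for illustration, and the 16000-entry size is borrowed from the original question. It assumes the elements have consistent equals() and hashCode() implementations, as String does. The elapsed time is printed but will vary by machine and JVM, so no figure is claimed here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class HashSetRemoveAllSketch {
    // Hypothetical helper: builds a list of n distinct strings.
    static List<String> buildList(int n) {
        List<String> list = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            list.add("entry-" + i);
        }
        return list;
    }

    // Copies toRemove into a HashSet first, so every contains() check made
    // during removeAll() is a hash lookup rather than a linear scan.
    static <T> boolean removeAllViaHashSet(List<T> target, List<? extends T> toRemove) {
        return target.removeAll(new HashSet<T>(toRemove));
    }

    public static void main(String[] args) {
        List<String> firstList = buildList(16000);
        List<String> secondList = new ArrayList<String>(firstList);
        secondList.add("extra-1"); // a couple of entries that should survive
        secondList.add("extra-2");

        long start = System.nanoTime();
        boolean changed = removeAllViaHashSet(secondList, firstList);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;

        System.out.println("changed=" + changed + ", remaining=" + secondList);
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```

Building the HashSet costs one pass over the list being removed, which is usually a small price compared with the repeated linear scans it replaces.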


    Cyril Gambis
    Greenhorn

    Joined: Feb 29, 2008
    Posts: 3
    Good solution, thanks!

    The result is nearly as good as doing the remove by hand and the code is clearer.

    Results:

    ****** Starting tests ******
    >>> secondList.removeAll(firstList): 8266
    >>> thirdList.remove *16000 (firstList): 343
    >>> theSet.removeAll(firstList): 7625
    >>> theSet2.remove *16000 (firstList): 16
    >>> theSet3.removeAll handcoded(firstList): 7672
    >>> theSet4.removeAll(new HashSet(firstList)): 31

    Cheers,
    Cyril
    [ March 03, 2008: Message edited by: Cyril Gambis ]
    Stephane Clinckart
    Ranch Hand

    Joined: Oct 21, 2003
    Posts: 89
    The same code can be used for removeAll on the List.



    Results:
    ****** Starting tests ******
    >>> list2.removeAll(firstList): 5717
    >>> list3.remove *16000 (firstList): 120
    >>> list4.removeAll(new HashSet<String>(firstList)): 151
    >>> theSet.removeAll(firstList): 5581
    >>> theSet2.remove *16000 (firstList): 3
    >>> theSet3.removeAll handcoded(firstList): 5609
    >>> theSet4.removeAll(new HashSet(firstList)): 5

    Remark: I optimised things a little further by removing the class cast and the creation of objects inside the loops.
    [ March 11, 2008: Message edited by: Stephane Clinckart ]
    Stephane Clinckart
    Ranch Hand

    Joined: Oct 21, 2003
    Posts: 89
    I could not remove the smiley... also by using the code tags :-/
    [ March 11, 2008: Message edited by: Stephane Clinckart ]
    Ilja Preuss
    author
    Sheriff

    Joined: Jul 11, 2001
    Posts: 14112
    Originally posted by Stephane Clinckart:
    I could not remove the smiley... also by using the code tags :-/

    [ March 11, 2008: Message edited by: Stephane Clinckart ]


    Check the "Disable smilies in this post." checkbox under "options" in the edit screen...


    The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
     