Big performance hog with (Collection).removeAll ?

Cyril Gambis

Joined: Feb 29, 2008
Posts: 3

We have a strange performance hog in our application, in a thread that wakes up every minute.

The culprit seems to be removeAll on a collection, which takes a very long time. This collection has about 16000 entries, and the operation removes nearly every one of them.

I demonstrate the problem with the code below. removeAll takes 15 seconds, whereas a simple loop calling plain "remove" takes only 30 ms.
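The original benchmark code was not preserved in this thread. A hypothetical reconstruction, assuming two ArrayLists that hold the same 16000 strings, might look like this; the class name, helper names and test data are made up for illustration, and absolute timings will depend on the JVM and hardware:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the benchmark; the original code is not available.
class RemoveAllBenchmark {

    // Builds n distinct strings "item0", "item1", ... (made-up test data).
    static List<String> buildList(int n) {
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("item" + i);
        }
        return list;
    }

    // Times list.removeAll(toRemove); for every element of list, removeAll
    // asks toRemove.contains(), which is a linear scan because toRemove is a List.
    static long timeRemoveAll(List<String> list, List<String> toRemove) {
        long start = System.nanoTime();
        list.removeAll(toRemove);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Times one list.remove(Object) call per element of toRemove.
    static long timeRemoveLoop(List<String> list, List<String> toRemove) {
        long start = System.nanoTime();
        for (String s : toRemove) {
            list.remove(s);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 16000;
        List<String> toRemove = buildList(n);

        List<String> a = buildList(n);
        long removeAllMs = timeRemoveAll(a, toRemove);
        System.out.println("removeAll:   " + removeAllMs + " ms, left " + a.size());

        List<String> b = buildList(n);
        long loopMs = timeRemoveLoop(b, toRemove);
        System.out.println("remove loop: " + loopMs + " ms, left " + b.size());
    }
}
```

In this setup the hand-written loop tends to be quick because each removed element sits at the front of the list, so remove(Object) finds it immediately.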

Can "removeAll" really be this slow in our case? (The list of entries we remove and the list we remove them from are nearly identical.)

What do you think?


[ February 29, 2008: Message edited by: Cyril Grao ]
Nitesh Kant

Joined: Feb 25, 2007
Posts: 1638

Hi Cyril,
Welcome to JavaRanch.
(FYI: code tags use square brackets, not angle brackets.)

Yeah, your analysis is correct. The time taken by removeAll() depends directly on the size of the collection you pass in (the entries to be removed). This is the code that it uses:

The c.contains() check is what removeAll() does in addition to what your simple removal does.
Since the passed collection is large, each contains() call takes a long time (List.contains() scans the list element by element), and that time is multiplied by the number of elements in the list being modified.
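The quoted source did not survive in this copy of the thread, so here is a minimal sketch of the loop shape described above, modelled on AbstractCollection.removeAll. It is not the exact JDK code, and the class name ContainsBasedRemoveAll is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

class ContainsBasedRemoveAll {

    // Walks the target collection and asks the argument collection
    // whether each element should be removed.
    static boolean removeAll(Collection<?> target, Collection<?> toRemove) {
        boolean modified = false;
        Iterator<?> it = target.iterator();
        while (it.hasNext()) {
            // If toRemove is a List, contains() is a linear scan, so the whole
            // loop costs roughly target.size() * toRemove.size() comparisons.
            if (toRemove.contains(it.next())) {
                it.remove();
                modified = true;
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        boolean changed = removeAll(target, Arrays.asList("b", "d", "x"));
        System.out.println(changed + " " + target); // prints "true [a, c]"
    }
}
```

With 16000 entries on both sides, that quadratic comparison count is what dominates the running time.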

I suggest a few solutions (may not be the best possible):
  • If you are removing all the elements from the collection then clear() will be the best thing to do.
  • If you do not need the return value of the removeAll() operation, you can use the code that you have written, i.e. simply iterate and call remove() for each item to be removed.
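The two suggestions can be sketched as follows (the helper names are hypothetical). One caveat worth checking against the real data: List.remove(Object) removes only the first matching element, whereas removeAll removes every occurrence, so the hand-written loop is only equivalent when the list holds no duplicates of the removed entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemovalAlternatives {

    // Suggestion 1: when every element goes, clear() needs no per-element lookups.
    static void removeEverything(List<String> list) {
        list.clear();
    }

    // Suggestion 2: one remove(Object) call per element to drop, ignoring the
    // boolean results. remove(Object) deletes only the FIRST occurrence.
    static void removeEach(List<String> list, List<String> toRemove) {
        for (String s : toRemove) {
            list.remove(s);
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "b", "c"));
        removeEach(list, Arrays.asList("b", "c"));
        System.out.println(list); // prints "[a, b]": one "b" survives

        List<String> viaRemoveAll = new ArrayList<>(Arrays.asList("a", "b", "b", "c"));
        viaRemoveAll.removeAll(Arrays.asList("b", "c"));
        System.out.println(viaRemoveAll); // prints "[a]"
    }
}
```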

    Cyril Gambis

    Joined: Feb 29, 2008
    Posts: 3
    Thanks for your answer!

    Actually, I'm not really new to JavaRanch, but I post rarely, and each time I forget my login and the email account I used to register...

    Your analysis is correct, and c.contains is probably the culprit. I can't use clear(), since in the real scenario there may be 2 or 3 items that differ, but I'll go with the loop and the simple "remove".

    Have a good day,

    Ernest Friedman-Hill
    author and iconoclast

    Joined: Jul 08, 2003
    Posts: 24189

    Rather than the messy handcoded loop, you could use something like

    secondList.removeAll(new HashSet(firstList));

    List.contains() is a slow operation (its running time is proportional to the size of the list), but HashSet.contains() is very fast (expected constant time, independent of the set size).
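A minimal sketch of this suggestion, written with generics instead of the raw HashSet type (the diamond operator needs Java 7 or later). The helper names buildList and removeAllFast, and the test data, are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class HashSetRemoveAll {

    // Builds n distinct strings "item0", "item1", ... (made-up test data).
    static List<String> buildList(int n) {
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("item" + i);
        }
        return list;
    }

    // Copies toRemove into a HashSet once, so each contains() check that
    // removeAll performs is an expected O(1) hash lookup, not a list scan.
    static void removeAllFast(List<String> target, List<String> toRemove) {
        target.removeAll(new HashSet<>(toRemove));
    }

    public static void main(String[] args) {
        List<String> firstList = buildList(16000);
        List<String> secondList = buildList(16000);
        secondList.add("keep-me"); // one entry that is not in firstList
        removeAllFast(secondList, firstList);
        System.out.println(secondList); // prints "[keep-me]"
    }
}
```

Building the HashSet costs one pass over firstList, which is cheap compared with thousands of linear contains() scans.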

    Cyril Gambis

    Joined: Feb 29, 2008
    Posts: 3
    Good solution, thanks!

    The result is nearly as good as doing the remove by hand and the code is clearer.


    ****** Starting tests ******
    >>> secondList.removeAll(firstList): 8266
    >>> thirdList.remove *16000 (firstList): 343
    >>> theSet.removeAll(firstList): 7625
    >>> theSet2.remove *16000 (firstList): 16
    >>> theSet3.removeAll handcoded(firstList): 7672
    >>> theSet4.removeAll(new HashSet(firstList)): 31
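The slow theSet.removeAll(firstList) timing has a similar cause. HashSet inherits removeAll from AbstractSet, whose documented behaviour compares sizes: if the set is larger than the argument, it iterates the argument and calls the set's own remove(); otherwise it iterates the set and calls contains() on the argument. Assuming theSet and firstList hold about the same 16000 entries, the second branch runs and every check is a linear List.contains() scan, while wrapping firstList in a HashSet (theSet4) makes those checks cheap. The sketch below mirrors that logic in a hypothetical static method; it is not the JDK source itself:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class SetRemoveAllSketch {

    // Mirrors the two branches AbstractSet.removeAll chooses between.
    static boolean removeAll(Set<?> set, Collection<?> c) {
        boolean modified = false;
        if (set.size() > c.size()) {
            // Set is larger: iterate the argument and use the set's fast remove().
            for (Object e : c) {
                modified |= set.remove(e);
            }
        } else {
            // Set is not larger: iterate the set and ask c.contains(),
            // which is a linear scan when c is a List.
            for (Iterator<?> it = set.iterator(); it.hasNext(); ) {
                if (c.contains(it.next())) {
                    it.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b", "c"));
        // Equal sizes (3 and 3), so the contains()-based branch runs.
        removeAll(set, Arrays.asList("a", "c", "x"));
        System.out.println(set); // prints "[b]"
    }
}
```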

    [ March 03, 2008: Message edited by: Cyril Gambis ]
    Stephane Clinckart
    Ranch Hand

    Joined: Oct 21, 2003
    Posts: 89
    The same trick can be used for removeAll on the List, too.

    ****** Starting tests ******
    >>> list2.removeAll(firstList): 5717
    >>> list3.remove *16000 (firstList): 120
    >>> list4.removeAll(new HashSet<String>(firstList)): 151
    >>> theSet.removeAll(firstList): 5581
    >>> theSet2.remove *16000 (firstList): 3
    >>> theSet3.removeAll handcoded(firstList): 5609
    >>> theSet4.removeAll(new HashSet(firstList)): 5

    Remark: I optimised the code a little more by removing the class casts and avoiding object creation inside loops.
    [ March 11, 2008: Message edited by: Stephane Clinckart ]
    Stephane Clinckart
    Ranch Hand

    Joined: Oct 21, 2003
    Posts: 89
    I could not get rid of the smiley, even when using the code tags :-/
    [ March 11, 2008: Message edited by: Stephane Clinckart ]
    Ilja Preuss

    Joined: Jul 11, 2001
    Posts: 14112
    Originally posted by Stephane Clinckart:
    I could not remove the smiley... also by using the code tags :-/

    [ March 11, 2008: Message edited by: Stephane Clinckart ]

    Check the "Disable smilies in this post." checkbox under "options" in the edit screen...
