Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Agile forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Big performance hog with (Collection).removeAll ?

 
Cyril Gambis
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi!

We have a strange performance hog in our application, in a thread which wake up every minute.

The culprit seems to be removeAll on a collection, which takes forever. This collection has about 16000 entries, and we remove nearly every entry in the operation.


I demonstrate the problem with the code below. removeAll takes 15 seconds, even if a simple loop with a simple "remove" takes only 30 ms.

Can "removeAll" be so broken in our case? (the lists of entries we remove and the list of entries in which we remove are nearly the same).

What do you think?

Thanks!
Cyril


[ February 29, 2008: Message edited by: Cyril Grao ]
 
Nitesh Kant
Bartender
Posts: 1638
IntelliJ IDE Java MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Cyril,
Welcome to Javaranch.
(FYI: Code tags have square brackets and not angle brackets. More information here)

Yeah, your analysis is correct. The time taken by removeAll() is directly dependent on the size of the list to be removed. This is the code that it uses:



The code in bold is what removeAll() does in addition to what your simple removal does.
Since, the size of the passed collection is huge, the contains method call will take a long time and this time will be multiplied by the number of elements in the list that is being modified.

I suggest a few solutions (may not be the best possible):
  • If you are removing all the elements from the collection then clear() will be the best thing to do.
  • If you do not need the return value of the removeAll() operation, you can use the code that you have written i.e. do a simple iterate and call remove for all the items to be removed.

  •  
    Cyril Gambis
    Greenhorn
    Posts: 3
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Thanks for you answer!

    Actually, I'm not really a newcomer of Java Ranch but I post rarely, and each time I forget my login and the mail account I used to register...

    Your analysis is correct, and c.contains is probably the culprit. I can't use clean(), since there may be 2 or 3 items different in the real scenario of my point, but I'll go with the solution with the loop and the simple "remove".

    Have a good day,

    Cyril
     
    Ernest Friedman-Hill
    author and iconoclast
    Marshal
    Pie
    Posts: 24208
    35
    Chrome Eclipse IDE Mac OS X
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Rather than the messy handcoded loop, you could use something like

    secondList.removeAll(new HashSet(firstList));

    List.contains() is a slow operation (it takes time proportional to the size of the list), but HashSet.contains() is very fast (its runtime is insensitive to the set size.)
     
    Cyril Gambis
    Greenhorn
    Posts: 3
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Good solution, thanks!

    The result is nearly as good as doing the remove by hand and the code is clearer.

    Results:

    ****** Starting tests ******
    >>> secondList.removeAll(firstList): 8266
    >>> thirdList.remove *16000 (firstList): 343
    >>> theSet.removeAll(firstList): 7625
    >>> theSet2.remove *16000 (firstList): 16
    >>> theSet3.removeAll handcoded(firstList): 7672
    >>> theSet4.removeAll(new HashSet(firstList)): 31

    Cheers,
    Cyril
    [ March 03, 2008: Message edited by: Cyril Gambis ]
     
    Stephane Clinckart
    Ranch Hand
    Posts: 89
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    The same code can be use for the removeAll on the List.



    Results:
    ****** Starting tests ******
    >>> list2.removeAll(firstList): 5717
    >>> list3.remove *16000 (firstList): 120
    >>> list4.removeAll(new HashSet<String>(firstList)): 151
    >>> theSet.removeAll(firstList): 5581
    >>> theSet2.remove *16000 (firstList): 3
    >>> theSet3.removeAll handcoded(firstList): 5609
    >>> theSet4.removeAll(new HashSet(firstList)): 5

    Remark: I optimise little bit more the stuff by removing the class Cast and removing creation of object in loops.
    [ March 11, 2008: Message edited by: Stephane Clinckart ]
     
    Stephane Clinckart
    Ranch Hand
    Posts: 89
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    I could not remove the smiley... also by using the code tags :-/
    [ March 11, 2008: Message edited by: Stephane Clinckart ]
     
    Ilja Preuss
    author
    Sheriff
    Posts: 14112
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Originally posted by Stephane Clinckart:
    I could not remove the smiley... also by using the code tags :-/

    [ March 11, 2008: Message edited by: Stephane Clinckart ]


    Check the "Disable smilies in this post." checkbox under "options" in the edit screen...
     
    • Post Reply
    • Bookmark Topic Watch Topic
    • New Topic