File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
Win a copy of Clojure in Action this week in the Clojure forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

ArrayList too slow?

 
Marilyn de Queiroz
Sheriff
Posts: 9059
12
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
We are checking for duplicates as we iterate through a result set. The more objects we add to the ArrayList, the slower it goes (of course) because there are more objects to check against the current object. Is there a better way to handle this?
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24204
34
Chrome Eclipse IDE Mac OS X
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You can use a Set instead of a List, because they know how to find (and eliminate) duplicates quickly. A LinkedHashSet would preserve the insertion order, if that's important; otherwise a HashSet would be the fastest.
 
Marilyn de Queiroz
Sheriff
Posts: 9059
12
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Good idea, but ...
I'm keeping a list of the tickets I've already seen, and then I compare the new ticket to that list to see if it's a duplicate. If it's not a duplicate, I process it. If it is a duplicate, I ignore it. I think that changing the list to a set will probably not help in this instance because I'm not adding the ticket to a list (or set) and then processing the list afterwards. I'm processing each one as it comes and passing it on (or not) based on this isDupe() method.
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24204
34
Chrome Eclipse IDE Mac OS X
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I still think it's the right thing to do. Set.add() returns true if the element is new, or false if it's a duplicate, so there's exactly the information you need. But with a TreeSet, this takes ln(n) time instead of (n) time to search the list; and with a HashSet it takes amortized constant time.

I didn't understand your objection but maybe I'm just being dense.
 
Marilyn de Queiroz
Sheriff
Posts: 9059
12
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Tis I who is dense.

I forgot add() returns a boolean. You're totally right on.

I'm not very strong on the ln(n) vs amortized stuff, but I'm thinking that TreeSet is the way to go in this instance.
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24204
34
Chrome Eclipse IDE Mac OS X
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Marilyn de Queiroz:

I'm not very strong on the ln(n) vs amortized stuff, but I'm thinking that TreeSet is the way to go in this instance.


The difference isn't huge. If searching through a million items in an ArrayList takes 1,000,000 time units, then searching in a TreeSet is going to take ~ 13 units. Searching a HashSet will take ~ 1 unit -- but it's probably a bigger unit, and every once in a while, an insertion will take a whole lot of units.

Going from a List to a Set is going to save a huge amount of time; which set you choose doesn't matter nearly so much.
 
I agree. Here's the link: http://aspose.com/file-tools
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic