• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

fast list/map??

 
Ranch Hand
Posts: 187
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
hi,

following problem:
i have a textfile which contains about 20.000 md5sums and some more informations
and i have a directory which contains files called like those md5sums

i created a class which contains the md5sum and some further informations (retrieved from the txt)

i know need to check if every file in the directory is in the textfile, and if so, store the further informations about that file in the class (thats why i need a extra class)

whats the fastest way to do this? when i use a normal linkedlist i need about 30min
i even loaded the whole file into ram which reduces the time (for searching etc.) to about 15min

 
author and iconoclast
Posts: 24207
46
Mac OS X Eclipse IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Checking for an item in a list is quite slow, because you have to search one-by-one through the list. Finding something in a Java Set, though, is quite fast: TreeSet and especially HashSet lookups take very little time even for huge sets. Read your list of file names and put them into a HashSet, and then check whether the HashSet contains each name -- you should find this to be enormously faster.
 
olze oli
Ranch Hand
Posts: 187
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
what you said i just read on another website but thats really confusing me
how can it be faster to put some thousands objects in a hashsetand iterate over it until the matching one is found?
i thought using a list with binarySearch should be the fastest as possible way to get this

edit: when i use hashset how can i get an object?
 
Bartender
Posts: 4568
9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Don't iterate over it - just use the contains() method. HashSet is written to optimise this sort of thing.

Another option would be to use a HashMap, mapping the MD5 sum to the object you want to store. That'll give you very fast lookup based on the MD5 sum - much faster than iterating over a list. And that has the advantage of giving you access to the object.
 
olze oli
Ranch Hand
Posts: 187
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
that sounds pretty good
i will try this tomorrow

thanks!
 
Ernest Friedman-Hill
author and iconoclast
Posts: 24207
46
Mac OS X Eclipse IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
A TreeSet would be about as fast as a binary search on a list (as long as you only have to sort the list once!) But a HashSet is even faster. You can read on Wikipedia how hash tables work and why they are fast.

I'm afraid I don't understand your last question; can you rephrase it?

Also, I'm a little concerned -- I can't imagine how 20,000 binary searches could take 15 minutes or more; there must be something else going on.
 
olze oli
Ranch Hand
Posts: 187
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
its working pretty fast now with a hashmap where the md5 is the key and the value is the object i want to retrieve
thanks
 
reply
    Bookmark Topic Watch Topic
  • New Topic