We're thinking of using a HashMap to get rid of duplicate records that have the same key value. So we shove everything into a large Map; after removing duplicates it will probably hold somewhere between 100,000 and 10 million records per run. The key will be around 20 characters and the value around 300 characters, both String objects (rough guesstimates).
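A minimal sketch of the dedup step, assuming each record arrives as a key/value pair of Strings. The `Rec` record and the `dedup` method are hypothetical names, and the sketch assumes the first record seen for a key should win; the post doesn't say which duplicate to keep, so swap `putIfAbsent` for `put` to keep the last one instead. Pre-sizing the map avoids repeated resizing as it grows toward millions of entries. The record syntax needs Java 16+.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // Hypothetical record type: the post only says key (~20 chars) and value (~300 chars) are Strings.
    record Rec(String key, String value) {}

    // Keeps the first value seen for each key (assumption); use put() instead to keep the last one.
    static Map<String, String> dedup(List<Rec> records, int expectedUnique) {
        // Pre-size so the table does not need to resize while filling:
        // capacity of at least expected / 0.75 (the default load factor).
        Map<String, String> byKey = new HashMap<>((int) (expectedUnique / 0.75f) + 1);
        for (Rec r : records) {
            byKey.putIfAbsent(r.key(), r.value());
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
                new Rec("A", "first A"),
                new Rec("B", "only B"),
                new Rec("A", "second A"));
        Map<String, String> unique = dedup(input, input.size());
        System.out.println(unique.size());   // 2
        System.out.println(unique.get("A")); // first A
    }
}
```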
So, after processing all the records and removing dupes, we'd like to write all the values out to a file. We don't need to sort them. There seem to be a few options for doing this with a HashMap:
1) Getting the Set of keys (keySet()), iterating through the keys and pulling each value.
2) Getting a Set of values, iterating through those.
3) Getting the Collection of values (values()).
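A sketch of the options above, assuming output goes through a `PrintWriter`; the method names are hypothetical. In the JDK's HashMap, `keySet()`, `values()` and `entrySet()` all return views backed by the map, so none of them copies the entries. Option 1 costs one extra `get()` lookup per key, while iterating `values()` (or `entrySet()`, if the key is ever needed too) avoids that. HashMap exposes its values only as a `Collection`, not a `Set`, so options 2 and 3 come down to the same `values()` call. Per the HashMap Javadoc, iterating any of these views takes time proportional to the map's capacity plus its size.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

public class WriteValuesSketch {
    // Option 1: iterate keySet() and look each value up again (one extra hash lookup per key).
    static void writeViaKeySet(Map<String, String> map, PrintWriter out) {
        for (String key : map.keySet()) {
            out.print(map.get(key));
            out.print('\n');
        }
    }

    // Options 2 and 3: values() is a Collection view backed by the map; no copy is made.
    static void writeViaValues(Map<String, String> map, PrintWriter out) {
        for (String value : map.values()) {
            out.print(value);
            out.print('\n');
        }
    }

    // entrySet() yields key and value together without a second lookup.
    static void writeViaEntrySet(Map<String, String> map, PrintWriter out) {
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.print(e.getValue());
            out.print('\n');
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k1", "v1");
        map.put("k2", "v2");
        StringWriter sw = new StringWriter();
        try (PrintWriter out = new PrintWriter(sw)) {
            writeViaValues(map, out);
        }
        // HashMap iteration order is unspecified, so the lines may come out in any order.
        System.out.print(sw);
    }
}
```

For a real file, wrapping `Files.newBufferedWriter(path)` in a `PrintWriter` is one option; since `PrintWriter` swallows `IOException`, call `checkError()` after the loop to detect write failures.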
Does anyone have a good understanding of the memory and performance consequences of each choice above?
If all of these prove to be too slow and require too much memory, then we'll have to look for another data structure or write a custom one.
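If the values only need to be written once, unsorted, one alternative to holding every 300-character value in memory is to write each value as soon as its key is first seen and keep only a HashSet of keys. This is a swapped-in technique, not one of the options listed above, and it assumes the first record for a key is the one to keep (it does not work if the last duplicate should win). `HashSet` is itself backed by a HashMap, so the per-key overhead remains, but the values are not retained. `Rec` and `writeFirstPerKey` are hypothetical names; the record syntax needs Java 16+.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StreamingDedupSketch {
    // Hypothetical record type standing in for one input record.
    record Rec(String key, String value) {}

    // Writes the value of the first record seen for each key and skips later duplicates.
    // Only the keys are kept in memory. Assumes the first occurrence is the one to keep.
    static int writeFirstPerKey(Iterable<Rec> records, int expectedUnique, PrintWriter out) {
        Set<String> seenKeys = new HashSet<>((int) (expectedUnique / 0.75f) + 1);
        int written = 0;
        for (Rec r : records) {
            if (seenKeys.add(r.key())) { // add() returns false if the key was already present
                out.print(r.value());
                out.print('\n');
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
                new Rec("A", "first A"),
                new Rec("B", "only B"),
                new Rec("A", "second A"));
        StringWriter sw = new StringWriter();
        try (PrintWriter out = new PrintWriter(sw)) {
            System.out.println(writeFirstPerKey(input, input.size(), out)); // 2
        }
        System.out.print(sw); // "first A" then "only B", in input order
    }
}
```

Output follows input order rather than hash order, which doesn't matter here since no sorting is needed.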
Looking for some advice/guidance on this. Anyone have an opinion on the best way to handle this problem?