Shawn Smith wrote:Those only return the potential insertion point for a new object. They don't tell you whether an object exists in the collection. I can use that to "look left and right", but I'm really just looking for a true or false.
SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6 - OCEJPAD 6
How To Ask Questions How To Answer Questions
Rob Spoor wrote:Is't n log(n) < n?
Shawn Smith wrote:
So my conclusions were (and feel free to throw rocks), there's really not a enough of a significant difference in the timed tests between HashSet and binarySearch for something as simple as determining String containment.
Stephan van Hulst wrote:Yeah, but that's still assuming the Strings are already sorted. If you're fair, you should add the time it takes to do the sorting to the time it takes binarySearch() to find an element.
There is a very small subset, and once the data is loaded I do simple iteration:
Jayesh A Lalwani wrote:Did you use a HashMap or a HashSet? Just askin
Another thing to consider when evaluating HashSet is the number of collisions in your strings that need to be searched. For a truly random set, the search time on a HashSet should be O(1). However, if all of your input data collides (very unlikely), the performance can degrade to O(n)
Jayesh A Lalwani wrote:How long does split take?
Shawn Smith wrote:
Jayesh A Lalwani wrote:How long does split take?
I can measure it, but is should be constant across both tests.
Shawn Smith wrote:I can certainly add the sort to the timing, though that's only going to affect the first iteration.
Essentially this is a dirty word checker. There are 622 dirty words and I ran the entire content of my /usr/share/dict/words file against them (479830) words.
Jeff Verdegan wrote:
Shawn Smith wrote:Now it certainly may be the case that the difference doesn't show up until we get to extremely large search sets, and that in itself is worth knowing.
Jeff Verdegan wrote:No, that's just a constant, and its significance shrinks as we do more searches, which is what matters for this performance comparison. Comparing a single binary search to a single hash search will not give consistent results and is a meaningless comparison.