I am working on a problem where I have to find all the possible anagram pairs in a given input file. I have written code for this, but it runs very slowly.
Any idea what I am doing wrong?
Note: In my post in another forum here, I did not paste this part and did not ask the same question. In this post
I am asking about the isAnagram() method, which I use on line 50.
When you say it's working very slowly, do you mean it returns a result but takes a long time, or that it keeps running and never returns a result?
If it's the latter, then I think I see a bug in the while loop in isAnagram(). If the string is matched with an anagram you remove it from the set, but if it's not you leave it in the set. This means that if any word doesn't have a matching anagram, you will pull that same word from the set on the next pass and check it again. This will go on forever.
If you meant the former, you will need to run the code under a profiler to find the bottleneck.
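To make the loop problem above concrete, here is a rough sketch of a loop that always terminates. The original code isn't shown here, so everything in it is an assumption: the class name, the helper sameLetters(), and the use of a TreeSet are all hypothetical stand-ins. The only point is that each word is removed from the set on every pass, whether or not it found a partner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class LoopFixSketch {

    // Hypothetical helper: true if both words consist of exactly the same letters.
    static boolean sameLetters(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    // Returns every anagram pair as "first second".
    static List<String> findPairs(TreeSet<String> input) {
        // Work on a copy so the caller's set is left untouched.
        TreeSet<String> words = new TreeSet<>(input);
        List<String> pairs = new ArrayList<>();
        while (!words.isEmpty()) {
            // pollFirst() removes the word even when it has no anagram,
            // so the set shrinks on every pass and the loop must end.
            String current = words.pollFirst();
            for (String other : words) {
                if (sameLetters(current, other)) {
                    pairs.add(current + " " + other);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        TreeSet<String> words = new TreeSet<>(Arrays.asList("act", "cat", "dog", "tac"));
        for (String pair : findPairs(words)) {
            System.out.println(pair);
        }
    }
}
```

This still compares every word with every remaining word, so it only addresses the never-ending loop, not the speed.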
Thanks Mike. Actually, it was case 2: no result came back even after a long time.
I have figured it out now: the issue was with the while loop condition. I changed it and it works fine now. The updated code is below:
But even after this, it is slow. For example, around 3,000 inputs took 9 seconds, whereas others reported that 10,000 inputs took 2 seconds. Any thoughts on how to improve it?
For performance improvements you may need to redesign your approach. Since an anagram has the same letters in a different order, you can make a "key" for each word by sorting its letters into alphabetical order. Then you build a HashMap based on those keys, where each value is a HashSet instead of a List so that no word is entered twice. In this way you avoid comparing each word against every other word: you have an anagram whenever a Set contains two or more words. This would also avoid the need for a sorted set.
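A minimal sketch of that idea might look like the following. The class and method names (AnagramGroups, keyOf, groupAnagrams) are made up for illustration, and it assumes the words have already been read from the file and normalised (for example, all lower case).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AnagramGroups {

    // Build the key: the word's letters sorted alphabetically,
    // so "silent" and "listen" both become "eilnst".
    static String keyOf(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Map each key to the set of distinct words that share it,
    // then keep only the sets holding two or more words.
    static Map<String, Set<String>> groupAnagrams(List<String> words) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String word : words) {
            groups.computeIfAbsent(keyOf(word), k -> new HashSet<>()).add(word);
        }
        groups.values().removeIf(set -> set.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "listen", "silent", "enlist", "cat", "act", "dog", "cat");
        for (Set<String> group : groupAnagrams(words).values()) {
            System.out.println(group);
        }
    }
}
```

Each word is sorted once and looked up once, so the work grows roughly in proportion to the number of words instead of the number of word pairs. If you need the individual pairs rather than the groups, you can list the pairs within each set afterwards.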