Win a copy of Think Java: How to Think Like a Computer Scientist this week in the Java in General forum!

# Algorithm for searching for a scrambled word in a list

Garrett Rowe
Ranch Hand
Posts: 1296
I have a list stored as an ArrayList<String> of about 300,000 words. I want to be able to enter a scrambled String, and return all the words in the list that can be made from the random String.

ex:
"tac" returns:
cat
act

Currently my algorithm for doing this is very brute-force. and although it works, it is slow. In pseudocode it might look like this

Like I said its slow because each time I want to find a match I have to iterate through and sort every word in the list. Its not too bad if I only want to test one scrambled word, but I want to test the scrambled word and all substring permutations of the word.

"tac" tests "tac", "ta", "tc", "ac", "t", "a", "c" and returns
a
at
act
cat

Obvoiusly as the words get longer, the number of substring permutations gets longer (I believe its n^2 - 1 for a word of length n) I thought about using a HashMap<String, String[]> where the key would be letters in their natural ordering, and the value would be an array of Strings that that group of letters can produce:

key = act
value = {act, cat}

but I can't think of a reasonable way to populate the map. If anyone has any ideas or a better algorithm, I'd love to hear them.

Garrett
[ February 12, 2006: Message edited by: Garrett Rowe ]

Craig Tyler
Ranch Hand
Posts: 52
Having not tried it out, I'm really not sure if this would be any faster, but it might be worth a shot. Create a regex expression based on the scrambled string you're making. In the case of "tac", it would be something like ([tac])+ so at least the algorithm would be faster since you wouldn't have to copy strings from the list of words to a character array.

Garrett Rowe
Ranch Hand
Posts: 1296
Thanks for the suggestion. Now my algorithm is considerably faster since I added this check and a length check before I check the scrambled word versus the word in the list.

i.e.

This greatly reduces the number of words that are actually checked for equality which speeds up the algorithm signifigantly. I'll keep looking for ways to tweak it as I would like it to move even faster.
[ February 12, 2006: Message edited by: Garrett Rowe ]

Ernest Friedman-Hill
author and iconoclast
Marshal
Posts: 24211
35
The classic solution to this is to preprocess the dictionary. For each dictionary word, sort the letters in that word, and store the word in a Map, using the sorted chars as a String as a key, and using Lists as the values -- i.e., your map looks like

mnoo -> moon, mono
loot -> loot, tool
dgo -> dog, god

Once you've done this preprocessing, to solve for a particular word you just sort the characters, and then look up the answer in the Map (which is instantaneous.)

If the program needs to solve only one word, this is not worth the effort; but really, as soon as you need to do even a few in a single run, this is going to be the fastest way. And of course, you could store the processed dictionary and read it in at runtime.

Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
One of my favorites the Ternary Search Tree is a very fast way to see if a given word is in the dictionary. I can't think of a way to apply it to your problem right off, but maybe you'll see one.

Garrett Rowe
Ranch Hand
Posts: 1296
Thanks fro the responses everyone. I am currently using Ernest's suggestion and creating a dictionary file. Doing this truly exposes the weakness of my original algorithm. Its taking about 2 hours to write the dictionary file using my current algorithm.

Garrett Rowe
Ranch Hand
Posts: 1296
Well for any who are curious (which I'm sure is practically no one). The routine for writing the dictionary finally completed. I decided to cut some time and go with the 100,000 word list instead of the 300,000 word list. I still took about an hour and a half to sort the scrambled word dictionary. I wrote it out to a text file, and then wrote a parser to parse the information back into a map at runtime. The lookup for the words is practically instantanious. Now the only part of the routine that takes any signifigant time at all is my recursive method for extracting all the various substrings, which isn't too bad for words of length < 7, but slows down signifigantly after that. Thanks for all the suggestions.

Garrett