wood burning stoves 2.0*
The moose likes Java in General and the fly likes Algorithm for searching for a scrambled word in a list Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Murach's Java Servlets and JSP this week in the Servlets forum!
JavaRanch » Java Forums » Java » Java in General
Bookmark "Algorithm for searching for a scrambled word in a list" Watch "Algorithm for searching for a scrambled word in a list" New topic
Author

Algorithm for searching for a scrambled word in a list

Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
I have a list stored as an ArrayList<String> of about 300,000 words. I want to be able to enter a scrambled String, and return all the words in the list that can be made from the random String.

ex:
"tac" returns:
cat
act


Currently my algorithm for doing this is very brute-force. and although it works, it is slow. In pseudocode it might look like this


Like I said its slow because each time I want to find a match I have to iterate through and sort every word in the list. Its not too bad if I only want to test one scrambled word, but I want to test the scrambled word and all substring permutations of the word.

"tac" tests "tac", "ta", "tc", "ac", "t", "a", "c" and returns
a
at
act
cat

Obvoiusly as the words get longer, the number of substring permutations gets longer (I believe its n^2 - 1 for a word of length n) I thought about using a HashMap<String, String[]> where the key would be letters in their natural ordering, and the value would be an array of Strings that that group of letters can produce:

key = act
value = {act, cat}

but I can't think of a reasonable way to populate the map. If anyone has any ideas or a better algorithm, I'd love to hear them.

Garrett
[ February 12, 2006: Message edited by: Garrett Rowe ]

Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peter
Craig Tyler
Ranch Hand

Joined: Jan 15, 2006
Posts: 52
Having not tried it out, I'm really not sure if this would be any faster, but it might be worth a shot. Create a regex expression based on the scrambled string you're making. In the case of "tac", it would be something like ([tac])+ so at least the algorithm would be faster since you wouldn't have to copy strings from the list of words to a character array.
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
Thanks for the suggestion. Now my algorithm is considerably faster since I added this check and a length check before I check the scrambled word versus the word in the list.

i.e.

This greatly reduces the number of words that are actually checked for equality which speeds up the algorithm signifigantly. I'll keep looking for ways to tweak it as I would like it to move even faster.
[ February 12, 2006: Message edited by: Garrett Rowe ]
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    
  34

The classic solution to this is to preprocess the dictionary. For each dictionary word, sort the letters in that word, and store the word in a Map, using the sorted chars as a String as a key, and using Lists as the values -- i.e., your map looks like

mnoo -> moon, mono
loot -> loot, tool
dgo -> dog, god

Once you've done this preprocessing, to solve for a particular word you just sort the characters, and then look up the answer in the Map (which is instantaneous.)

If the program needs to solve only one word, this is not worth the effort; but really, as soon as you need to do even a few in a single run, this is going to be the fastest way. And of course, you could store the processed dictionary and read it in at runtime.


[Jess in Action][AskingGoodQuestions]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
One of my favorites the Ternary Search Tree is a very fast way to see if a given word is in the dictionary. I can't think of a way to apply it to your problem right off, but maybe you'll see one.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
Thanks fro the responses everyone. I am currently using Ernest's suggestion and creating a dictionary file. Doing this truly exposes the weakness of my original algorithm. Its taking about 2 hours to write the dictionary file using my current algorithm.
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
Well for any who are curious (which I'm sure is practically no one). The routine for writing the dictionary finally completed. I decided to cut some time and go with the 100,000 word list instead of the 300,000 word list. I still took about an hour and a half to sort the scrambled word dictionary. I wrote it out to a text file, and then wrote a parser to parse the information back into a map at runtime. The lookup for the words is practically instantanious. Now the only part of the routine that takes any signifigant time at all is my recursive method for extracting all the various substrings, which isn't too bad for words of length < 7, but slows down signifigantly after that. Thanks for all the suggestions.

Garrett
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Algorithm for searching for a scrambled word in a list
 
Similar Threads
Performance when serching a large Map
Looking for a leg up on an Anagram algorithm
Permutations
Finding anagrams in a dictionary file using pre-fixes
Scrambling Words