Not sure whether this is the correct forum for this question, but anyway:
I need to implement "Did you mean" functionality programmatically using some Java API. Basically, I need to integrate it into a search engine that we have developed: if the user enters a misspelled search word, we must be able to say "Did you mean XXXXX?", where XXXXX is the corrected word, and then show search results for the corrected word. I hope that is clear.
One approach that comes to mind is to maintain a dictionary of the words and search phrases that you want to cover. Whenever a search turns up few results, check the dictionary for a word or phrase that is close to the original search phrase (e.g. using the weighted Levenshtein distance as a measure of "closeness"). If that word or phrase has a lot more hits than the original one, offer it as a "Did you mean..." alternative.
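A minimal sketch of that dictionary approach, under some assumptions of my own: the dictionary is a map from lowercase search phrase to its hit count in the index, "close" means a weighted Levenshtein distance of at most `maxDistance`, and "a lot more hits" means at least `minHitRatio` times the original's count. The class and method names (`DidYouMean`, `weightedLevenshtein`, `suggest`) and the sample counts are hypothetical, not from any library.

```java
import java.util.Map;
import java.util.Optional;

public class DidYouMean {

    // Weighted Levenshtein distance: each edit operation has its own cost.
    // With all three costs set to 1.0 this is the classic Levenshtein distance.
    static double weightedLevenshtein(String a, String b,
                                      double insertCost, double deleteCost, double substituteCost) {
        double[] prev = new double[b.length() + 1];
        double[] curr = new double[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j * insertCost; // build b's prefix from the empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i * deleteCost; // delete all of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                double sub = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0.0 : substituteCost);
                double del = prev[j] + deleteCost;
                double ins = curr[j - 1] + insertCost;
                curr[j] = Math.min(sub, Math.min(del, ins));
            }
            double[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    // Returns the closest dictionary entry within maxDistance that has at least
    // minHitRatio times as many hits as the query (assumes lowercase dictionary keys).
    static Optional<String> suggest(String query, Map<String, Integer> hitCounts,
                                    double maxDistance, double minHitRatio) {
        String q = query.toLowerCase();
        int queryHits = hitCounts.getOrDefault(q, 0);
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (Map.Entry<String, Integer> e : hitCounts.entrySet()) {
            String candidate = e.getKey();
            if (candidate.equals(q)) continue; // never suggest the query itself
            double d = weightedLevenshtein(q, candidate, 1.0, 1.0, 1.0);
            boolean muchMoreHits = e.getValue() >= minHitRatio * Math.max(queryHits, 1);
            if (d <= maxDistance && muchMoreHits && d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: search phrase -> number of hits in the index.
        Map<String, Integer> hits = Map.of("separate", 1200, "desperate", 300, "seperate", 4);
        System.out.println(suggest("seperate", hits, 2.0, 10.0)
                .map(s -> "Did you mean " + s + "?")
                .orElse("No suggestion"));
    }
}
```

Running `main` prints `Did you mean separate?`: "desperate" also qualifies (distance 2), but "separate" is closer (distance 1). Scanning the whole dictionary is only reasonable for small word lists. For a real index you would want something smarter, e.g. the spell-checking module that ships with Lucene.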
Years ago I coded a phonetic lookup tool to help a legal transcription service reconcile different spellings of witness names and the like. There were lots of variations, since the text came from court reporters listening to witnesses.
Metaphone is cool, Bill. I never got past Soundex, which was invented in 1918. I wonder what language they were programming in back then. That Wikipedia page had links to other phonetic algorithms of interest.
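For comparison, here is a small sketch of American Soundex as it is commonly described: keep the first letter, map the remaining consonants to digits, collapse adjacent duplicate codes (H and W do not separate duplicates, vowels do), drop vowels, and pad or truncate to four characters. The `Soundex` class and `encode` method here are my own illustration. Commons Codec also ships a ready-made `org.apache.commons.codec.language.Soundex`.

```java
public class Soundex {

    // Digit code for each letter 'A'..'Z'.
    // '0' marks vowels and Y (dropped, but they separate duplicate codes);
    // '-' marks H and W (dropped, and they do NOT separate duplicate codes).
    private static final String CODES = "0123012-02245501262301-202";

    static String encode(String name) {
        // Keep only the letters A-Z, ignoring case, spaces, and punctuation.
        StringBuilder letters = new StringBuilder();
        for (char c : name.toUpperCase().toCharArray()) {
            if (c >= 'A' && c <= 'Z') letters.append(c);
        }
        if (letters.length() == 0) return "";

        StringBuilder out = new StringBuilder().append(letters.charAt(0));
        char last = CODES.charAt(letters.charAt(0) - 'A'); // first letter's code counts for duplicates
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char code = CODES.charAt(letters.charAt(i) - 'A');
            if (code == '-') continue;                  // H/W: skip without resetting 'last'
            if (code != '0' && code != last) out.append(code);
            last = code;                                // a vowel resets 'last'
        }
        while (out.length() < 4) out.append('0');       // pad short codes with zeros
        return out.toString();
    }

    public static void main(String[] args) {
        for (String n : new String[] {"Robert", "Rupert", "Ashcraft", "Tymczak"}) {
            System.out.println(n + " -> " + encode(n));
        }
    }
}
```

"Robert" and "Rupert" both encode to `R163`, which is exactly the kind of match a name lookup needs. "Ashcraft" gives `A261` because the H between S and C does not break up their shared code 2.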
Metaphone is indeed cool - Lawrence Philips identified a real need when he came up with it.
I had an astonishing amount of interest when I first put that demo up, including many requests from folks who needed to match pronunciation in other languages, which would of course require completely new code. I have often wondered how many managed to make it work in their chosen language.
I just noticed that the DoubleMetaphone algorithm (which has been part of Commons Codec for a while) has provisions to detect and adapt to Slavo-Germanic languages. I'll definitely be looking into that, since I currently need something that works with German phrases.