I have got asked this question in 2 different interviews so far. I need the answer before going for the next interview. The only answer that comes to my mind is to compare the typed word against every word in the dictionary, to find the edit distance and suggest the words that have the least edit distance. Obviously this procedure is very inefficient and may take hours. I need a better answer.
Question: if you are checking to see if a word is a real word, do you check every word in the dictionary to see if it matches? Or do you try to look up the word, alphabetically to see if the word exists?
Originally posted by A Bhattacharya: How to identify a real word depends on what kind of datastructure you use for the dictionary.
True, but remember that it is a job interview. If you are not willing to make assumptions about the datastructure -- like the dictionary is ordered and searchable, then what does that say about you?
Are you the type that really believe that it is okay to have a dictionary which is just a random ordering of words? Are you the type that won't speak up, if the dictionary is just a random ordering of words?
okay to have a dictionary which is just a random ordering of words?
Henry
How else do you suppose, the human mind works
But yes, I do agree with henry, a spell check(software) would be against a alphabetically sorted word list. [ May 24, 2008: Message edited by: Devesh H Rao ]
Rambo Prasad
Ranch Hand
Joined: Feb 23, 2006
Posts: 628
posted
0
I think you can build a binary tree...I think something like a Trie should help...
Then do a binary search....
Helping hands are much better than the praying lips
Originally posted by A Bhattacharya: How to identify a real word depends on what kind of datastructure you use for the dictionary.
It does. And this is something you can ask the interviewer. When I give interviews, I leave out some information from my coding questions. Virtually all of the good candidates ask clarifying questions or ask whether they can assume something is a certain way. Sometimes they ask things I intentionally left out and sometimes they ask things I didn't even think about. And recently the twice I have forgotten to specify what version of Java we are using, the candidate asked. (since Java 1.4 and 5/6 are so different.)
In the real world, assignments don't always contain all the required information. It isn't a trick, it's just what happens in requirements documents. Being able to identify this information and ask for clarification is a valuable skill.
If someone asked me about a spell checker, I would state that I was assuming the data structure was laid out a certain way. (alphabetical makes sense, 26 alphabetical files - one for each letter - seems even better) Then I would go on to answer the actual question.
I'd answer that I would spend half an hour on researching existing open source spell checkers. Failing that, I would google for algorithms for spell checkers. Failing that, I'd ask the folks at JavaRanch.
The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
A Bhattacharya
Ranch Hand
Joined: Oct 22, 2007
Posts: 125
posted
0
Rambo, REmember we need to give spelling suggestions.
Originally posted by Ilja Preuss: I'd answer that I would spend half an hour on researching existing open source spell checkers. Failing that, I would google for algorithms for spell checkers. Failing that, I'd ask the folks at JavaRanch.
I like that answer! Although I'd probably prompt the candidate to speculate about an algorithm anyway to see how they think.
Originally posted by A Bhattacharya: REmember we need to give spelling suggestions.
Which is easier to do in a sorted list. I imagine it is faster to look for all the entries with one or two letters different/missing/added than look at every single word in the dictionary.
I'd answer that I would spend half an hour on researching existing open source spell checkers. Failing that, I would google for algorithms for spell checkers. Failing that, I'd ask the folks at JavaRanch.
I would have said that as well. I was researching a SWING spell checker a while back and it was open source. Cant remember the name
I would go for a sorted list and search through it. Pretty much the same algorithm that you suggested. If you can assign ranks to the possible completed word the list would be more and more meaningful as time goes by. Adaptive auto complete.
Hashing can be a part of one of the better solutions.
Ilja Preuss
author
Sheriff
Joined: Jul 11, 2001
Posts: 14112
posted
0
Originally posted by Srikanth Basavaraju: Hashing can be a part of one of the better solutions.
I was thinking about something similar - assigning a single number to each word, so that the closer they are in spelling, the closer the numbers would be. Should be interesting.
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35247
7
posted
0
Originally posted by Ilja Preuss: I was thinking about something similar - assigning a single number to each word, so that the closer they are in spelling, the closer the numbers would be. Should be interesting.
It is a fascinating area. Two algorithms that might help are Double Metaphone and the Levenshtein/Damerau Edit Distance. The former creates a normalized presentation of a word, enabling you to compare two different words, while the latter calculates the "distance" between two words, i.e. how many edit operations they are apart. Both can be used to determine whether a particular word -that is known to be written correctly- might be a substitute for an unknown -possibly misspelt- word.
Like I mentioned in my first post, I told this 'edit distance' thing in my answer. Problem is, you have to find the edit distance of the given word from EVERY word in the dictionary and choose the one at least distance. It will not work.
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35247
7
posted
0
Yeah, you're probably better off to narrow down the range of possible matches using Metaphone (or Double Metaphone for certain non-English languages). Those values can be precomputed. Once you have a small selection of possible matches you might usefully compute the edit distance for just those few words. [ May 29, 2008: Message edited by: Ulf Dittmer ]
Ilja Preuss
author
Sheriff
Joined: Jul 11, 2001
Posts: 14112
posted
0
Originally posted by Ilja Preuss:
I was thinking about something similar - assigning a single number to each word, so that the closer they are in spelling, the closer the numbers would be. Should be interesting.
Thought a little bit more about it, and I guess that those numbers would need to be *really, really big* - probably *too* big to be a reasonable solution. It just would consume too much memory... (This is just my current gut feel, though.)