Hi, thank you for reading my post. I want to build a dictionary of English/local-language words. Because the dictionary will handle a very large number of lookups, I was wondering whether it is possible to store the words in XML format instead of in an RDBMS. If so, is XML fast enough as data storage for about 60,000 words? My other question is about searching XML files: is there a package that makes efficient searching possible?
Overall, my question is whether to choose XML or an RDBMS for storing dictionary data. If XML, which library should I use? If XML is not suitable, is there an embedded database that supports Unicode for my purpose?
Personally I would use an XML document for the data storage because it will be easy to edit with a text editor and easy to reformat for various purposes with XSLT and other XML tools. You would want to pull this document into an application and build a lookup structure optimized for speed and flexibility. The Java collections such as HashMap are handy for this.
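As a rough sketch of that idea, the code below parses an XML document once and copies it into a HashMap so that every later lookup is an in-memory operation. The XML layout (an `entry` element with `word` and `translation` attributes) and the class name `XmlDictionary` are assumptions made for illustration, not a format anyone in this thread has specified.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical loader: assumes <dictionary><entry word="..." translation="..."/></dictionary>.
class XmlDictionary {

    // Parses the document once and returns an in-memory word -> translation table.
    static Map<String, String> load(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(xml);
            NodeList entries = doc.getElementsByTagName("entry");
            Map<String, String> table = new HashMap<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                // Keys are lower-cased so that lookups can ignore case.
                String key = entry.getAttribute("word").toLowerCase(Locale.ROOT);
                table.put(key, entry.getAttribute("translation"));
            }
            return table;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot read dictionary XML", e);
        }
    }

    // Case-insensitive lookup; returns null when the word is not in the table.
    static String translate(Map<String, String> table, String word) {
        return table.get(word.toLowerCase(Locale.ROOT));
    }
}
```

The XML parser honours the encoding declared in the file, so a UTF-8 document can hold non-Latin translations. The file is read only at startup; after that the XML plays no part in lookups.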
Consider adding phonetic code lookup if there is any chance of users not always using the right spelling. I have an example of phonetic lookup from a 59,956 word dictionary here. Bill
Joined: Mar 30, 2005
Hi. That is fantastic, I mean the speed. It's very fast. Did you use XML for its storage? How do you search all of the words so fast? Do you use an in-memory table or something like that?
Can you please tell me more about how you did it? Are your database and algorithm open for use in OSS projects? I saw that you used some pieces of Jakarta projects. What about the other parts?
Sorry for so many questions.
Author and all-around good cowpoke
Joined: Mar 22, 2000
In that example, the words are read from a flat file when the servlet first starts; the phonetic code of each word is computed and then used as a key for a hashtable. Since more than one word can give a particular code, the value stored in the table is an ArrayList holding references to the original Strings. The data structures stay in memory. HashMap lookup is indeed very fast, much faster than a DB query. You are welcome to use the code however you want, but note that the Jakarta Commons project has the Codec collection of tools, which includes the Metaphone algorithm and other useful goodies. Lawrence Philips wrote the original Metaphone implementation (in frustration with the Soundex algorithm, I think.) Bill
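For readers who want to see the shape of that structure, here is a minimal sketch of a phonetic-code table whose values are lists of the original words. It is not the code from the example above. To stay free of external libraries it uses a simplified Soundex-style code as a stand-in for Metaphone, and the class name `PhoneticIndex` is made up for illustration. A Metaphone encoder such as the one in Commons Codec could be substituted for the `soundex` method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical index: phonetic code -> all words that share that code.
class PhoneticIndex {
    // Soundex digit for each letter A..Z; '0' marks letters that carry no code.
    private static final String CODES = "01230120022455012623010202";

    private final Map<String, List<String>> byCode = new HashMap<>();

    // Simplified Soundex (a stand-in for Metaphone): first letter plus up to three digits.
    static String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') continue;      // skip anything outside A-Z
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                      // keep the first letter as-is
            } else if (code != '0' && code != prev) {
                out.append(code);                   // new consonant sound
            }
            if (c != 'H' && c != 'W') prev = code;  // H and W do not separate equal codes
            if (out.length() == 4) break;
        }
        while (out.length() > 0 && out.length() < 4) out.append('0');
        return out.toString();
    }

    // Stores the word under its phonetic code; several words may share one code.
    void add(String word) {
        byCode.computeIfAbsent(soundex(word), k -> new ArrayList<>()).add(word);
    }

    // Returns every stored word that sounds like the query (possibly an empty list).
    List<String> lookup(String query) {
        return byCode.getOrDefault(soundex(query), Collections.emptyList());
    }
}
```

A servlet could fill such an index once at startup by calling `add` for each line of the word file, then answer each request with a single `lookup` call.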