reading info from english dictionary

Shrinath M Aithal
Ranch Hand

Joined: May 20, 2009
Posts: 82
hi all,

Here is what I am trying to do: natural language processing in Java.
The first step is to classify words into their parts of speech. For that I need to either refer to a dictionary or build a database myself. Building the whole noun/verb database by hand seems impractical, so I was wondering: when the program meets a word that is not in its local database, could it go online, look the word up in an online dictionary, and add the result to the local database?
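A minimal sketch of that "check locally, go online only on a miss" idea, in case it helps. The class name `PosCache` and the `remote` function are made up for illustration; the function stands in for whatever online dictionary client is eventually used, and an in-memory `HashMap` stands in for the local database.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Local part-of-speech store that consults a remote source only on a miss. */
class PosCache {
    // Stand-in for the local database (assumption: in-memory only).
    private final Map<String, String> local = new HashMap<>();
    // Hypothetical remote lookup; a placeholder for an online dictionary client.
    private final Function<String, Optional<String>> remote;
    private int remoteCalls = 0;

    PosCache(Function<String, Optional<String>> remote) {
        this.remote = remote;
    }

    /** Returns the stored part of speech, asking the remote source on a miss. */
    Optional<String> lookup(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        String cached = local.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        remoteCalls++;
        Optional<String> fetched = remote.apply(key);
        // Remember successful answers so the next lookup stays local.
        fetched.ifPresent(pos -> local.put(key, pos));
        return fetched;
    }

    int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        // Fake "online dictionary" so the sketch runs without a network.
        Map<String, String> fake = Map.of("moose", "noun", "read", "verb");
        PosCache cache = new PosCache(w -> Optional.ofNullable(fake.get(w)));
        System.out.println(cache.lookup("Moose"));
        System.out.println(cache.lookup("moose"));
        System.out.println("remote calls: " + cache.remoteCalls());
    }
}
```

Note that this stores a single tag per word, which is exactly the simplification that breaks down once context matters.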

If anything in the question is unclear, please ask.
If anyone knows a better way of doing this, please share your ideas.
If anyone knows how to do it, please guide me.
thanks to all


Regards
Shri..

SCJP 5.0
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Seems reasonable, although if the dictionary in question doesn't have an API it'll be a lot of work. You should probably check for existing word classification work since NLP isn't a new field.

Be aware that classification depends on context, and NLP in general is a non-trivial problem.
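To illustrate the context point, here is a small sketch with a hand-written toy lexicon (not taken from any real dictionary or dataset). A plain word-to-tag lookup can only return the set of possible tags; picking the right one for a given sentence needs the surrounding words.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class AmbiguousWords {
    // Toy lexicon for illustration only; a real one would come from a dataset.
    static final Map<String, Set<String>> LEXICON = Map.of(
            "fly", Set.of("noun", "verb"),   // "a fly" vs. "birds fly"
            "book", Set.of("noun", "verb"),  // "a book" vs. "book a flight"
            "the", Set.of("determiner"));

    /** All tags the lexicon lists for a word; empty if the word is unknown. */
    static Set<String> possibleTags(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }

    /** True when a dictionary lookup alone cannot decide the tag. */
    static boolean isAmbiguous(String word) {
        return possibleTags(word).size() > 1;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"the", "fly", "book"}) {
            System.out.println(w + " -> " + possibleTags(w)
                    + (isAmbiguous(w) ? " (needs context)" : ""));
        }
    }
}
```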
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36468
    
Not a "beginning" question. Moving.

As David Newton says, natural language processing is a major problem; it is really a science in its own right.
Shrinath M Aithal
Ranch Hand

Joined: May 20, 2009
Posts: 82
OK, thank you guys.
But how do you read a page from the web and extract only the information you want? For example, look up a word in an online thesaurus and tell whether it is a verb, a noun, or some other part of speech?
I googled a bit and couldn't find much Java source code that does what I want. Any help would be enlightening and appreciated.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Without an API you'd have to screen-scrape.
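A rough sketch of what screen-scraping could look like with the standard library only (Java 11+ for `java.net.http`). Everything site-specific here is an assumption: the `<span class="pos">` markup is invented for the example, and `PosScraper`, `extractPos`, and `fetch` are illustrative names. A real page will use different markup, and a regex like this breaks as soon as the site changes its HTML.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PosScraper {
    // ASSUMPTION: the page marks part of speech as <span class="pos">noun</span>.
    private static final Pattern POS = Pattern.compile(
            "<span\\s+class=\"pos\"\\s*>\\s*([^<]+?)\\s*</span>",
            Pattern.CASE_INSENSITIVE);

    /** Pulls the distinct part-of-speech labels out of a page, in page order. */
    static List<String> extractPos(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = POS.matcher(html);
        while (m.find()) {
            String pos = m.group(1).toLowerCase(Locale.ROOT);
            if (!found.contains(pos)) {
                found.add(pos);
            }
        }
        return found;
    }

    /** Downloads a page body as text; the URL is supplied by the caller. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Hard-coded sample so the sketch runs offline; swap in fetch(url) for a real page.
        String sample = "<h1>fly</h1><span class=\"pos\">noun</span>"
                + "<p>an insect</p><span class=\"pos\"> Verb </span>";
        System.out.println(extractPos(sample)); // prints [noun, verb]
    }
}
```

Check the site's terms of use before scraping it, and prefer an HTML parser over regular expressions for anything beyond a quick experiment.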

As I said--I'd seriously consider looking for existing datasets, although naive, non-contextual usage may not be what you want.

I'd probably join the ACM (if you're not already a member) and start reading papers---a ton of dissertations and theses have been written on what you're trying to accomplish.
Shrinath M Aithal
Ranch Hand

Joined: May 20, 2009
Posts: 82
OK. So you are suggesting I use an existing dataset. What do you think of WordNet? Would it be easier?
By the way, thanks for the ACM pointer, I wasn't aware of it. There is loads of material there on what I wanted.
Shrinath M Aithal
Ranch Hand

Joined: May 20, 2009
Posts: 82
Found a good part-of-speech tagger with both an API and a command-line interface: the Stanford POS Tagger. Just letting anyone know in case they are looking for one. Thank you guys.
 