I need some expert suggestions for solving this problem.
I have a few text files containing nouns, pronouns, verbs, etc.
I need to read these files (around 8) into memory only once.
The process should be able to tag multiple sentences from the input text simultaneously.
The output should return the tagged text in the format word/tag in the order of the original text.
Example: suppose I have this sentence in a file:
My aunt’s can opener can open a drum
I should be able to read the file and convert its sentences or paragraphs into an output file in the following format.
The output should be:
My/PRP$ aunt/NN ’s/POS can/NN opener/NN can/MD open/VB a/DT drum/NN
You have got yourself a very difficult task there. Determining whether the can in “can opener” is a noun or a verb is just as difficult as determining whether the can in “can open” is a noun or a verb. In fact I think the answer is neither in the latter case; it is a modal auxiliary, and its meaning here cannot exist without the open that follows. You should really regard both “can opener” and “can open” as phrases. One is a compound noun comprising two words and the other is a verb phrase comprising two words (a modal auxiliary plus a main verb). I would suggest you can do one of several things:
Find some details about natural language processing, but that is cutting edge research which has taken sixty‑plus years even to get to the level of Google Translate.
Simplify your vocabulary and your grammar. Avoid words with two meanings, e.g. bear, can. Restrict yourself to one‑word terms. Restrict yourself to sentences in the form subject→verb→object.
Find a book like Mason, Brown and Levine, Lex and Yacc (O'Reilly), which shows an example using lex to print verb, noun, etc., but that example restricts itself to a simplified vocabulary of the kind I just described.
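To make the simplified-vocabulary suggestion concrete, here is a minimal sketch in Java of a pure lookup tagger. Everything in it is an assumption for illustration: the class name LexiconTagger, the tiny hard-coded word list (which would really be filled from your word files), and the UNK tag for unknown words. It only works because every word in the list has exactly one tag.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

class LexiconTagger {
    // Hypothetical one-tag-per-word lexicon; in practice it would be
    // filled from the noun/verb/etc. files rather than hard-coded.
    static final Map<String, String> LEXICON = Map.of(
            "the", "DT",
            "a", "DT",
            "dog", "NN",
            "drum", "NN",
            "opens", "VBZ",
            "sees", "VBZ");

    // Splits on whitespace and appends "/TAG" to each word, in input order.
    static String tag(String sentence) {
        if (sentence.isBlank()) {
            return "";
        }
        StringJoiner out = new StringJoiner(" ");
        for (String word : sentence.trim().split("\\s+")) {
            // Words missing from the lexicon get the placeholder tag UNK.
            String tag = LEXICON.getOrDefault(word.toLowerCase(Locale.ROOT), "UNK");
            out.add(word + "/" + tag);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: The/DT dog/NN opens/VBZ a/DT drum/NN
        System.out.println(tag("The dog opens a drum"));
    }
}
```

As soon as you add a word like can or bear, a single lookup is no longer enough, which is exactly why the vocabulary has to stay restricted for this approach.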
posted 3 years ago
vin Hari wrote:. . . My aunt’s can opener can open a drum . . .
Had you written
“My aunt’s can opener can open a can”, that would have been easier, because you can create a rule that verbs cannot follow “a”.
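A rough sketch of such a rule in Java might look like the following. The class name DeterminerRuleTagger, the small lexicon and the ordering of candidate tags are all made up for illustration, and the rule itself is a simplification of real English. Each word lists its possible tags, the first one is the default, and a word following a determiner is forced to its first non-verb tag.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

class DeterminerRuleTagger {
    // Hypothetical lexicon: each word lists its possible tags, default first.
    static final Map<String, List<String>> LEXICON = Map.of(
            "the", List.of("DT"),
            "a", List.of("DT"),
            "opener", List.of("NN"),
            "can", List.of("MD", "NN"),
            "open", List.of("VB", "JJ"));

    // Treats modals (MD) and all VB* tags as "verbs" for the purpose of the rule.
    static boolean isVerbTag(String tag) {
        return tag.equals("MD") || tag.startsWith("VB");
    }

    static String tag(String sentence) {
        if (sentence.isBlank()) {
            return "";
        }
        StringJoiner out = new StringJoiner(" ");
        String previousTag = "";
        for (String word : sentence.trim().split("\\s+")) {
            List<String> candidates =
                    LEXICON.getOrDefault(word.toLowerCase(Locale.ROOT), List.of("UNK"));
            String chosen = candidates.get(0);
            if (previousTag.equals("DT")) {
                // Rule: a verb cannot follow a determiner, so take the first
                // non-verb candidate; keep the default if there is none.
                chosen = candidates.stream()
                        .filter(t -> !isVerbTag(t))
                        .findFirst()
                        .orElse(chosen);
            }
            out.add(word + "/" + chosen);
            previousTag = chosen;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: The/DT opener/NN can/MD open/VB a/DT can/NN
        System.out.println(tag("The opener can open a can"));
    }
}
```

Note that this rule alone would still tag the can in “can opener” as MD, because nothing in front of it is a determiner; that case needs a different rule or a phrase-level entry.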
posted 3 years ago
Thank you, Campbell, for your reply. Ranchers, any other suggestions? How about reading the files into a database, scanning each input line, splitting it on spaces and attaching a tag to each word?
Please let me know if you have any suggestions. Thanks.
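For reference, the split-on-space idea could be sketched roughly as below, using an in-memory map instead of a database (that swap is an assumption, made because the word files only need to be read once). The class name ParallelLexiconTagger, the one-word-per-line file layout and the UNK tag are hypothetical. The lexicon is built once, made immutable, and then shared by a parallel stream that tags several sentences at the same time while keeping them in their original order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

class ParallelLexiconTagger {
    private final Map<String, String> lexicon;

    ParallelLexiconTagger(Map<String, String> lexicon) {
        // Immutable copy, so it is safe to share between threads.
        this.lexicon = Map.copyOf(lexicon);
    }

    // Assumed layout: one file per tag, one word per line.
    // A word listed in several files keeps whichever tag is read first.
    static ParallelLexiconTagger fromFiles(Map<Path, String> tagByFile) throws IOException {
        Map<String, String> words = new HashMap<>();
        for (Map.Entry<Path, String> entry : tagByFile.entrySet()) {
            for (String line : Files.readAllLines(entry.getKey())) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty()) {
                    words.putIfAbsent(word, entry.getValue());
                }
            }
        }
        return new ParallelLexiconTagger(words);
    }

    // Splits one sentence on whitespace and attaches "/TAG" to every word.
    String tagSentence(String sentence) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> w + "/" + lexicon.getOrDefault(w.toLowerCase(Locale.ROOT), "UNK"))
                .collect(Collectors.joining(" "));
    }

    // Tags sentences in parallel; collecting from an ordered list
    // keeps the results in the order of the input.
    List<String> tagAll(List<String> sentences) {
        return sentences.parallelStream()
                .map(this::tagSentence)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ParallelLexiconTagger tagger = new ParallelLexiconTagger(
                Map.of("a", "DT", "drum", "NN", "opens", "VBZ"));
        // prints: [a/DT drum/NN, it/UNK opens/VBZ a/DT drum/NN]
        System.out.println(tagger.tagAll(List.of("a drum", "it opens a drum")));
    }
}
```

This only covers reading, splitting and ordering; it does nothing about words with more than one possible tag, such as can.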
Natural language processing is a whole research area in and of itself. I suspect that trying to write a tagger yourself from scratch would be a frustrating and ultimately fruitless exercise. You might make more progress if you used an existing library to process the text, such as the Apache OpenNLP project.