Extracting sentences from a text file

 
Ayan Biswas
Ranch Hand
Posts: 104
I need to write a program that will extract sentences from a text file. If I use '.' as a delimiter and split the text on it, each letter of an acronym becomes a sentence! How can I solve this problem?
 
Henry Wong
author
Posts: 23951

Ayan Biswas wrote: I need to write a program that will extract sentences from a text file. If I use '.' as a delimiter and split the text on it, each letter of an acronym becomes a sentence! How can I solve this problem?




One option is to further qualify your definition of what is a sentence. For example, if a sentence must be longer than one word, or longer than two letters, wouldn't that take care of your false positives from acronyms?
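A minimal sketch of that kind of qualification, assuming a "sentence" must contain at least a given number of words. The class name MinLengthFilter, the method sentences, and the minWords parameter are made up for illustration. Note that this only discards the short fragments; it does not put the acronym back together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from any library: split on '.' and keep
// only fragments that have at least minWords words.
final class MinLengthFilter {

    static List<String> sentences(String text, int minWords) {
        List<String> result = new ArrayList<>();
        for (String fragment : text.split("\\.")) {
            String trimmed = fragment.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty pieces between consecutive dots
            }
            // Count words by splitting on runs of whitespace.
            int wordCount = trimmed.split("\\s+").length;
            if (wordCount >= minWords) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

With minWords set to 2, the text "I like tea. U.S.A. It rains." yields "I like tea" and "It rains"; the one-letter fragments "U", "S" and "A" are dropped.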

Henry
 
Ayan Biswas
Ranch Hand
Posts: 104

Henry Wong wrote: One option is to further qualify your definition of what is a sentence. For example, if a sentence must be longer than one word, or longer than two letters, wouldn't that take care of your false positives from acronyms?


Here is the problem if I follow that suggestion.
Suppose the text looks like this: "<some text> U.S.A <some text>". The problem will persist in that case.
 
Ayan Biswas
Ranch Hand
Posts: 104
"<some text> U" will be the first sentence. "S" will be the next sentence (which I can append to "U", since its word count is 1), and "A <some text>" will be the last sentence. So the problem persists in the last sentence.
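To make this concrete, here is a rough sketch of the approach just described: split on '.', and append any one-word fragment to the previous one. The class name MergeShortFragments and the method split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split on '.', then glue one-word fragments
// back onto the previous fragment (the "word count = 1" rule).
final class MergeShortFragments {

    static List<String> split(String text) {
        List<String> result = new ArrayList<>();
        for (String fragment : text.split("\\.")) {
            String trimmed = fragment.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            boolean singleWord = trimmed.split("\\s+").length == 1;
            if (singleWord && !result.isEmpty()) {
                // Re-attach the fragment, restoring the dot that split() removed.
                int last = result.size() - 1;
                result.set(last, result.get(last) + "." + trimmed);
            } else {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

For "We visited the U.S.A last year." this gives "We visited the U.S" and "A last year": the "S" is merged back, but the "A" still starts a new sentence because its fragment has more than one word.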
 
Sheriff
Posts: 22783
Your definition of a sentence end is not correct. A dot (or question mark, exclamation mark, etc.) does not necessarily mark the end of a sentence. You could treat a dot, question mark or exclamation mark as the end of a sentence, but only if it is followed by whitespace (space, newline, tab, etc.) or nothing at all (end of the String). This is the approach that Javadoc also uses.
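One way to sketch that rule is a regular expression that splits on whitespace only when it directly follows a dot, question mark or exclamation mark. The class name WhitespaceRule and the method split are made up for illustration, and this is not how Javadoc itself is implemented; it only mimics the rule described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a sentence ends at '.', '?' or '!' when that
// character is followed by whitespace or by the end of the text.
final class WhitespaceRule {

    static List<String> split(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        // (?<=[.!?]) is a lookbehind: the whitespace run is consumed as the
        // separator, while the punctuation stays attached to the sentence.
        return Arrays.asList(trimmed.split("(?<=[.!?])\\s+"));
    }
}
```

For "He moved to the U.S.A in 2001. Was it hard? No!" this returns three sentences, and "U.S.A" stays intact because its dots are not followed by whitespace.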

That's still flawed, however, as the sentence would end at "U.S.A." even if there's something after it. Javadoc also has this problem; I've seen several Javadoc comments in the summary list end with "i.e.". We need to redefine what a sentence end is. You can expand the previous definition to require that the next word starts with an uppercase letter. However, that will still be incorrect if a name or some other capitalized word follows an acronym. It becomes evident that full sentence recognition is not trivial (or perhaps not even possible?) to do in code.
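A sketch of the expanded definition, assuming the split additionally requires that the next word starts with an uppercase letter. The class name UppercaseRule and the method split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: split after '.', '?' or '!' only when the
// following whitespace is itself followed by an uppercase letter.
final class UppercaseRule {

    static List<String> split(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        // Lookbehind keeps the punctuation; the lookahead (?=\p{Lu})
        // checks for an uppercase letter without consuming it.
        return Arrays.asList(trimmed.split("(?<=[.!?])\\s+(?=\\p{Lu})"));
    }
}
```

"He lives in the U.S.A. now. Is it far?" is now split correctly into "He lives in the U.S.A. now." and "Is it far?". The remaining weakness shows up with "The U.S. Navy is large.", which is wrongly split into "The U.S." and "Navy is large." because a capitalized name follows the acronym.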
 
Henry Wong
author
Posts: 23951
This is why my response was to further qualify your definition of what is a sentence; the rest of the response was just examples.

Only the OP knows the exact definition of what a sentence is, and hence only the OP is able to qualify it correctly. Now, of course, if the definition is the one used in any generic text, then it is very difficult, if not impossible.

Henry
 
Ayan Biswas
Ranch Hand
Posts: 104
Thanks for all the replies.
 