text Summarization API

pradeep u nair
Greenhorn

Joined: Feb 24, 2006
Posts: 25
Hi friends,
I am developing a Java application in which I need to extract the text content from web pages and then summarize it based on a keyword given by the user. I have already extracted the text content from the web pages, but I still need to summarize it based on the given keyword. Are there any Java tools available that can help me solve this problem, or can someone send me some code that reduces the text to short excerpts?
Thanks in advance,
Pradeep
Jan Groth
Ranch Hand

Joined: Feb 03, 2004
Posts: 456
There is no easy way to achieve this. It sounds like you need a search engine that indexes the text for you.

By the way: if it is not a must, you can skip the detour of extracting the text from the web page...

Try Lucene: lucene.apache.org


regards,
jan
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 35249
    
I'm not aware of a text summarization API in Java. Lucene lets you index and search text, but it does not address summarization. I'm also not sure what you mean by "summarize it based on a keyword" - do you want to extract those parts of the text that deal with that particular keyword?


William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12269
    
You need to parse the text into units that make sense to humans: phrases, sentences, and paragraphs. Next, score those units according to the presence of the keyword(s). Then select the best of the units that are "hits" according to typical writing principles and the size of the summary you are aiming for.

What do I mean by writing principles? Think about how you yourself scan text.
For example, you expect the first sentence of a paragraph to be meaningful in terms of the content of that paragraph. You also expect a good chance that the last paragraph of an article summarizes the article.
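
The steps above can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not an existing library: the class name KeywordSummarizer, the method summarize, and the two bonus weights are all hypothetical, and the sentence splitting is a naive regex that will stumble on abbreviations such as "e.g.".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordSummarizer {

    // Hypothetical weights for the "writing principles"; tune them for your own texts.
    static final double FIRST_SENTENCE_BONUS = 0.5;
    static final double LAST_PARAGRAPH_BONUS = 0.25;

    /** One sentence with its position in the whole text and its score. */
    static final class Unit {
        final int index;
        final String text;
        final double score;

        Unit(int index, String text, double score) {
            this.index = index;
            this.text = text;
            this.score = score;
        }
    }

    /** Returns up to maxSentences sentences that mention the keyword, in reading order. */
    static List<String> summarize(String text, String keyword, int maxSentences) {
        // Whole-word, case-insensitive match of the keyword.
        Pattern word = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // Assumption: paragraphs are separated by a blank line.
        String[] paragraphs = text.trim().split("\\n\\s*\\n");
        List<Unit> hits = new ArrayList<>();
        int index = 0;
        for (int p = 0; p < paragraphs.length; p++) {
            // Naive sentence split: whitespace that follows '.', '!' or '?'.
            String[] sentences = paragraphs[p].trim().split("(?<=[.!?])\\s+");
            for (int s = 0; s < sentences.length; s++, index++) {
                int count = 0;
                Matcher m = word.matcher(sentences[s]);
                while (m.find()) {
                    count++;
                }
                if (count == 0) {
                    continue; // only sentences that are "hits" are candidates
                }
                double score = count;
                if (s == 0) {
                    score += FIRST_SENTENCE_BONUS; // first sentence of a paragraph
                }
                if (p == paragraphs.length - 1) {
                    score += LAST_PARAGRAPH_BONUS; // sentence in the closing paragraph
                }
                hits.add(new Unit(index, sentences[s], score));
            }
        }
        // Best-scoring hits first, cut to the requested summary size.
        hits.sort(Comparator.comparingDouble((Unit u) -> u.score).reversed());
        List<Unit> top = new ArrayList<>(hits.subList(0, Math.min(maxSentences, hits.size())));
        // Put the chosen sentences back into their original order.
        top.sort(Comparator.comparingInt((Unit u) -> u.index));
        List<String> summary = new ArrayList<>();
        for (Unit u : top) {
            summary.add(u.text);
        }
        return summary;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. It is written in Java.\n\n"
                + "Many people like coffee. Lucene can index web pages. Coffee is unrelated.";
        summarize(text, "lucene", 2).forEach(System.out::println);
    }
}
```

With the sample text in main and the keyword "lucene", the two sentences that mention Lucene are selected and printed in their original order. Ranking the hits by score and then restoring reading order keeps the summary readable; phrase-level units and smarter weighting are left out here.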

In the prehistoric era of computers (showing my age now) there was an indexing technique called KWIC - Key Word In Context. It created a listing with the n words preceding a keyword plus the n words following it. This put a burden on the reader to distinguish a significant context from a trivial one.
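
A KWIC-style listing is easy to approximate in Java. The sketch below is a simplified illustration, not a reconstruction of any historical KWIC system: the class name Kwic and the method contexts are hypothetical, words are split on whitespace only, and punctuation is stripped just for the comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Kwic {

    /** Returns one line per keyword occurrence: up to n words before, the keyword, up to n words after. */
    static List<String> contexts(String text, String keyword, int n) {
        String[] words = text.trim().split("\\s+");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Strip leading and trailing punctuation so that "Lucene," still matches "lucene".
            String bare = words[i].replaceAll("^\\W+|\\W+$", "");
            if (!bare.equalsIgnoreCase(keyword)) {
                continue;
            }
            // Clamp the window to the bounds of the text.
            int from = Math.max(0, i - n);
            int to = Math.min(words.length, i + n + 1);
            lines.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "You can index text with Lucene, but Lucene does not summarize it.";
        contexts(text, "lucene", 2).forEach(System.out::println);
    }
}
```

For the sample sentence with n = 2 this yields the lines "text with Lucene, but Lucene" and "Lucene, but Lucene does not". As noted above, it is still up to the reader to judge which contexts are significant.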

This is a topic of continued interest to me, so let us know what you come up with.
Bill


Java Resources at www.wbrogden.com
 
 