Hi friends, I am developing a Java application in which I need to extract the text content from web pages and then summarize it based on a keyword given by the user. I have already extracted the text content, but I still need to summarize it based on the given keyword. Are there any Java tools available that can help me solve this problem, or could someone send me some code that breaks the text down into smaller pieces? Thanking you in advance, Pradeep
I'm not aware of a text summarization API in Java. Lucene lets you index and search text, but it does not address summarization. I'm also not sure what you mean by "summarize it based on a keyword" - do you want to extract those parts of the text that deal with that particular keyword?
You need to parse the text into units that make sense to humans: phrases, sentences and paragraphs. Next, score those units according to the presence of the keyword(s). Then select the best of the units that are "hits", according to typical writing principles and the size of the summary you are aiming for.
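To make the parse, score and select steps concrete, here is a minimal sketch rather than a finished summarizer. It assumes sentences are the unit, uses the JDK's BreakIterator to find them, and scores each sentence by a plain count of whole-word keyword matches. The class and method names (KeywordSummarizer, splitSentences, score, summarize) are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; names are illustrative, not from any library.
class KeywordSummarizer {

    /** Splits text into sentences using the JDK's BreakIterator. */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    /** Counts whole-word, case-insensitive occurrences of the keyword. */
    static int score(String sentence, String keyword) {
        String target = keyword.toLowerCase(Locale.ENGLISH);
        int hits = 0;
        // Split on anything that is not a letter or digit.
        for (String word : sentence.toLowerCase(Locale.ENGLISH).split("[^\\p{L}\\p{Nd}]+")) {
            if (word.equals(target)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns up to maxSentences best-scoring hits, listed in document order. */
    static List<String> summarize(String text, String keyword, int maxSentences) {
        List<String> sentences = splitSentences(text);
        List<Integer> hitIndexes = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            if (score(sentences.get(i), keyword) > 0) {
                hitIndexes.add(i);
            }
        }
        // Highest score first; the sort is stable, so ties keep document order.
        hitIndexes.sort(Comparator.comparingInt((Integer i) -> -score(sentences.get(i), keyword)));
        List<Integer> chosen =
                new ArrayList<>(hitIndexes.subList(0, Math.min(maxSentences, hitIndexes.size())));
        chosen.sort(Comparator.naturalOrder());
        List<String> summary = new ArrayList<>();
        for (int i : chosen) {
            summary.add(sentences.get(i));
        }
        return summary;
    }
}
```

Calling KeywordSummarizer.summarize(pageText, "lucene", 3) would return at most three sentences that mention the keyword. A raw hit count is the crudest possible score; it ignores word stems, synonyms and sentence length, so treat it as a starting point.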
What do I mean by writing principles? Think about how you yourself scan text. For example, you expect the first sentence of a paragraph to be meaningful in terms of the content of that paragraph. You also expect a good chance that the last paragraph of an article summarizes the article.
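One way to fold those two expectations into a score is to multiply the keyword hit count by a position bonus. This is only a sketch: the class name PositionWeights and both bonus values are assumptions picked for illustration, not tuned numbers.

```java
// Hypothetical weighting helper; the bonus values are illustrative guesses.
class PositionWeights {
    static final double FIRST_SENTENCE_BONUS = 1.5;
    static final double LAST_PARAGRAPH_BONUS = 1.25;

    /**
     * Boosts a keyword hit count when the sentence opens its paragraph
     * and when the paragraph is the last one in the article.
     * Indexes are zero-based.
     */
    static double weightedScore(int keywordHits, int sentenceIndex,
                                int paragraphIndex, int paragraphCount) {
        double score = keywordHits;
        if (sentenceIndex == 0) {
            score *= FIRST_SENTENCE_BONUS;
        }
        if (paragraphIndex == paragraphCount - 1) {
            score *= LAST_PARAGRAPH_BONUS;
        }
        return score;
    }
}
```

A sentence with no keyword hits still scores zero, so position alone never turns a non-hit into a hit.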
In the prehistoric era of computers (showing my age now) there was an indexing technique called KWIC - Key Word In Context. It created a listing with the n words preceding a key word plus the n words following it. This put a burden on the reader to recognize a significant context versus a trivial one.
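A KWIC listing is simple to sketch in Java. The version below assumes words are separated by whitespace and that n is zero or greater; the class name Kwic and its method are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of any library.
class Kwic {

    /** One line per keyword occurrence: up to n words before, the keyword, up to n words after. */
    static List<String> kwic(String text, String keyword, int n) {
        List<String> lines = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return lines;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            // Strip surrounding punctuation before comparing, so "text," matches "text".
            String bare = words[i].replaceAll("^[^\\p{L}\\p{Nd}]+|[^\\p{L}\\p{Nd}]+$", "");
            if (!bare.equalsIgnoreCase(keyword)) {
                continue;
            }
            int from = Math.max(0, i - n);
            int to = Math.min(words.length, i + n + 1);
            lines.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
        }
        return lines;
    }
}
```

As with the original technique, every occurrence gets a line, so it is still up to the reader (or a later scoring step) to tell a significant context from a trivial one.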
This is a topic of continued interest to me, so let us know what you come up with. Bill