Chapter 7 Extracting text with Tika
What is Tika?
Tika’s logical design and API
Tika’s built-in text extraction tool
Extracting text programmatically
Indexing custom XML
What do you mean by "selectively"? If you want to pull text from, say, a named DIV, you'd need to pre-process the input so that it includes only the text you want.
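Here is a rough sketch of the kind of pre-processing I have in mind. It uses only the JDK's built-in XML and XPath classes, not Tika, and it assumes the page is well-formed XHTML with no namespace or DOCTYPE declared, which real sites often are not. The class name `DivExtractor`, the method `extractDivText`, and the sample page are all made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: keeps only the text of one named <div>.
class DivExtractor {

    // Returns the text inside the first <div> whose id equals divId,
    // or an empty string if there is no such div.
    // Assumption: xhtml is well-formed XML without a namespace or DOCTYPE.
    static String extractDivText(String xhtml, String divId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Pass the id as an XPath variable so quotes in divId cannot break the query.
            xpath.setXPathVariableResolver(name ->
                    "id".equals(name.getLocalPart()) ? divId : null);

            String text = xpath.evaluate("string(//div[@id=$id])", doc);
            // Collapse the whitespace left over from the markup's indentation.
            return text.trim().replaceAll("\\s+", " ");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page, not taken from any real site.
        String page =
                "<html><body>\n"
              + "<div id=\"nav\">Home | About</div>\n"
              + "<div id=\"rankings\">\n"
              + "  <p>Lincoln High: A</p>\n"
              + "  <p>Roosevelt Middle: B+</p>\n"
              + "</div>\n"
              + "</body></html>";
        System.out.println(extractDivText(page, "rankings"));
    }
}
```

The string this returns is what you would then hand to the extraction or indexing step. For messy real-world HTML you would probably need a more forgiving parser in place of the strict XML one used here.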
Thanks for the pointer to the Tika site. I will take a look now.
By "selectively" I meant from a certain div. For example, I want to extract school names and their grades from a school-ranking website.
I will take a look at the library now. This seems interesting.