Create meta cards of documents using POI. Lucene or reg expressions

Aaron Williams
Greenhorn

Joined: Mar 09, 2011
Posts: 2

All,

Thanks in advance. I am indexing documents via multiple data sources. I am creating meta cards for each document and storing them in an Oracle DB. I only store the meta card and a link to the document, not the document itself.

I started using POI and PDFBox to read Word, Excel, PowerPoint, and other document formats.

If I want to create structured, intelligible phrases and summaries from, let us say, an expense report, would you recommend using Lucene or regular expressions? I've considered creating a library class that maps keywords to phrases and just allowing it to grow. I know there has to be a more powerful and efficient way to do this than regular expressions.

So, back to the expense report example: I want to find words that match Mr. or Mrs., Unilever, 2012 conference, etc., and store those in the meta card.
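
To make the keyword-library idea concrete, here is a minimal sketch using only java.util.regex. The class name MetaCardKeywords, the labels (honorific, company, event), and the patterns themselves are made up for illustration; a real library would grow over time and would take its input from the text POI or PDFBox extracts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCardKeywords {

    // Hypothetical keyword library: label -> pattern. New entries can be added as needed.
    private static final Map<String, Pattern> LIBRARY = new LinkedHashMap<>();
    static {
        // "Mr" or "Mrs", with an optional trailing dot, followed by whitespace
        LIBRARY.put("honorific", Pattern.compile("\\bMrs?\\.?(?=\\s)"));
        LIBRARY.put("company", Pattern.compile("\\bUnilever\\b"));
        // A four-digit year followed by the word "conference"
        LIBRARY.put("event", Pattern.compile("\\b\\d{4} conference\\b", Pattern.CASE_INSENSITIVE));
    }

    // Returns every match found in the text, grouped by label, in library order.
    // Labels with no matches are left out of the result.
    public static Map<String, List<String>> extract(String text) {
        Map<String, List<String>> card = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : LIBRARY.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            List<String> hits = new ArrayList<>();
            while (matcher.find()) {
                hits.add(matcher.group());
            }
            if (!hits.isEmpty()) {
                card.put(entry.getKey(), hits);
            }
        }
        return card;
    }

    public static void main(String[] args) {
        String text = "Mr. Smith attended the 2012 conference on behalf of Unilever.";
        System.out.println(extract(text));
    }
}
```

The resulting map could then be written to the meta card row alongside the document link. This approach only finds terms that are already listed, which is the limitation the question is about.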

Thanks,
AD
Tim Moores
Rancher

Joined: Sep 21, 2011
Posts: 2408
Instead of handling all those document formats yourself, you may want to look into the Apache Tika project. It has all that built in, and it originated as a subproject of Lucene. For semantic text handling I definitely recommend Lucene.
 