All,
Thanks in advance. I am indexing documents from multiple data sources. I am creating a
metacard for each document and storing it in an Oracle DB. I only store the metacard and a link to the document, not the document itself.
I started using Apache POI and PDFBox to read
Word, Excel, PowerPoint, and similar files.
If I want to create structured, intelligible phrases and summaries from, let us say, an expense report, would you recommend using Lucene or regular expressions? I've considered creating a library class that maps keywords to phrases and just allowing it to grow. I know there has to be a more powerful and efficient way to do this than regular expressions.
So, back to the expense report example: I want to find words that match Mr. or Mrs., Unilever, 2012 conference, etc., and store those in the metacard.
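To make the keyword-library idea concrete, here is a minimal sketch using only java.util.regex. It assumes the document text has already been extracted to a String. The class name MetacardTagger, the field names (person, company, event), and the patterns themselves are hypothetical placeholders for illustration, not an existing API or a finished design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps a metacard field name to a regex and collects every match.
class MetacardTagger {
    // Illustrative rules only; a real library would load these from a table or file
    // so that it can grow without code changes.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // "Mr" or "Mrs", optional period, then a capitalized surname
        RULES.put("person", Pattern.compile("\\b(?:Mr|Mrs)\\.?\\s+[A-Z][a-z]+"));
        // A fixed company keyword
        RULES.put("company", Pattern.compile("\\bUnilever\\b"));
        // A four-digit year followed by the word "conference"
        RULES.put("event", Pattern.compile("\\b(?:19|20)\\d{2}\\s+conference\\b",
                Pattern.CASE_INSENSITIVE));
    }

    // Returns field name -> matched phrases, in rule order; fields with no match are omitted.
    static Map<String, List<String>> extract(String text) {
        Map<String, List<String>> found = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            List<String> hits = new ArrayList<>();
            while (m.find()) {
                hits.add(m.group());
            }
            if (!hits.isEmpty()) {
                found.put(rule.getKey(), hits);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Expense report: Mr. Smith attended the 2012 conference hosted by Unilever.";
        System.out.println(extract(text));
    }
}
```

The map returned by extract could then be written to the metacard columns. This approach only finds phrases that someone has already written a rule for, which is the limitation I am hoping a different approach can avoid.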
Thanks,
AD