lucene and office 2007

Seamus Minogue
Ranch Hand

Joined: Jun 24, 2008
Posts: 41
Has anyone indexed Office 2007 documents into Lucene?

I'm trying to index a .docx document and allow full-text searching of it. Indexing works for *.doc documents, but full-text search doesn't seem to work for *.docx documents.

If someone could point me in the right direction, I would appreciate it.
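One likely reason a .doc pipeline silently fails on .docx is that the two are entirely different container formats: Word 97-2003 files are OLE2 compound documents, while Office 2007 files are ZIP packages of XML parts. Below is a minimal JDK-only sketch of how an indexer could sniff the first bytes of a file to decide which extractor to call. The class and method names are hypothetical, not part of Lucene or POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: tells legacy .doc (OLE2) apart from Office 2007 (ZIP) files. */
public class WordFormatSniffer {

    enum Format { LEGACY_DOC, OOXML_DOCX, UNKNOWN }

    // OLE2 compound file header used by Word 97-2003 .doc files
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4"; every Office 2007 file is a ZIP package
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a file from its leading bytes. */
    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.LEGACY_DOC;
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML_DOCX;
        return Format.UNKNOWN;
    }

    /** Reads just enough of the file to classify it. */
    static Format sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + " -> " + sniff(Path.of(name)));
        }
    }
}
```

Any ZIP file (.xlsx, .pptx, even a .jar) also passes the ZIP check, so a real indexer would additionally confirm that the package contains word/document.xml before treating it as a Word document.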
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42956
Which library are you using for reading .doc files: Apache POI? If so, that doesn't support the XML-based Office 2007 formats (.docx, .xlsx, .pptx) yet. You could try the beta version of POI, which does support DOCX to a certain degree.
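With a POI release that supports the XML formats, the usual route is the XWPF classes (for example `XWPFWordExtractor` in `org.apache.poi.xwpf.extractor`). If that is not an option, the text can also be pulled out with the JDK alone, because a .docx is a ZIP package whose main body lives in `word/document.xml`, with text in `w:t` elements grouped into `w:p` paragraphs. The following is a rough sketch under those assumptions: the class name is hypothetical, and headers, footers, footnotes and comments (stored in other parts) are ignored. The returned string can then be added to a Lucene field exactly as the extracted .doc text already is.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical JDK-only extractor: plain text from a .docx body, ready for indexing. */
public class DocxTextExtractor {

    // WordprocessingML namespace used by Office 2007 documents
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Finds word/document.xml inside the ZIP package and returns its text. */
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes stops at the end of the current ZIP entry
                    return textFromDocumentXml(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found: not a .docx package?");
    }

    /** Concatenates the w:t text of each w:p paragraph, one paragraph per line. */
    static String textFromDocumentXml(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            // Text runs are split across many w:t elements; join them in document order
            NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                out.append(texts.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.print(extractText(in));
        }
    }
}
```

This only reads the main document part and flattens formatting, which is usually enough for full-text search; paragraphs inside text boxes may appear twice because they are nested inside an outer paragraph.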
Seamus Minogue
Ranch Hand

Joined: Jun 24, 2008
Posts: 41
I'll take a look at that :-) Thanks