lucene and office 2007

Seamus Minogue
Ranch Hand

Joined: Jun 24, 2008
Posts: 41
Has anyone indexed Office 2007 documents into Lucene?

I'm trying to index a .docx document and allow full-text searching of it. Indexing works for *.doc documents, but full-text search doesn't seem to work for *.docx documents.

If someone could point me in the right direction, I would appreciate it.
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
Which library are you using to read the .doc files - Apache POI? If so, the current release doesn't support the XML-based Office 2007 files yet. You could try the beta version of POI, which does support DOCX to a certain degree.
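If the POI beta doesn't work out, one possible fallback is to read the file directly. A .docx file is a ZIP archive whose main text lives in the entry word/document.xml, inside <w:t> elements. Below is a rough sketch of that idea using only the JDK (Java 9+ for readAllBytes). It is not POI's API. The class name DocxTextExtractor and the tinyDocx helper are made up for illustration, and the sketch ignores headers, footers, footnotes and nested text boxes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of POI or Lucene.
public class DocxTextExtractor {

    // Namespace of the WordprocessingML elements (w:p, w:r, w:t).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one line per paragraph. */
    public static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return parseBody(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found; not a DOCX file?");
    }

    // Collects the text of every <w:t> element, paragraph by paragraph.
    private static String parseBody(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        StringBuilder text = new StringBuilder();
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }

    /**
     * Demo-only helper: wraps body XML in a ZIP with just word/document.xml.
     * Enough for this extractor, but Word itself would not open the result.
     */
    public static byte[] tinyDocx(String bodyXml) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + bodyXml + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = tinyDocx(
            "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>Office 2007</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>");
        System.out.print(extractText(new ByteArrayInputStream(docx)));
    }
}
```

The returned string could then go into whatever analyzed Lucene field you already fill with the text of your .doc files, so both formats would be searchable through the same query.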
Seamus Minogue
Ranch Hand

Joined: Jun 24, 2008
Posts: 41
I'll take a look at that :-) Thanks