This week's book giveaway is in the Servlets forum.
We're giving away four copies of Murach's Java Servlets and JSP and have Joel Murach on-line!
See this thread for details.
The moose likes Other Open Source Projects and the fly likes lucene and office 2007 Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Murach's Java Servlets and JSP this week in the Servlets forum!
JavaRanch » Java Forums » Products » Other Open Source Projects
Bookmark "lucene and office 2007" Watch "lucene and office 2007" New topic
Author

lucene and office 2007

Seamus Minogue
Ranch Hand

Joined: Jun 24, 2008
Posts: 41
Has anyone indexed Office 2007 documents into lucene?

Im trying to index a docx document and allow full text searching of it. The indexing is working for *.doc documents but FT doesn't seem to work for *.docx documents.

If someone could point in the in right direction I would appreciate it.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41089
    
  44
Which library are you using for reading DOC files - Apache POI? If so, that doesn't support the XML-based Office files yet. You could try the beta version of POI, which does support DOCX to a certain degree.


Ping & DNS - my free Android networking tools app
Seamus Minogue
Ranch Hand

Joined: Jun 24, 2008
Posts: 41
Ill take a look at that :-) Thanks
 
Consider Paul's rocket mass heater.
 
subject: lucene and office 2007
 
Similar Threads
cannot open office2007 documents from the java application
Magic number for Microsoft 2007 files
Java API for word files
Print MS Office Documents using Java API
how to check if a docx, xlsx, or pptx file is password protected using apache POI?