lucene and office 2007

Seamus Minogue
Ranch Hand

Joined: Jun 24, 2008
Posts: 41
Has anyone indexed Office 2007 documents into Lucene?

I'm trying to index a docx document and allow full-text searching of it. Indexing works for *.doc documents, but full-text search doesn't seem to work for *.docx documents.

If someone could point me in the right direction, I would appreciate it.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41182
    
Which library are you using to read the DOC files: Apache POI? If so, the current release doesn't support the XML-based Office 2007 files yet. You could try the beta version of POI, which does support DOCX to a certain degree.
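
For context on why a DOC reader gets nothing out of these files: a DOCX file is a ZIP archive, and the main body text lives in the `word/document.xml` entry. Below is a minimal sketch, assuming only the JDK (Java 9+ for `readAllBytes`), of pulling plain text out of that entry so it can be handed to the indexer. It is not POI's API; the class name `DocxTextExtractor` and its methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of POI or Lucene.
public class DocxTextExtractor {

    // Namespace of the WordprocessingML elements (w:p, w:r, w:t, ...).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a DOCX stream, one line per paragraph. */
    public static String extractText(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return parseBody(zip.readAllBytes());
                }
            }
        }
        throw new IOException("No word/document.xml entry; not a DOCX file?");
    }

    private static String parseBody(byte[] xml) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            StringBuilder text = new StringBuilder();
            collect(doc.getDocumentElement(), text);
            return text.toString();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse word/document.xml", e);
        }
    }

    // Depth-first walk: append every w:t text run, and end each w:p with a newline.
    private static void collect(Node node, StringBuilder out) {
        boolean inWordNs = W_NS.equals(node.getNamespaceURI());
        if (inWordNs && "t".equals(node.getLocalName())) {
            out.append(node.getTextContent());
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
        if (inWordNs && "p".equals(node.getLocalName())) {
            out.append('\n');
        }
    }
}
```

The returned string could then go into whatever text field is already being indexed for the *.doc files. This sketch only reads the main document part; headers, footers and footnotes are stored in separate entries of the archive and are skipped here, so a library such as the POI beta is still the better option once its DOCX support is good enough.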


Seamus Minogue
Ranch Hand

Joined: Jun 24, 2008
Posts: 41
I'll take a look at that :-) Thanks
 