Has anyone indexed Office 2007 documents into Lucene?
I'm trying to index a .docx document and allow full-text searching of it. Indexing works for *.doc documents, but full-text search doesn't seem to work for *.docx documents.
If someone could point me in the right direction, I would appreciate it.
posted 7 years ago
Which library are you using for reading doc files - Apache POI? If so, that doesn't support the XML-based Office files yet. You could try the beta version of POI, which does support DOCX to a certain degree.
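If the POI beta doesn't work out, it may help to know that a .docx file is a ZIP archive whose main body lives in the entry word/document.xml, with the visible text inside w:t elements. Below is a rough sketch of pulling that text out using only the JDK. The class and method names (DocxTextExtractor, extractText, textOf) are made up for illustration and are not part of POI or Lucene, and the sketch assumes you only need the main body text: headers, footers, footnotes and comments are stored in other entries and are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a POI or Lucene class): extracts plain text from a .docx stream.
public class DocxTextExtractor {
    // Namespace of WordprocessingML elements such as w:p (paragraph) and w:t (text).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // Copy the current entry into memory; read() returns -1 at the end of the entry.
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return textOf(buf.toByteArray());
                }
            }
        }
        throw new IOException("word/document.xml not found; is this really a .docx file?");
    }

    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // Word often splits one word across several runs, so runs are joined without spaces.
            // Caveat: paragraphs nested in text boxes would be emitted twice by this simple loop.
            NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                out.append(texts.item(j).getTextContent());
            }
            out.append('\n'); // one line per paragraph
        }
        return out.toString();
    }
}
```

The returned string can then be added to your Lucene document as an analyzed text field, the same way you presumably already do for the text you extract from *.doc files. Once POI's DOCX support is stable enough for your needs, it is likely the better choice, since it handles far more of the format than this sketch does.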