Originally posted by Axel Janssen:
Do you know of any programming-specific problems with the ports?
I was on a project where we used Lucene. It took very little code. They used extra classes for PDF text extraction or something similar. Are those add-on packages on top of Lucene also available for the other languages?
The most up-to-date ports available are dotLucene from SourceForge (it claims 1.4.3 compatibility) and pylucene (which is derived directly from the Java version, and thus as up-to-date as you want it to be). The other ports lag further behind.
As for the other packages: on Windows, with the .NET version of Lucene (dotLucene), there are tons of APIs and commercial packages for extracting text from PDF and Word documents, so that shouldn't be a problem. With techniques like the ones pylucene uses, it's possible to compile almost anything written in Java to native code with GCJ and access it from Python and other scripting languages (and, of course, from C/C++ code).
The good news on this front is that we're going to make Lucene a top-level Apache project in the near future and invite the creators of the ports to contribute their code to it, so that we have better control over the ports and can do compatibility testing between them.