OK. I see on page 348 that the ports lag a bit behind and that there is little communication between the ports' developers, but in general you portray them as quite successful in chapter 9.
Do you know of any programming-specific problems with the ports? I was on a project where we used Lucene, and it really took very little code. They used extra classes for PDF text extraction or something similar. Are those packages built on top of Lucene also available for the other languages?
PDF-to-text extraction generally falls outside the scope of each individual Lucene port. Typically you use an external, independent application or library (e.g. PDFBox; as a matter of fact, you can see that in chapter 7: http://www.lucenebook.com/search?query=pdfbox ).
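To make that division of labour concrete: the indexer only ever sees plain text, and a separate, format-specific extractor runs first. Below is a minimal sketch of that split in Java using only the standard library. `TextExtractor`, `PlainTextExtractor`, `ToyIndex` and `ExtractThenIndex` are hypothetical names made up for illustration. `ToyIndex` is a crude in-memory inverted index standing in for Lucene (it is NOT Lucene's API), and a real PDF extractor would wrap a library such as PDFBox instead of the plain UTF-8 decoder shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical interface: turns a raw document (PDF, Word, plain text...)
 * into plain text. A PDF implementation would delegate to an external
 * library such as PDFBox; that part is not shown here.
 */
interface TextExtractor {
    String extract(byte[] rawDocument);
}

/** Trivial extractor for documents that are already UTF-8 text. */
class PlainTextExtractor implements TextExtractor {
    @Override
    public String extract(byte[] rawDocument) {
        return new String(rawDocument, StandardCharsets.UTF_8);
    }
}

/**
 * Toy inverted index standing in for Lucene (NOT Lucene's API).
 * It never sees the original file format, only extracted text.
 */
class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String docId, String text) {
        // Crude "analyzer": lowercase, then split on runs of non-letter/digit characters.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}

public class ExtractThenIndex {
    // The indexing step is format-agnostic: swapping in a different
    // TextExtractor (e.g. one backed by a PDF library) leaves it unchanged.
    static void indexDocument(ToyIndex index, TextExtractor extractor, String docId, byte[] raw) {
        index.add(docId, extractor.extract(raw));
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        TextExtractor plain = new PlainTextExtractor();
        indexDocument(index, plain, "doc1", "Lucene in Action".getBytes(StandardCharsets.UTF_8));
        indexDocument(index, plain, "doc2", "Lucene ports: dotLucene, PyLucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(index.search("lucene"));   // [doc1, doc2]
        System.out.println(index.search("pylucene")); // [doc2]
    }
}
```

The point of the interface is that each port (or each platform) can plug in whatever extraction library is available to it, while the indexing side stays the same.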
As for the quality of the ports, I am on various ports' mailing lists, and my impression is that CLucene, dotLucene, and PyLucene are all very active and probably of solid quality, judging by the people behind them. Lupy is lagging further behind, and I know it doesn't support quite everything that the original Lucene does.
Originally posted by Axel Janssen: Do you know of any programming-specific problems with the ports? I was on a project where we used Lucene, and it really took very little code. They used extra classes for PDF text extraction or something similar. Are those packages built on top of Lucene also available for the other languages?
The most up-to-date ports available are dotLucene from SourceForge (which claims 1.4.3 compatibility) and PyLucene (which is derived directly from the Java version, and is thus as up-to-date as you want it to be). The other ports lag further behind.
As for the other packages: on Windows, with the .NET version of Lucene (dotLucene), there are tons of APIs and commercial packages for extracting text from PDF and Word documents, so that shouldn't be a problem. With techniques like the ones PyLucene uses, it's possible to compile any Java code to native code with GCJ and access it from Python and other scripting languages (and, of course, from C/C++ code).
The good news on this front is that we're going to make Lucene a top-level Apache project in the near future and invite the creators of the ports to contribute their code to it, so that we can get better control over, and compatibility testing between, the various ports.