
How is the quality of the Lucene ports?

 
Axel Janssen
Ranch Hand
Posts: 2166
Hi,

OK. I see on page 348 that the ports lag a bit behind and that there is little communication between the ports' developers. But in general, chapter 9 portrays them as quite successful.

Do you know of any programming-specific problems with the ports?
I was on a project where we used Lucene. It took very little code. They used extra classes for PDF text extraction or some such. Are those packages on top of Lucene also available for the other languages?

regards Axel
 
Otis Gospodnetic
Author
Greenhorn
Posts: 23
Hello,

PDF -> text extraction generally falls outside the scope of each individual Lucene port. Typically you use an external, independent application or library (e.g. PDFBox; as a matter of fact, you can see that in chapter 7: http://www.lucenebook.com/search?query=pdfbox ).

As for the quality of the ports, I am on various ports' mailing lists, and my impression is that CLucene, dotLucene, and PyLucene are all very active and probably of solid quality, judging by the people behind them. Lupy is lagging more, and I know it doesn't support quite everything that the original Lucene does.

Otis
 
Erik Hatcher
Author
Ranch Hand
Posts: 111
Originally posted by Axel Janssen:
Do you know of any programming-specific problems with the ports?
I was on a project where we used Lucene. It took very little code. They used extra classes for PDF text extraction or some such. Are those packages on top of Lucene also available for the other languages?


The most up-to-date ports available are dotLucene from SourceForge (it claims 1.4.3 compatibility) and PyLucene (which is derived directly from the Java version, and thus as up-to-date as you want it to be). The other ports lag further behind.

As for the other packages: on Windows with the .NET version of Lucene (dotLucene) there are tons of APIs and commercial packages to extract text from PDF and Word documents, so that shouldn't be a problem. With techniques like those PyLucene uses, it's possible that anything in Java could be GCJ'd to native code and accessed through Python and other scripting languages (and of course C/C++ code).

The good news on this front is that we're going to make Lucene a top-level Apache project in the near future and solicit the creators of the ports to contribute their code to it, so that we can get better control and compatibility testing across the various ports.
 
Pradeep bhatt
Ranch Hand
Posts: 8927
What are Lucene ports?
 
Lasse Koskela
author
Sheriff
Posts: 11962
Originally posted by Pradeep Bhat:
What are Lucene ports?

A "port", in a software context, means taking an existing application or project written for one language or platform and converting it to run on another language or platform.
 
Pradeep bhatt
Ranch Hand
Posts: 8927
Thanks, Lasse.
 
Siripa Siangklom
Ranch Hand
Posts: 79
Is there a PHP port of the Lucene search engine?
 
Erik Hatcher
Author
Ranch Hand
Posts: 111
Originally posted by Siripa Siangklom:
Is there a PHP port of the Lucene search engine?


Not to my knowledge... but you can call Java from PHP, right? That would be the ideal way to integrate it.
 