how is the quality of the Lucene ports

Axel Janssen
Ranch Hand

Joined: Jan 08, 2001
Posts: 2164
Hi,

ok. I see on page 348 that the ports lag a bit behind and that there is little communication between the ports' developers. But generally you portray them as quite successful in chapter 9.

Do you know of any programming-specific problems with the ports?
I was in a project where we used Lucene. It was really very little code. They used extra classes for PDF text extraction or some such. Are those packages on top of Lucene also available for the other languages?

regards Axel
Otis Gospodnetic
Author
Greenhorn

Joined: Dec 30, 2004
Posts: 23
Hello,

PDF -> text extraction generally falls outside the scope of each individual Lucene port. Typically you use an external, independent application or library (e.g. PDFBox - as a matter of fact, you can see that in chapter 7 - http://www.lucenebook.com/search?query=pdfbox ).
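
To make that separation concrete, here is a minimal sketch of the "extract first, then index" split. It assumes nothing about Lucene's or PDFBox's actual APIs: TextExtractor and TinyIndex are hypothetical names invented for illustration, and the toy in-memory index only stands in for whatever search library (or port) you use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ExtractThenIndex {

    /** Hypothetical extraction step; a PDF library would sit behind this in practice. */
    interface TextExtractor {
        String extract(byte[] rawDocument);
    }

    /** Toy inverted index standing in for the search library. This is NOT Lucene's API. */
    static class TinyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        /** Splits plain text into lowercase tokens and records which document contains each. */
        void add(String docId, String text) {
            for (String token : text.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
                }
            }
        }

        /** Returns the ids of documents containing the single term, or an empty set. */
        Set<String> search(String term) {
            return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
        }
    }

    public static void main(String[] args) {
        // Assumption: the "documents" here are plain UTF-8 bytes, so extraction is trivial.
        TextExtractor plainText = raw -> new String(raw, StandardCharsets.UTF_8);

        TinyIndex index = new TinyIndex();
        byte[] doc = "Lucene ports and PDF extraction".getBytes(StandardCharsets.UTF_8);

        // The index only ever sees plain text; it knows nothing about the source format.
        index.add("doc1", plainText.extract(doc));

        System.out.println(index.search("pdf"));
    }
}
```

The point of the design is that the indexing side only ever receives plain text, so swapping the extractor (PDF, Word, HTML) does not touch the search code, whichever port you run.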

As for the quality of the ports, I am on various ports' mailing lists, and my impression is that CLucene, dotLucene, and PyLucene are all very active and probably of solid quality, judging by the people behind them. Lupy is lagging more, and I know it doesn't support quite everything that the original Lucene does.

Otis


Lucene in Action: http://www.manning.com/lucene
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Axel Janssen:
Do you know of any programming-specific problems with the ports?
I was in a project where we used Lucene. It was really very little code. They used extra classes for PDF text extraction or some such. Are those packages on top of Lucene also available for the other languages?


The most up-to-date ports available are dotLucene from SourceForge (it claims 1.4.3 compatibility) and PyLucene (which is derived directly from the Java version, and thus as up-to-date as you want it to be). The other ports lag further behind.

As for the other packages - on Windows, with the .NET version of Lucene (dotLucene), there are tons of APIs and commercial packages to extract text from PDF and Word documents, so that shouldn't be a problem. With techniques like the one PyLucene uses, it's possible that anything in Java could be GCJ'd to native code and accessed through Python and other scripting languages (and of course C/C++ code).

The good news on this front is that we're going to make Lucene a top-level Apache project in the near future and ask the creators of the ports to contribute their code to this project, so that we can get better control and compatibility testing between the various ports.


Co-author of Lucene in Action
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8898

What are Lucene ports ?


Groovy
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Originally posted by Pradeep Bhat:
What are Lucene ports ?

A "port", in a software context, means taking an existing application/project written in/for one language/platform and converting it for another language/platform.


Author of Test Driven (2007) and Effective Unit Testing (2013)
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8898

Thanks Lasse
Siripa Siangklom
Ranch Hand

Joined: Jan 26, 2004
Posts: 79
Is there a PHP port of the Lucene search engine?
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Siripa Siangklom:
Is there a PHP port of the Lucene search engine?


Not to my knowledge... but you can call Java from PHP, right? That would be the ideal way to integrate it.
 
I agree. Here's the link: http://aspose.com/file-tools
 