Lucene

Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

What is Lucene all about?


Groovy
Otis Gospodnetic
Author
Greenhorn

Joined: Dec 30, 2004
Posts: 23
Hi,

Lucene is all about text indexing and full-text searching. It's a full-text library/toolkit that you can use to add searching capabilities to your applications.

You will find a lot of Lucene resources (articles, tutorials, etc.) at
http://wiki.apache.org/jakarta-lucene/IntroductionToLucene and at http://www.java201.com/resources/browse/38-all.html . You could also grab the free chapter from Lucene in Action, chapter 1. It explains what Lucene is and how it is used. Chapter 1 can be downloaded from http://www.manning-source.com/books/hatcher2/hatcher2_chp1.pdf

Otis


Lucene in Action: http://www.manning.com/lucene
somkiat puisungnoen
Ranch Hand

Joined: Jul 04, 2003
Posts: 1312
What is the difference between indexing and full-text searching in a database and in Apache Lucene?


SCJA,SCJP,SCWCD,SCBCD,SCEA I
Java Developer, Thailand
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by somkiat puisungnoen:
What is the difference between indexing and full-text searching in a database and in Apache Lucene?


If your database supports full-text searching, there may not be much difference in the results. However, Lucene is extremely extensible in the analysis of text: you can control how words get tokenized, stemmed, filtered, and so on. I have used the full-text indexing capabilities of SQL Server (indexing BLOBs of Word and PDF documents) with success. If your information is already in a database, it is well worth considering the built-in capabilities of your database and whether locking in to that vendor is pragmatic for your project.
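To make "tokenized, stemmed, filtered" a bit more concrete, here is a minimal sketch of such an analysis chain in plain Java. It does not use Lucene's own classes: ToyAnalyzer, its stop-word list, and the naive plural-stripping stemmer are all hypothetical names and simplifications made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis chain. This is NOT Lucene's Analyzer API, only an illustration.
class ToyAnalyzer {
    // Hypothetical stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of", "and");

    // Tokenize on non-letter characters, lowercase, drop stop words, then stem.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with a separator
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(token)) {
                continue; // filtering step
            }
            terms.add(stem(token)); // stemming step
        }
        return terms;
    }

    // Deliberately naive stemmer: strips a trailing "s" from words longer than three letters.
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Brown Fox jumps over lazy dogs"));
    }
}
```

Each step (splitting, lowercasing, stop-word removal, stemming) is a separate decision that can be swapped out, which is the kind of control a database's built-in full-text feature may or may not expose.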


Co-author of Lucene in Action
Kishore Dandu
Ranch Hand

Joined: Jul 10, 2001
Posts: 1934
so, can Lucene be used for content management?


Kishore
SCJP, blog
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Kishore Dandu:
so, can Lucene be used for content management?


Lucene is a general-purpose search engine API. If you have text, Lucene will work on it.

More to the point, Lucene makes a great component of a CMS. Jakarta Slide, for example, has extensive Lucene search capability as part of its DASL implementation. I'd venture to say that almost all Java-based CMSs have Lucene integration.
Alexandru Popescu
Ranch Hand

Joined: Jul 12, 2004
Posts: 995
I must say from the beginning that I've never read more than blogs on Lucene. What seems interesting to me is how Lucene manages to index/search text in specific file formats (PDF, DOC, etc.). Does it provide different extensions for different formats?

--
./pope

ps: Erik pls excuse my ignorance.


blog - InfoQ.com
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Ali Pope:
I must say from the beginning that I've never read more than blogs on Lucene. What seems interesting to me is how Lucene manages to index/search text in specific file formats (PDF, DOC, etc.). Does it provide different extensions for different formats?

--
./pope

ps: Erik pls excuse my ignorance.


Quite a fair question. The simple answer is that Lucene does not, itself, deal with files of any format at all. It deals with text handed to it either as a String or a java.io.Reader. It is entirely up to the developer to integrate PDF, Word, XML, and other format parsing. Thankfully there are numerous open source APIs available to do this. Otis did a great write-up in Chapter 7 on how to deal with common file types.
Hussein Baghdadi
clojure forum advocate
Bartender

Joined: Nov 08, 2003
Posts: 3476

Hi Erik,
would you please spend some time checking this?
http://www.coderanch.com/t/62193/open-source/Lucene-article-JRJ
Thanks, sir.
Tejas Bavishi
Ranch Hand

Joined: Jul 28, 2003
Posts: 73
How is using Lucene different from using Regular Expressions ?
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Originally posted by Tejas Bavishi:
How is using Lucene different from using Regular Expressions ?


Right. I am also confused. Isn't Java's RE enough?

Also, I would like to know if Lucene in Action is the only book for Lucene?
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Maybe a stupid question but I want to know if Eclipse supports Lucene?
Alexandru Popescu
Ranch Hand

Joined: Jul 12, 2004
Posts: 995
Originally posted by Erik Hatcher:


Quite a fair question. The simple answer is that Lucene does not, itself, deal with files of any format at all. It deals with text handed to it either as a String or a java.io.Reader. It is entirely up to the developer to integrate PDF, Word, XML, and other format parsing. Thankfully there are numerous open source APIs available to do this. Otis did a great write-up in Chapter 7 on how to deal with common file types.


So developing with Lucene would be something like: develop your file format reader, feed its output to Lucene, and Lucene will give you back a good index?

--
./pope
Alexandru Popescu
Ranch Hand

Joined: Jul 12, 2004
Posts: 995
Pradeep, afaik Eclipse has a plugin based on Lucene. I actually do not know what it is used for (very interesting - is the search of Eclipse based on Lucene?). Unfortunately, I cannot see what you mean by "Eclipse supports Lucene".

--
./pope
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Ali,

Unfortunately, I cannot see what you mean by "Eclipse supports Lucene"


I meant the plugin.
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Are organizations using Lucene? How popular is it?
Arjun Shastry
Ranch Hand

Joined: Mar 13, 2003
Posts: 1874
Quite popular.


MH
Karthik Guru
Ranch Hand

Joined: Mar 06, 2001
Posts: 1209
Originally posted by Pradeep Bhat:


Right. I am also confused. Isn't Java's RE enough?


Yup! I also felt the same way. But then Lucene builds indexes and stores them for future reference. So the search has to be a lot faster once the index is built.


Also, I would like to know if Lucene in Action is the only book for Lucene?

Yeah, looks like that. It's definitely the only book solely dedicated to Lucene. I think the Struts book by Rob Harrop has something on Lucene.
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by John Todd:
Hi Erik,
would you please spend some time checking this?
http://www.coderanch.com/t/62193/open-source/Lucene-article-JRJ
Thanks, sir.


The answer was already provided in that thread - the String you pass to IndexWriter is a path on the filesystem where you want Lucene to build the index.
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Pradeep Bhat:


Right. I am also confused. Isn't Java's RE enough?

Also, I would like to know if Lucene in Action is the only book for Lucene?


Suppose you have 200,000 XML files. Are regular expressions enough to give you searching for the phrase "quick brown fox" where each of those words needs to be close positionally to match (maybe one or two words in between)? And give you the results back in milliseconds? Oh, and when you're looking for "quick", please also find documents with "fast brown fox" too. That's the kind of thing Lucene does: it builds an inverted index of the words of the documents. That is vastly different from what grepping with regular expressions could do.

And yes, Lucene in Action is currently the only book dedicated to Lucene. There are several other books that mention it and even provide some basic examples, but nothing as thorough as our book.
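For readers who have not seen an inverted index before, the following is a rough sketch of the idea in plain Java. It is not how Lucene is implemented and uses none of its API: PositionalIndexSketch, the one-entry synonym table, and the near() method are hypothetical, chosen only to mirror the "quick brown fox" example above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy positional inverted index; an illustration, not Lucene's data structures.
class PositionalIndexSketch {
    // Hypothetical synonym table: a document word on the left is also indexed as the word on the right.
    private static final Map<String, String> SYNONYMS = Map.of("fast", "quick");

    // term -> (document id -> positions of the term in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    void add(int docId, String text) {
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            post(word, docId, position);
            String synonym = SYNONYMS.get(word);
            if (synonym != null) {
                post(synonym, docId, position); // synonym shares the position of the original word
            }
            position++;
        }
    }

    private void post(String term, int docId, int position) {
        postings.computeIfAbsent(term, k -> new HashMap<>())
                .computeIfAbsent(docId, k -> new ArrayList<>())
                .add(position);
    }

    // Ids of documents where `first` precedes `second` with at most `maxGap` words in between.
    Set<Integer> near(String first, String second, int maxGap) {
        Set<Integer> hits = new TreeSet<>();
        Map<Integer, List<Integer>> a = postings.getOrDefault(first, Collections.emptyMap());
        Map<Integer, List<Integer>> b = postings.getOrDefault(second, Collections.emptyMap());
        for (Map.Entry<Integer, List<Integer>> entry : a.entrySet()) {
            List<Integer> secondPositions = b.get(entry.getKey());
            if (secondPositions == null) {
                continue; // document lacks the second term
            }
            for (int p : entry.getValue()) {
                for (int q : secondPositions) {
                    int gap = q - p - 1;
                    if (gap >= 0 && gap <= maxGap) {
                        hits.add(entry.getKey());
                    }
                }
            }
        }
        return hits;
    }
}
```

The point of the sketch is that a query only touches the postings of the terms it mentions; no document text is scanned at search time, whereas a regular expression would have to read every file on every query.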
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Pradeep Bhat:
Ali,



I meant the plugin.


What kind of plugin would you want for Lucene?? The search within Eclipse itself uses Lucene from what I've heard.

If you want to inspect a Lucene index with a GUI, check out Luke, which I often launch from Eclipse.
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Ali Pope:


So developing with Lucene would be something like: develop your file format reader, feed its output to Lucene, and Lucene will give you back a good index?


Bingo!!!
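The "parse it yourself, then hand over the text" workflow could be sketched roughly like this in plain Java. Everything here is an assumption for illustration: TextExtractor is a hypothetical per-format hook, TAG_STRIPPER is a trivial stand-in for a real XML/PDF/Word parser, and the HashMap plays the role that Lucene would play in a real application.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExtractThenIndexSketch {
    // Hypothetical hook: one implementation per file format (PDF, Word, XML, ...).
    interface TextExtractor {
        Reader extract(String rawContent);
    }

    // Trivial stand-in "parser" that replaces XML/HTML-style tags with spaces.
    static final TextExtractor TAG_STRIPPER =
            raw -> new StringReader(raw.replaceAll("<[^>]*>", " "));

    // Stand-in for the search library: term -> names of documents containing it.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Step 1: the extractor turns a format into plain text. Step 2: the text is indexed.
    void indexDocument(String name, String rawContent, TextExtractor extractor) {
        try (BufferedReader reader = new BufferedReader(extractor.extract(rawContent))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (!word.isEmpty()) {
                        index.computeIfAbsent(word, k -> new TreeSet<>()).add(name);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

The indexing side only ever sees a Reader, so supporting a new file format means writing another extractor and nothing else.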
Alexandru Popescu
Ranch Hand

Joined: Jul 12, 2004
Posts: 995
Thanks Erik. Quite a smooth and quick intro to Lucene (a couple of questions and your answers, and here I am).

--
./pope
Otis Gospodnetic
Author
Greenhorn

Joined: Dec 30, 2004
Posts: 23
Let me just add a bit to the answer about how a search based on regular expressions compares to what Lucene does. Think about a large Web-wide index, like the one you search with Google, AlltheWeb, Teoma, WiseNut, or Yahoo. Imagine trying to search that using just regular expressions. Pretty funny to imagine.
Actually, I did explain this in the book, and the first result for the following query gives you some info: http://www.lucenebook.com/search?query=sequentially (the first hit is from a free, sample chapter, so you can get the whole thing and read it).

Otis
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Does Google use Lucene?
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Is Lucene faster than other search techniques? If yes, how? Thanks
Arjun Shastry
Ranch Hand

Joined: Mar 13, 2003
Posts: 1874
Originally posted by Pradeep Bhat:
Does Google use Lucene?

Google does not use Lucene.
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Originally posted by Arjun Shastry:

Google does not use Lucene.


So does it use its own solution?
Alexandru Popescu
Ranch Hand

Joined: Jul 12, 2004
Posts: 995
afaik there is no theoretical and/or practical connection between regular expressions and indexing. Moreover, my experience has taught me that using r.e. on big files/big searches is a killer for an application (I remember that just switching the r.e. provider in one app improved the performance by 5 times).

so i guess, as always, we can say that every solution fits its own types of problems :-).

--
./pope
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Pradeep Bhat:
Is Lucene faster than other search techniques? If yes, how? Thanks


Lucene is FAST!

What other techniques do you want it compared to? Lucene uses an inverted index, and uses algorithms, storage, and data structures designed by a search engine expert. Doug Cutting was instrumental in building the Excite search engine in its heyday, worked for Apple building the VTwin engine, has published numerous papers, and is named on several patents related to indexing and searching techniques. Check 'em out to know more on the "how".
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Yes, Google has implemented their own, highly specialized search engine.


Author of Test Driven (2007) and Effective Unit Testing (2013) [Blog] [HowToAskQuestionsOnJavaRanch]
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Originally posted by Lasse Koskela:
Yes, Google has implemented their own, highly specialized search engine.


Thanks Lasse. How does it compare with Lucene ?
Alexandru Popescu
Ranch Hand

Joined: Jul 12, 2004
Posts: 995
I am not sure a comparison makes sense here. As Erik said in another thread, Lucene is an engine, while Google is a search solution.

--
./pope
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8904

Originally posted by Arjun Shastry:
Quite popular.


How many here are using Lucene ? Could you please share your experience.
Arjun Shastry
Ranch Hand

Joined: Mar 13, 2003
Posts: 1874
Originally posted by Pradeep Bhat:

How many here are using Lucene ? Could you please share your experience.

I haven't used Lucene, but I am interested in using it in the future.
Manmohan Singhania
Ranch Hand

Joined: Feb 19, 2004
Posts: 55
Originally posted by Pradeep Bhat:

Thanks Lasse. How does it compare with Lucene ?

Which comparison do you want? Software may be compared in terms of space, time and cost. As you know, it's open source with a GPL license, hence it's free. Between space and time, which comparison are you interested in?


Jayalalitha is my girl friend. KarunaNidhi is my boy friend
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Originally posted by Pradeep Bhat:
How does [Google] compare with Lucene ?

Lucene is more generic and built for a whole community's use, while Google's search engine is specialized for indexing web pages, ranking them based on various criteria, and distributing the whole thing across a huge farm of thousands of cheap boxes. Google's search engine is not open source and I'm not working for them, so I can't really compare the two even if I had looked inside Lucene.
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Manmohan Singh:

Which comparison do you want? Software may be compared in terms of space, time and cost. As you know, it's open source with a GPL license, hence it's free. Between space and time, which comparison are you interested in?


Correction - Lucene is licensed using the Apache Software License, not GPL. Big difference for many!
Alexandru Popescu
Ranch Hand

Joined: Jul 12, 2004
Posts: 995
Yep, indeed big difference in many cases.

--
./pope
 