Lucene

 
Ranch Hand
Posts: 8945
What is Lucene all about?
 
Author
Posts: 23
Hi,

Lucene is all about text indexing and full-text searching. It's a full-text library/toolkit that you can use to add searching capabilities to your applications.

You will find a lot of Lucene resources (articles, tutorials, etc.) at
http://wiki.apache.org/jakarta-lucene/IntroductionToLucene and at http://www.java201.com/resources/browse/38-all.html . You could also grab the free Chapter 1 of Lucene in Action, which explains what Lucene is and how it is used. It can be downloaded from http://www.manning-source.com/books/hatcher2/hatcher2_chp1.pdf

Otis
 
Ranch Hand
Posts: 1312
What is the difference between indexing and full-text searching in a database and in Apache Lucene?
 
Author
Posts: 111

Originally posted by somkiat puisungnoen:
What is the difference between indexing and full-text searching in a database and in Apache Lucene?



If your database supports full-text searching, there may not be much difference in the results. However, Lucene is extremely extensible in its analysis of text: you can control how words get tokenized, stemmed, filtered, and so on. I have used the full-text indexing capabilities of SQL Server (indexing BLOBs of Word and PDF documents) with success. If your information is already in a database, it is well worth considering the built-in capabilities of your database and whether locking in to that vendor is pragmatic for your project.
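
To make "tokenized, stemmed, filtered" a bit more concrete, here is a rough sketch of such an analysis chain in plain Java. It does not use Lucene's analyzer classes at all: `SimpleAnalyzer`, its stop-word list, and the deliberately crude plural-stripping "stemmer" are assumptions made up for illustration, so treat it as a picture of the idea rather than of how Lucene implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this is NOT Lucene's Analyzer API.
final class SimpleAnalyzer {
    // Assumed stop-word list for the example.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of", "and");

    // Tokenize on non-alphanumeric characters, lower-case each token,
    // drop stop words, then apply the naive stemmer below.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(stem(token));
        }
        return terms;
    }

    // Deliberately crude: strips one trailing "s" from longer words,
    // leaving words that end in "ss" alone. Real stemmers do far more.
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The quick brown dogs")); // prints [quick, brown, dog]
    }
}
```

The point of the sketch is that each step (splitting, lower-casing, filtering, stemming) is a separate decision you can change, which is the kind of control over analysis the post describes.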
 
Ranch Hand
Posts: 1934
so, can Lucene be used for content management?
 
Erik Hatcher
Author
Posts: 111

Originally posted by Kishore Dandu:
so, can Lucene be used for content management?



Lucene is a general-purpose search engine API. If you have text, Lucene will work on it.

More to the point, Lucene makes a great component of a CMS. Jakarta Slide, for example, has extensive Lucene search capability as part of its DASL implementation. I'd venture to say that almost all Java-based CMSs have Lucene integration.
 
Ranch Hand
Posts: 995
I must say from the beginning that I've never read more than blogs on Lucene. What seems interesting to me is how Lucene manages to index/search text in specific file formats (PDF, DOC, etc.). Does it provide different extensions for different formats?

--
./pope

ps: Erik pls excuse my ignorance.
 
Erik Hatcher
Author
Posts: 111

Originally posted by Ali Pope:
I must say from the beginning that I've never read more than blogs on Lucene. What seems interesting to me is how Lucene manages to index/search text in specific file formats (PDF, DOC, etc.). Does it provide different extensions for different formats?

--
./pope

ps: Erik pls excuse my ignorance.



Quite a fair question. The simple answer is that Lucene does not, itself, deal with files of any format at all. It deals with text handed to it either as a String or a java.io.Reader. It is entirely up to the developer to integrate PDF, Word, XML, and other format parsing. Thankfully, there are numerous open source APIs available to do this. Otis did a great write-up in Chapter 7 on how to deal with common file types.
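
As a rough picture of that division of labour, the sketch below separates "turn a format into text" from "consume the text". Everything in it is hypothetical and plain Java: `TextExtractor`, `TAG_STRIPPER`, and `collectWords` are made-up names, the tag stripper is a naive stand-in for a real XML or PDF parser, and `collectWords` merely stands in for the indexing side. No Lucene classes are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of this is Lucene API.
final class ExtractThenIndex {
    // Assumed extension point: one implementation per file format.
    interface TextExtractor {
        Reader extract(String rawContent);
    }

    // Naive extractor for XML-like markup: replaces anything between
    // '<' and '>' with a space. A real project would use a proper parser.
    static final TextExtractor TAG_STRIPPER =
            raw -> new StringReader(raw.replaceAll("<[^>]*>", " "));

    // Stand-in for the indexing side: it only ever sees a Reader of plain
    // text and knows nothing about the original file format.
    static List<String> collectWords(Reader text) {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(text)) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        Reader text = TAG_STRIPPER.extract("<doc><title>Hello</title> world</doc>");
        System.out.println(collectWords(text)); // prints [Hello, world]
    }
}
```

Supporting another format would mean writing another `TextExtractor`; the consuming side stays unchanged, which mirrors the point that the search library only needs a String or a Reader.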
 
clojure forum advocate
Posts: 3479
Hi Erik,
would you please spend some time checking this?
https://coderanch.com/t/62193/open-source/Lucene-article-JRJ
Thanks, sir.
 
Ranch Hand
Posts: 73
How is using Lucene different from using Regular Expressions ?
 
Pradeep bhatt
Ranch Hand
Posts: 8945

Originally posted by Tejas Bavishi:
How is using Lucene different from using Regular Expressions ?



Right. I am also confused. Isn't Java's RE enough?

Also, I would like to know: is Lucene in Action the only book on Lucene?
 
Pradeep bhatt
Ranch Hand
Posts: 8945
Maybe a stupid question but I want to know if Eclipse supports Lucene?
 
Alexandru Popescu
Ranch Hand
Posts: 995

Originally posted by Erik Hatcher:


Quite a fair question. The simple answer is that Lucene does not, itself, deal with files of any format at all. It deals with text handed to it either as a String or a java.io.Reader. It is entirely up to the developer to integrate PDF, Word, XML, and other format parsing. Thankfully, there are numerous open source APIs available to do this. Otis did a great write-up in Chapter 7 on how to deal with common file types.



So developing with Lucene would be something like: develop your file format reader, feed its output to Lucene, and Lucene will give you back a good index?

--
./pope
 
Alexandru Popescu
Ranch Hand
Posts: 995
Pradeep, AFAIK Eclipse has a plugin based on Lucene. I do not actually know what it is used for (very interesting: is Eclipse's search based on Lucene?). Unfortunately, I cannot see what you mean by "Eclipse supports Lucene".

--
./pope
 
Pradeep bhatt
Ranch Hand
Posts: 8945
Ali,

Unfortunately, I cannot see what you mean by "Eclipse supports Lucene".



I meant the plugin.
 
Pradeep bhatt
Ranch Hand
Posts: 8945
Are organizations using Lucene? How popular is it?
 
Ranch Hand
Posts: 1907
Quite popular.
 
Ranch Hand
Posts: 1209

Originally posted by Pradeep Bhat:


Right. I am also confused. Isn't Java's RE enough?


Yup! I also felt the same way. But then Lucene builds indexes and stores them for future reference, so searching should be a lot faster once the index is built.


Also, I would like to know: is Lucene in Action the only book on Lucene?

Yeah, looks like it. It's definitely the only book solely dedicated to Lucene. I think the Struts book by Rob Harrop has something on Lucene.

 
Erik Hatcher
Author
Posts: 111

Originally posted by John Todd:
Hi Erik,
would you please spend some time checking this?
https://coderanch.com/t/62193/open-source/Lucene-article-JRJ
Thanks, sir.



The answer was already provided in that thread - the String you pass to IndexWriter is a path on the filesystem where you want Lucene to build the index.
 
Erik Hatcher
Author
Posts: 111

Originally posted by Pradeep Bhat:


Right. I am also confused. Isn't Java's RE enough?

Also, I would like to know: is Lucene in Action the only book on Lucene?



Suppose you have 200,000 XML files. Are regular expressions enough to let you search for the phrase "quick brown fox" where the words only need to be close to each other positionally (maybe one or two words in between)? And to give you the results back in milliseconds? Oh, and when you're looking for "quick", please also find documents with "fast brown fox". That's the kind of thing Lucene does: it builds an inverted index of the words in the documents. That is vastly different from what grepping with regular expressions could do.
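
For readers who have not met an inverted index before, here is a toy version in plain Java. It is only a sketch under stated assumptions: `TinyPositionalIndex`, the single hard-coded synonym (fast → quick), and the simple "at most `slop` extra words between neighbouring terms" rule are all invented for illustration. Lucene's real data structures and its definition of phrase slop are more sophisticated, and nothing here is Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy positional inverted index; hypothetical, not Lucene's implementation.
final class TinyPositionalIndex {
    // Assumed synonym table: "fast" is also indexed as "quick".
    private static final Map<String, String> SYNONYMS = Map.of("fast", "quick");

    // term -> (document id -> positions of the term in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    void add(int docId, String text) {
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            post(word, docId, position);
            String synonym = SYNONYMS.get(word);
            if (synonym != null) {
                post(synonym, docId, position); // synonym shares the position
            }
            position++;
        }
    }

    private void post(String term, int docId, int position) {
        postings.computeIfAbsent(term, k -> new HashMap<>())
                .computeIfAbsent(docId, k -> new ArrayList<>())
                .add(position);
    }

    // Documents in which the terms occur in order, with at most `slop`
    // extra words between each pair of neighbouring terms.
    Set<Integer> phrase(int slop, String... terms) {
        Set<Integer> hits = new TreeSet<>();
        if (terms.length == 0) {
            return hits;
        }
        Map<Integer, List<Integer>> first = postings.getOrDefault(terms[0], Map.of());
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            for (int start : entry.getValue()) {
                if (matchesFrom(entry.getKey(), start, 1, slop, terms)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    private boolean matchesFrom(int docId, int prevPos, int termIdx, int slop, String[] terms) {
        if (termIdx == terms.length) {
            return true;
        }
        List<Integer> positions =
                postings.getOrDefault(terms[termIdx], Map.of()).getOrDefault(docId, List.of());
        for (int pos : positions) {
            int gap = pos - prevPos - 1; // words strictly between the two terms
            if (gap >= 0 && gap <= slop && matchesFrom(docId, pos, termIdx + 1, slop, terms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TinyPositionalIndex index = new TinyPositionalIndex();
        index.add(1, "The quick brown fox");
        index.add(2, "A quick and very brown fox");
        index.add(3, "fast brown fox");
        System.out.println(index.phrase(1, "quick", "brown", "fox")); // prints [1, 3]
    }
}
```

A query never scans the documents themselves. It looks up each term's postings and compares positions, which is why lookups stay quick as the collection grows, whereas a regular expression has to read every file on every search.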

And yes, Lucene in Action is the only book dedicated to Lucene currently. There are several other books that mention it and even provide some basic examples, but nothing as thorough as our book currently.
 
Erik Hatcher
Author
Posts: 111

Originally posted by Pradeep Bhat:
Ali,



I meant the plugin.



What kind of plugin would you want for Lucene?? The search within Eclipse itself uses Lucene from what I've heard.

If you want to inspect a Lucene index with a GUI, check out Luke, which I often launch from Eclipse.
 
Erik Hatcher
Author
Posts: 111

Originally posted by Ali Pope:


So developing with Lucene would be something like: develop your file format reader, feed its output to Lucene, and Lucene will give you back a good index?



Bingo!!!
 
Alexandru Popescu
Ranch Hand
Posts: 995
Thanks, Erik. Quite a smooth and quick intro to Lucene (a couple of questions and your answers, and here I am).

--
./pope
 
Otis Gospodnetic
Author
Posts: 23
Let me just add a bit to the answer about how a search based on regular expressions compares to what Lucene does. Think about a large Web-wide index, like the one you search with Google, AlltheWeb, Teoma, WiseNut, or Yahoo. Imagine trying to search that using just regular expressions. Pretty funny to imagine.
Actually, I did explain this in the book, and the first result for the following query gives you some info: http://www.lucenebook.com/search?query=sequentially (the first hit is from a free, sample chapter, so you can get the whole thing and read it).

Otis
 
Pradeep bhatt
Ranch Hand
Posts: 8945
Does Google use Lucene?
 
Pradeep bhatt
Ranch Hand
Posts: 8945
Is Lucene faster than other search techniques? If yes, how? Thanks
 
Arjun Shastry
Ranch Hand
Posts: 1907

Originally posted by Pradeep Bhat:
Does Google use Lucene?


Google does not use Lucene.
 
Pradeep bhatt
Ranch Hand
Posts: 8945

Originally posted by Arjun Shastry:

Google does not use Lucene.



So does it use its own solution?
 
Alexandru Popescu
Ranch Hand
Posts: 995
AFAIK there is no theoretical or practical connection between regular expressions and indexing. Moreover, my experience has taught me that using regular expressions on big files or big searches can kill an application's performance (I remember that just switching the regex provider in one app improved performance five times over).

So I guess, as always, we can say that every solution fits its own type of problem :-).

--
./pope
 
Erik Hatcher
Author
Posts: 111

Originally posted by Pradeep Bhat:
Is Lucene faster than other search techniques? If yes, how? Thanks



Lucene is FAST!

What other techniques do you want it compared to? Lucene uses an inverted index, along with algorithms, storage, and data structures designed by a search engine expert. Doug Cutting was instrumental in building the Excite search engine in its heyday, worked for Apple building the VTwin engine, has published numerous papers, and is named on several patents related to indexing and searching techniques. Check 'em out to learn more about the "how".
 
author
Posts: 11962
Yes, Google has implemented their own, highly specialized search engine.
 
Pradeep bhatt
Ranch Hand
Posts: 8945

Originally posted by Lasse Koskela:
Yes, Google has implemented their own, highly specialized search engine.



Thanks Lasse. How does it compare with Lucene ?
 
Alexandru Popescu
Ranch Hand
Posts: 995
I am not sure a comparison makes sense here. As Erik said in another thread, Lucene is an engine, while Google is a search solution.

--
./pope
 
Pradeep bhatt
Ranch Hand
Posts: 8945

Originally posted by Arjun Shastry:
Quite popular.



How many here are using Lucene ? Could you please share your experience.
 
Arjun Shastry
Ranch Hand
Posts: 1907

Originally posted by Pradeep Bhat:

How many here are using Lucene ? Could you please share your experience.


I haven't used Lucene, but I'm interested in using it in the future.
 
Ranch Hand
Posts: 55

Originally posted by Pradeep Bhat:

Thanks Lasse. How does it compare with Lucene ?


Which comparison do you want? Software may be compared in terms of space, time, and cost. As you know, it's open source with a GPL license, hence it's free. Between space and time, which comparison are you interested in?
 
Lasse Koskela
author
Posts: 11962

Originally posted by Pradeep Bhat:
How does [Google] compare with Lucene ?


Lucene is more generic and built for a whole community's use, while Google's search engine is specialized for indexing web pages, ranking them based on various criteria, and distributing the whole thing across a huge farm of thousands of cheap boxes. Google's search engine is not open source and I'm not working for them, so I can't really compare the two, even if I had looked inside Lucene.
 
Erik Hatcher
Author
Posts: 111

Originally posted by Manmohan Singh:

Which comparison do you want? Software may be compared in terms of space, time, and cost. As you know, it's open source with a GPL license, hence it's free. Between space and time, which comparison are you interested in?



Correction - Lucene is licensed using the Apache Software License, not GPL. Big difference for many!
 
Alexandru Popescu
Ranch Hand
Posts: 995
Yep, indeed big difference in many cases.

--
./pope
 