JavaRanch » Java Forums » Products » Other Open Source Projects
Special Characters Lucene

Vallaru smitha
Ranch Hand

Joined: Aug 19, 2008
Posts: 87


When I run a Lucene search for terms that contain special characters, with and
without escaping them, I get unexpected results. Here is my code:

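import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PharserQuery {
    public static void main(String[] args) throws IOException, ParseException {
        // 1. index
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new RAMDirectory();

        // The fourth constructor argument is missing from the post;
        // MaxFieldLength.UNLIMITED is an assumption.
        IndexWriter w = new IndexWriter(index, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDoc(w, "Lucene in Act^ion");
        addDoc(w, "Lucene lucene Act:ion");
        addDoc(w, "Managing Act?ion");
        addDoc(w, "The Art of Computer Act-ion");
        addDoc(w, "Lucene");
        // Close the writer so the documents are visible to the searcher.
        w.close();

        // 2. query
        String querystr = args.length > 0 ? args[0] : "Act-ion";
        querystr = querystr.toLowerCase();
        Query query;
        if (querystr.indexOf("*") < 0) {
            String escaped = QueryParser.escape(querystr);

            QueryParser parser = new QueryParser("title", analyzer);
            // Unescaped variant; swap in the next line to test with escaping.
            query = parser.parse(querystr);
            // query = parser.parse(escaped);
        } else {
            // Queries containing * bypass the parser and the analyzer.
            Term term = new Term("title", querystr);
            query = new WildcardQuery(term);
        }
        System.out.println("query : " + query);

        // Older Hits-based version, left commented out:
        // System.out.println("Query: " + query.toString());
        // Hits hits = searcher.search(query);
        // System.out.println("Found " + hits.length() + " hits.");

        // 3. search
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index, true);
        TopScoreDocCollector collector = TopScoreDocCollector.create(
                hitsPerPage, true);
        searcher.search(query, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        System.out.println("ScoreDoc[] hits: " + hits);

        // 4. display results
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("title"));
        }

        // searcher can only be closed when there
        // is no need to access the documents any more.
        searcher.close();
    }

    private static void addDoc(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        doc.add(new Field("title", value, Field.Store.YES,
                Field.Index.ANALYZED, Field.TermVector.YES));
        w.addDocument(doc);
    }
}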
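
For readers unfamiliar with the escaping step in the listing: QueryParser.escape is expected to put a backslash in front of each character the query parser treats as syntax, so the parser reads it as literal text. The sketch below imitates that idea in plain Java, without Lucene on the classpath. The class name EscapeSketch and the exact character list are assumptions made for illustration, not Lucene's own code.

```java
// Simplified stand-in for the escaping step, for illustration only.
final class EscapeSketch {
    // Characters treated as query syntax in this sketch. The list is an
    // assumption based on the query parser syntax, not Lucene's source.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Returns the input with a backslash in front of every special character.
    static String escape(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] queries = {"Act-ion", "Act^ion", "Act?ion", "Act:ion"};
        for (String q : queries) {
            System.out.println(q + " -> " + escape(q));
        }
    }
}
```

Escaping only changes how the parser reads the string. The analyzer still processes the resulting term afterwards, so escaping alone does not keep the punctuation in the indexed or searched terms.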



When I search for:
Act-ion: all 4 hits are returned, both with escaping and without (parsing the query as is).
Act^ion: a parse error without escaping, and all 4 hits with escaping. Why can't ^ be parsed as is, like the other special characters?
Act?ion and Act:ion: 0 hits without escaping, and all 4 hits with escaping.
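
One plausible reading of these results, assuming StandardAnalyzer lowercases the text and drops punctuation such as ^ : ? and - between letters: every title is indexed with the two separate terms act and ion, so an escaped query for any of the four variants is analyzed into the same two terms and matches all four documents. Without escaping, the parser would read : as a field separator, ? as a single-character wildcard and ^ as a boost marker, which would fit the 0 hits and the parse error. The sketch below is a simplified stand-in in plain Java. TokenizeSketch and its regular expression are hypothetical, not Lucene's tokenizer, and stop-word removal is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough imitation of an analyzer that lowercases and splits on punctuation.
final class TokenizeSketch {
    // Lowercases the text and splits on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        String lower = text.toLowerCase(Locale.ROOT);
        for (String part : lower.split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] variants = {"Act^ion", "Act:ion", "Act?ion", "Act-ion"};
        for (String v : variants) {
            // Each variant ends up as the same two tokens.
            System.out.println(v + " -> " + tokens(v));
        }
    }
}
```

If the real analyzer behaves like this, the index never contains the punctuation, so no query against this field can tell the four titles apart.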

Could anyone tell me how to get an exact search? That is, if I search for "Act-ion", only the 4th document should be returned, and likewise for the other terms.
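
The behaviour asked for here amounts to splitting titles on whitespace only, keeping the punctuation inside each term, and comparing whole terms. The plain-Java sketch below shows that matching rule on the five titles from the listing. WhitespaceMatchSketch is a hypothetical name, and no Lucene classes are used. In Lucene this might correspond to indexing and querying the field with an analyzer that splits only on whitespace (WhitespaceAnalyzer, for example) while still escaping the query string, but that mapping is an assumption and has not been checked against the poster's Lucene version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Matches a query against whitespace-separated, lowercased title terms.
final class WhitespaceMatchSketch {
    static final List<String> TITLES = Arrays.asList(
            "Lucene in Act^ion",
            "Lucene lucene Act:ion",
            "Managing Act?ion",
            "The Art of Computer Act-ion",
            "Lucene");

    // Returns the titles that contain the query as one whole term.
    static List<String> matches(List<String> titles, String query) {
        String wanted = query.toLowerCase(Locale.ROOT);
        List<String> found = new ArrayList<>();
        for (String title : titles) {
            for (String token : title.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.equals(wanted)) {
                    found.add(title);
                    break; // one matching term per title is enough
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches(TITLES, "Act-ion"));
        System.out.println(matches(TITLES, "Act^ion"));
    }
}
```

Because punctuation stays part of the term, each of the four variants matches only its own title, while a plain word such as Lucene still matches every title that contains it.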
