
Special Characters Lucene

Vallaru smitha
Ranch Hand

Joined: Aug 19, 2008
Posts: 87

Hi,

I am trying to run a Lucene search for terms that contain special
characters, both with and without escaping them, using the following code:



// Imports assume the Lucene 2.9.x package layout.
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PharserQuery {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws IOException, ParseException {
        // 1. index: five sample titles in an in-memory directory
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new RAMDirectory();

        IndexWriter w = new IndexWriter(index, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDoc(w, "Lucene in Act^ion");
        addDoc(w, "Lucene lucene Act:ion");
        addDoc(w, "Managing Act?ion");
        addDoc(w, "The Art of Computer Act-ion");
        addDoc(w, "Lucene");
        w.close();

        // 2. query: taken from the command line, defaulting to "Act-ion"
        String querystr = args.length > 0 ? args[0] : "Act-ion";
        querystr = querystr.toLowerCase();
        Query query;
        IndexSearcher searcher = new IndexSearcher(index, true);
        if (querystr.indexOf("*") < 0) {
            String escaped = QueryParser.escape(querystr);

            QueryParser parser = new QueryParser("title", analyzer);
            // Swap the two lines below to compare unescaped and escaped parsing.
            query = parser.parse(querystr);
            //query = parser.parse(escaped);
        } else {
            // Strings containing '*' bypass the parser and run as a wildcard query.
            Term term = new Term("title", querystr);
            query = new WildcardQuery(term);
            System.out.println("query : " + query);
        }

        // 3. search: collect the top hits
        int hitsPerPage = 10;
        TopScoreDocCollector collector = TopScoreDocCollector.create(
                hitsPerPage, true);
        searcher.search(query, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        System.out.println("ScoreDoc[] hits: " + hits);

        // 4. display results
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("title"));
        }

        // searcher can only be closed when there
        // is no need to access the documents any more.
        searcher.close();
    }

    // Adds one document with a single stored, analyzed "title" field.
    private static void addDoc(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        doc.add(new Field("title", value, Field.Store.YES,
                Field.Index.ANALYZED, Field.TermVector.YES));
        w.addDocument(doc);
    }
}

When I search, I get the following results:
Act-ion: all 4 hits, both with escaping and without (parsing the string as it is).
Act^ion: a parse error without escaping, and all 4 hits with escaping. Why can't ^ be parsed as it is, when it is a special character just like the others?
Act?ion and Act:ion: 0 hits without escaping, and all 4 hits with escaping.

Could anyone let me know how I can get an exact search, i.e. if I search for "Act-ion", only the 4th document should be displayed, and similarly for the others?
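
To make the "with escape" case concrete, here is a minimal sketch of what the escaping step does to each test string. It does not call Lucene. It assumes that QueryParser.escape simply prefixes every query-syntax character with a backslash, and the character list is an assumption based on the Lucene 2.9 query syntax. The class name EscapeSketch and the method escape are hypothetical names used only for illustration.

```java
public class EscapeSketch {
    // Assumed set of query-syntax characters (Lucene 2.9 era): \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Hypothetical stand-in for QueryParser.escape: backslash-prefix each special character.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] queries = {"Act-ion", "Act^ion", "Act?ion", "Act:ion"};
        for (String q : queries) {
            System.out.println(q + " -> " + escape(q));
        }
    }
}
```

Once escaped, ^, ? and : no longer act as the boost, wildcard and field operators, which would be consistent with the parse error and the 0-hit results disappearing in the escaped runs.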
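
One possible reason why all four documents match may be tokenization at index time. The sketch below is a rough approximation only: it assumes that StandardAnalyzer lowercases the text and splits it on punctuation, and it ignores stop-word removal and the other rules of the real analyzer. The class name TokenSketch and the method tokens are hypothetical and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenSketch {
    // Rough approximation (assumption): lowercase, then split on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] titles = {
            "Lucene in Act^ion",
            "Lucene lucene Act:ion",
            "Managing Act?ion",
            "The Art of Computer Act-ion"
        };
        for (String title : titles) {
            System.out.println(title + " -> " + tokens(title));
        }
    }
}
```

Under this assumption every title ends in the same two tokens, act and ion, so the index cannot tell Act-ion from Act^ion. If that is what happens here, an analyzer that splits only on whitespace (Lucene ships a WhitespaceAnalyzer), used for both indexing and querying, might keep each variant as a single distinct token, although that is not verified in this post.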

Thanks,
Smitha
 