Submit a bugfix patch for searching (CJK supports, too)

Migrated From Jforum.net
Ranch Hand

Joined: Apr 22, 2012
Posts: 17424
Hi Rafael,

I've got a patch for the search function. Download it here (I can't attach a file here; am I too new?):
http://s22.yousendit.com/d.aspx?id=08TS2L9NC3K8Z3KPXHDJSWIN0Kl

I've enhanced the word-splitting code used for both indexing and searching.
It can now split English words (multiple lower-case characters) and CJK characters properly.
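
The patch itself is not shown in this thread, so here is only a rough sketch of what such a splitter could look like. It assumes that runs of letters and digits form one lower-cased word and that every CJK character becomes its own token. The class name WordSplitter and the choice of scripts treated as CJK are made up for illustration and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the code from the patch.
final class WordSplitter {
    private WordSplitter() {}

    // Assumption: Han, Hiragana, Katakana and Hangul count as "CJK" here.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    // Splits text into lower-cased words and single CJK characters.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                // A CJK character ends any pending word and stands alone.
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);
            } else {
                // Whitespace and punctuation only separate words.
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString().toLowerCase(Locale.ROOT));
            word.setLength(0);
        }
    }
}
```

Using the same split method for indexing and for the search query keeps both sides consistent, which is the point of sharing the word-splitting code.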

To search CJK characters properly:
* Change the form method in /templates/default/search.htm from get to post.
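
The post does not say why post helps. One plausible reason, and it is only an assumption, is that servlet containers of that time often decoded the query string of a get request as ISO-8859-1, while a post body can be decoded with whatever charset the application sets. The sketch below just shows what happens to CJK text when it is percent-encoded in one charset and decoded in another. QueryEncodingDemo and roundTrip are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of JForum.
final class QueryEncodingDemo {
    private QueryEncodingDemo() {}

    // Percent-encodes text with one charset and decodes it with another.
    static String roundTrip(String text, String encodeCharset, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(text, encodeCharset);
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }
}
```

With matching charsets the search term survives; with UTF-8 on one side and ISO-8859-1 on the other, each byte of a CJK character turns into a separate Latin-1 character, so the word lookup can no longer match.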

For better performance with the indexed table:
* Change generic_queries.sql
==============
SearchModel.searchByWord = SELECT post_id FROM jforum_search_wordmatch wm, jforum_search_words w \
WHERE wm.word_id = w.word_id \
AND LOWER(w.word) = LOWER(?)
==============
to
==============
SearchModel.searchByWord = SELECT post_id FROM jforum_search_wordmatch wm, jforum_search_words w \
WHERE wm.word_id = w.word_id \
AND w.word = LOWER(?)
==============
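
Dropping LOWER() around w.word lets the database compare against the column directly, which is what normally allows an index on that column to be used. The catch is that the stored words must already be lower case. A minimal sketch of both sides follows, assuming words are normalized once at indexing time and that post_id is an integer column. LowercaseWordLookup, normalize and findPostIds are hypothetical names; only the query text comes from the change above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not JForum's SearchModel.
final class LowercaseWordLookup {
    private LowercaseWordLookup() {}

    // Query text copied from the changed generic_queries.sql entry.
    static final String SEARCH_BY_WORD =
        "SELECT post_id FROM jforum_search_wordmatch wm, jforum_search_words w "
        + "WHERE wm.word_id = w.word_id "
        + "AND w.word = LOWER(?)";

    // Apply this before a word is stored, so the column only holds lower-case text.
    static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }

    // Runs the lookup; assumes post_id can be read as an int.
    static List<Integer> findPostIds(Connection conn, String word) throws SQLException {
        List<Integer> postIds = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SEARCH_BY_WORD)) {
            ps.setString(1, word);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    postIds.add(rs.getInt("post_id"));
                }
            }
        }
        return postIds;
    }
}
```

If older rows were stored in mixed case, they would stop matching after this change until the words are re-indexed.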

[originally posted on jforum.net by alexieong]
Hmm... the website is reporting an invalid link.

Rafael
[originally posted on jforum.net by Rafael Steil]
Sorry to hear that. I've uploaded it again:

http://s44.yousendit.com/d.aspx?id=2D7RP4SWRWVA71DJO5IQLSU8V6
or
http://rapidshare.de/files/4373578/search_patch.zip.html
[originally posted on jforum.net by Anonymous]
Let me know if it works properly. Thanks
[originally posted on jforum.net by alexieong]
Thanks, I fetched it.

I'm adding your changes to CVS right now. Also, I have refactored the entire SearchIndexerModel (not in CVS yet), and it runs amazingly faster.

Thanks for the patches.

Rafael
[originally posted on jforum.net by Rafael Steil]
Well, there is a bug. Your check for



makes a word like



being interpreted as just "somethi", while the rest is discarded. Using Character.isLetterOrDigit(chars[i]) seems to work with Unicode words in the form \u7ba1\u7406, but, at least here, it does not work with "???�?�?�?�???�?".

Ideas?

Rafael
[originally posted on jforum.net by Rafael Steil]
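
A small check that may help narrow this down. Character.isLetterOrDigit returns true for properly decoded CJK and Cyrillic letters, but false for a literal '?' and for the replacement character U+FFFD. So one possible explanation, which is only a guess and not confirmed in this thread, is that the failing text was already damaged by a charset mismatch before it reached the splitter. LetterCheckDemo and countIndexable are hypothetical names.

```java
// Hypothetical demo class, not part of JForum.
final class LetterCheckDemo {
    private LetterCheckDemo() {}

    // Counts the code points that Character.isLetterOrDigit would accept.
    static int countIndexable(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }
}
```

Comparing countIndexable on the text as typed and on the text as received by the server would show whether characters are lost in the splitter or earlier.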
Umm... my quick fix works properly for CJK; it does not consider languages like Russian yet. :P

It seems the index isn't updated when text is removed because a post is edited or deleted, right? That might be a problem.
[originally posted on jforum.net by alexieong]
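
One way to avoid stale entries, sketched here with an in-memory index instead of the real jforum_search_* tables: remember which words each post contributed, and drop those matches before re-indexing an edited post or when a post is deleted. PostIndex and its methods are hypothetical, and the simple regex split stands in for the real word splitter.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the search tables.
final class PostIndex {
    private final Map<String, Set<Integer>> postsByWord = new HashMap<>();
    private final Map<Integer, Set<String>> wordsByPost = new HashMap<>();

    // Indexes a new post or re-indexes an edited one.
    void index(int postId, String text) {
        remove(postId); // drop matches left over from the previous version
        Set<String> words = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        for (String w : words) {
            postsByWord.computeIfAbsent(w, k -> new HashSet<>()).add(postId);
        }
        wordsByPost.put(postId, words);
    }

    // Removes every match that belongs to the given post.
    void remove(int postId) {
        Set<String> oldWords = wordsByPost.remove(postId);
        if (oldWords == null) {
            return;
        }
        for (String w : oldWords) {
            Set<Integer> posts = postsByWord.get(w);
            if (posts != null) {
                posts.remove(postId);
                if (posts.isEmpty()) {
                    postsByWord.remove(w);
                }
            }
        }
    }

    // Returns the ids of posts that contain the word.
    Set<Integer> search(String word) {
        Set<Integer> posts = postsByWord.get(word.toLowerCase(Locale.ROOT));
        return posts == null
            ? Collections.<Integer>emptySet()
            : Collections.unmodifiableSet(posts);
    }
}
```

In the database version, the equivalent step would be deleting the post's rows from the word-match table before inserting the new ones.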