
Bugfix patch for search (adds CJK support, too)

 
Migrated From Jforum.net
Ranch Hand
Posts: 17424
Hi Rafael,

I have a patch for the search function. You can download it here (I can't attach files to this post; is my account too new?):
http://s22.yousendit.com/d.aspx?id=08TS2L9NC3K8Z3KPXHDJSWIN0Kl

I've improved the word-splitting code used for both indexing and searching.
It now splits English words (runs of lower-case characters) and CJK characters properly.
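
For anyone who can't get the zip, here is a minimal sketch of this kind of splitting. It is not the code from the patch: the class name WordSplitter, the choice of scripts treated as CJK, and the one-token-per-CJK-character rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JForum or of the patch.
final class WordSplitter {

    // Assumption: Han, Hiragana, Katakana and Hangul count as "CJK" here.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    // Splits text into lower-cased words; each CJK character becomes its own token.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                flush(word, tokens);                      // end any word in progress
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens);                      // whitespace or punctuation
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }
}
```

Calling WordSplitter.split on a mixed string gives one token per run of letters or digits and one token per CJK character, so the same routine can be used when indexing a post and when parsing a search query.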

To search CJK characters properly:
* Change the form method in /templates/default/search.htm from GET to POST.
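
The thread doesn't say why POST helps. One plausible reason, stated here as an assumption, is that a GET query string gets decoded with a different charset than the one the browser used to encode it. The sketch below only illustrates that mismatch with the standard URLEncoder and URLDecoder classes; QueryEncodingDemo and roundTrip are made-up names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it does not touch JForum or a servlet container.
final class QueryEncodingDemo {

    // Encodes a search term as UTF-8 (as a browser might) and decodes it
    // with the given charset (as a server might).
    static String roundTrip(String term, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(term, "UTF-8");
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Decoding with UTF-8 returns the original term, while decoding with ISO-8859-1 turns each of the UTF-8 bytes into a separate character, so the term no longer matches anything in the index.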

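For better performance with an indexed table (wrapping the column in LOWER() typically keeps the database from using a plain index on w.word; dropping it assumes the indexer already stores words in lower case):
* In generic_queries.sql, change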
==============
SearchModel.searchByWord = SELECT post_id FROM jforum_search_wordmatch wm, jforum_search_words w \
WHERE wm.word_id = w.word_id \
AND LOWER(w.word) = LOWER(?)
==============
to
==============
SearchModel.searchByWord = SELECT post_id FROM jforum_search_wordmatch wm, jforum_search_words w \
WHERE wm.word_id = w.word_id \
AND w.word = LOWER(?)
==============
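
The rewritten query only returns the same rows if every stored word is already lower case. A minimal sketch of normalizing on the indexing side, assuming a hypothetical helper called WordNormalizer (not an existing JForum class):

```java
import java.util.Locale;

// Hypothetical helper: apply it to each word before inserting it into
// jforum_search_words, so the query can compare w.word directly.
final class WordNormalizer {

    static String normalize(String word) {
        // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
        return word.trim().toLowerCase(Locale.ROOT);
    }
}
```

Note that Java's toLowerCase and the database's LOWER() may disagree on a few characters, so this is worth checking against the database in use.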

[originally posted on jforum.net by alexieong]
 
Hmm, the website is reporting an invalid link.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Sorry to hear that. I've uploaded it again:

http://s44.yousendit.com/d.aspx?id=2D7RP4SWRWVA71DJO5IQLSU8V6
or
http://rapidshare.de/files/4373578/search_patch.zip.html
[originally posted on jforum.net by Anonymous]
 
Let me know if it works properly. Thanks.
[originally posted on jforum.net by alexieong]
 
Thanks, I fetched it.

I'm adding your changes to CVS right now. I have also refactored the entire SearchIndexerModel (not in CVS yet), and it now runs much faster.

Thanks for the patches.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Well, there is a bug. Your check for



makes a word like



be interpreted as just "somethi", while the rest is discarded. Using Character.isLetterOrDigit(chars[i]) seems to work with Unicode words of the form \u7ba1\u7406, but, at least here, it does not work with "???�?�?�?�???�?".
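
One thing that might be worth ruling out (a guess, not a diagnosis): if the text has already passed through a charset that can't represent it, the letters arrive as literal '?' characters, and no character check can accept them. The sketch below uses only standard JDK calls; LetterCheckDemo and its method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, unrelated to the actual JForum code.
final class LetterCheckDemo {

    // True if the string is non-empty and every code point is a letter or digit.
    static boolean allLettersOrDigits(String s) {
        return !s.isEmpty()
            && s.codePoints().allMatch(cp -> Character.isLetterOrDigit(cp));
    }

    // Simulates a lossy trip through ISO-8859-1: characters that charset
    // cannot represent are replaced with '?' by String.getBytes.
    static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

With a correctly decoded Cyrillic word such as "\u043f\u043e\u0438\u0441\u043a", allLettersOrDigits returns true. After throughLatin1 the same word is a run of question marks and the check returns false, which would look like the check "not working" even though the damage happened earlier.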

Ideas?

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Umm... my quick fix works properly for CJK, but it doesn't handle languages like Russian yet. :P

It also seems the index isn't updated when text is removed by editing a post, or when the post itself is deleted. Is that right? It might be a problem.
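
A minimal sketch of one way to keep the index in sync on edit, assuming the indexer can produce the set of words for the old and the new text of a post. IndexDiff and its methods are hypothetical names, not existing JForum code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for deciding which word matches to change on edit.
final class IndexDiff {

    // Words that were in the old text but are gone from the new text:
    // their rows for this post should be deleted from the match table.
    static Set<String> wordsToRemove(Set<String> oldWords, Set<String> newWords) {
        Set<String> result = new HashSet<>(oldWords);
        result.removeAll(newWords);
        return result;
    }

    // Words that appear only in the new text: these need new match rows.
    static Set<String> wordsToAdd(Set<String> oldWords, Set<String> newWords) {
        Set<String> result = new HashSet<>(newWords);
        result.removeAll(oldWords);
        return result;
    }
}
```

Deleting a post is then the special case where the new word set is empty, so every old word ends up in wordsToRemove.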
[originally posted on jforum.net by alexieong]
 