Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Cloud/Virtualization forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Search Strategy

 
Horatio Westock
Ranch Hand
Posts: 221
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Folks,

SQL isn't really my thing, I know how to do the usual things, but I'm no expert. I wonder if anyone could comment/provide suggestions on this?

(Using Postgres 7.1)

There is a table in the database, Site, which has fields: address_1, address_2, address_3 and postalcode.

One of the problems is, the address fields are filled out inconsistently. So for example, the third line may be empty, but the second contain the city. You get the idea.

Now I need a search which takes address lines 1, 2, 3 and the postcode as search terms, finds matches and ranks them. For example, the user entered search terms may be:

Address 1: New Rose Hotel
Address 2: Gibson Street
Address 3: GibsonTown
Post code: WGB 123

I have a query which works, but seems really ugly. I concatenate lines 1, 2 and 3 of the search query and tokenize into individual words. I then filter out anything less than 3 characters (generally Rd. St. etc) and any really common tokens (street, road, lane, place etc). I also filter punctuation and whitespace from the post code and limit to 5 characters. This leaves me with search criteria like this:

New, Rose, Hotel, Gibson, GibsonTown and WGB12

Now I build a query. The query works by concatenating the address lines (using coalesce to turn NULL in an empty string), then doing a case insensitive regular expression match with word boundaries. I generate one such sub query per token, then UNION ALL the results.

To add a ranking system, I add a constant value of '1 point' for each token match, and '3 points' for a postcode match.

The whole lot is wrapped in another query which LEFT JOINs with the Site table to get the info I want. This outer wrapper is really just for testing, as I just need the primary keys in my application.

So, our example query would look like this:



Seems a bit complicated! Any SQL wizards want to let me in on a secret of how to do this more simply/efficiently?

Much appreciated!
[ April 05, 2005: Message edited by: Horatio Westock ]
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic