This week's book giveaway is in the Agile forum.
We're giving away four copies of The Software Craftsman and have Sandro Mancuso on-line!
See this thread for details.
The moose likes JDBC and Relational Databases and the fly likes Search Strategy Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login

Win a copy of The Software Craftsman this week in the Agile forum!
JavaRanch » Java Forums » Databases » JDBC and Relational Databases
Bookmark "Search Strategy" Watch "Search Strategy" New topic

Search Strategy

Horatio Westock
Ranch Hand

Joined: Feb 23, 2005
Posts: 221
Hi Folks,

SQL isn't really my thing, I know how to do the usual things, but I'm no expert. I wonder if anyone could comment/provide suggestions on this?

(Using Postgres 7.1)

There is a table in the database, Site, which has fields: address_1, address_2, address_3 and postalcode.

One of the problems is, the address fields are filled out inconsistently. So for example, the third line may be empty, but the second contain the city. You get the idea.

Now I need a search which takes address lines 1, 2, 3 and the postcode as search terms, finds matches and ranks them. For example, the user entered search terms may be:

Address 1: New Rose Hotel
Address 2: Gibson Street
Address 3: GibsonTown
Post code: WGB 123

I have a query which works, but seems really ugly. I concatenate lines 1, 2 and 3 of the search query and tokenize into individual words. I then filter out anything less than 3 characters (generally Rd. St. etc) and any really common tokens (street, road, lane, place etc). I also filter punctuation and whitespace from the post code and limit to 5 characters. This leaves me with search criteria like this:

New, Rose, Hotel, Gibson, GibsonTown and WGB12

Now I build a query. The query works by concatenating the address lines (using coalesce to turn NULL in an empty string), then doing a case insensitive regular expression match with word boundaries. I generate one such sub query per token, then UNION ALL the results.

To add a ranking system, I add a constant value of '1 point' for each token match, and '3 points' for a postcode match.

The whole lot is wrapped in another query which LEFT JOINs with the Site table to get the info I want. This outer wrapper is really just for testing, as I just need the primary keys in my application.

So, our example query would look like this:

Seems a bit complicated! Any SQL wizards want to let me in on a secret of how to do this more simply/efficiently?

Much appreciated!
[ April 05, 2005: Message edited by: Horatio Westock ]
Have you tried LearnNowOnline?
subject: Search Strategy