Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Cloud/Virtualization forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Order of SQL WHERE clause conditions & database table with index

 
Stanley Wong
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
There is a rule of thumb for database multicolumn index that says "put the most selective column first in the index definition". Usually it will give better performance.
But, did the ordering of WHERE clause conditions have performance impact to the query on a table with multicolumn index? Consider this scenario:

Table BID has columns USERID, ITEMID, DATE. Assuming USERID has highest selectivity and DATE has lowest selectivity. The following multicolumn index was created IDX_BID(USERID, ITEMID, DATE).

Does the following two SQL statements have a difference in their performance?
1. SELECT * FROM BID WHERE USERID=? AND ITEMID=? AND DATE=?
2. SELECT * FROM BID WHERE DATE=? AND ITEMID=? AND USERID=?

 
Bruno Boehr
Greenhorn
Posts: 17
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
In theory, exactly what plans the optimizer is going to come up with should be determined by the indexes available and their stats (as well as some heuristics only the optimizer knows about). In your particlular case, since the multicolumn index contains all the data needed to resolve the conditions specified in both queries, that index will make a very good candidate to be chosen as the primary place to look at, regardless of the order in which the conditions appear in the statement.

But in practice, I have heard so many horror stories of optimizers sometimes doing their work in absolutely weird, illogical ways that I would not be sure of anything unless I check out the execution plan(s).
 
stu derby
Ranch Hand
Posts: 333
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It will depend on the optimization techniques that the database uses. I can only speak specifically about Oracle database.

With Oracle, the are 2 different optimization engines, the Cost-Based Optimizer (CBO) and the Rule-Based Optimzer (RBO). The RBO is the older and has been deprecated for maybe 6 or 8 years, it hasn't improved since Oracle 7 or 8; I think its no longer available in Oracle 10g, and if it is available, you still don't want to use it. The CBO is newer, better, and has all the recent improvements; I believe it's the default now, and tough to avoid using if you wanted to...

The CBO doesn't care at all the order of any element in the SQL statement; a large number of "likely" execution plans are evaluated, each sub-part is given a cost based on meta information about the data that the database maintains, and the execution plan with the lowest estimated cost is chosen to execute.

With the old RBO, a series of heuristic rules are evaluated in order; in some of the rules, order in the FROM clause is a documented "tie breaker" in determining the order in which the tables are joined. Some people have also claimed that order in the WHERE clause is a similar tie-breaker under certain circumstances, but I'm not aware of any documentation to support that.

However, for the simple query you're asking about, even the RBO will certainly pick the same execution plan no matter what order is put in the WHERE clause. You would probably need to be joining more than 5 tables (or have a similar number of sub-queries) before it might matter. Most modern databases are going to be as smart as Oracle is, at that level of complexity.

Of course, the easiest way to settle the question for your database is to set up a testcase and time it.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic