File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes JDBC and the fly likes SQL question Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "SQL question" Watch "SQL question" New topic
Author

SQL question

Joseph Sweet
Ranch Hand

Joined: Jan 29, 2005
Posts: 327
Hi, i am trying to understand something about correlated sql sub-queries. the following is taked from some web tutorial. regarding the third example, they say that



since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.



thats looks incorrect to me... the third example looks very correlated subquery to me, so it should perform slowlier than the first.

Is there something i do not underst here???


Example 1 - non-correlated sub query

SELECT dept.name FROM dept
WHERE dept.id NOT IN
( SELECT dept_id
FROM emp
WHERE dept_id IS NOT NULL)

Example 2- outer join

SELECT dept.name FROM dept,emp
WHERE emp.dept_id (+) = dept.id

Example 3 - correlated sub query

SELECT dept.name FROM dept
WHERE NOT EXISTS (SELECT dept_id
FROM emp
WHERE emp.dept_id = dept.id)

The second example is an outer join SQL statement. This in fact does more than just return the names of departments which have no employees assigned to them, it also returns the names of those departments that do have employees assigned to them. This is because an outer join returns both matching rows and the non-matching rows on one side of the join.

The first and the third SQL statements would produce exactly the same results, but the first would probably be slower than the third if the dept_id column in the emp table were indexed (depending on the sizes of the tables).

The first SQL statement can not use any indexes - the where clause of the sub query is just checking for NOT NULL rows - so a full table scan would be performed. Also the sub query would be executed once for each row in the dept table.

On the other hand, the sub query in the third example can use the index and since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.


We must know, we will know. -- David Hilbert
Edwin Keeton
Ranch Hand

Joined: Jul 10, 2002
Posts: 214

Database performance can be very counter-intuitive. Also the way something works in one RDBMS may not be the same in another.

It seems that the point of the example you are looking at is that non-correlated subqueries are not necessarily always faster than correlated subqueries, although it would seem they should since they usually only execute once per statement. I think the example you show is more of an exception than a general rule.

The difference in performance here is that the non-correlated subquery is doing table scans, which of course are generally expensive. (I don't understand the explanation that this subquery executes once for each row of the dept table. It seems it should only execute once.)

The correlated subquery, on the other hand, is a covered query, meaning that it only uses columns that are indexed, and therefore the table itself does not have to be read. All the information needed is in the index. Reading indexes is (usually) a lot faster than reading tables. So that's why the correlated subquery, in this instance, is supposedly faster.

I would still want to see proof that it is really faster on my live database. Other factors, such as the size of the tables might make more of a difference than how your query is structured. And again, what works on Oracle, for example, might not work the same way in MySql or DB2.


SCJP, SCWCD
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: SQL question