wood burning stoves 2.0*
The moose likes JDBC and the fly likes SQL subquery performs too slowly Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "SQL subquery performs too slowly" Watch "SQL subquery performs too slowly" New topic
Author

SQL subquery performs too slowly

Joachim Rohde
Ranch Hand

Joined: Nov 27, 2006
Posts: 423

I have the following scenario: I need to find all customers who bought at least two certain books. As an example I want to find all users who bought "Sql Book" and "Java Book".
For that I've got the following query:



This works fine and delivers the expected results. But: it's too slowly. I have a quite huge amount of data, so that this query needs more than 200 seconds.
Could anybody help me out, to rewrite this query so that it performs faster?

PS: I have no influence on the database itself, so I can't set any index or modify the database in any other way.
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30752
    
156

Joachim,
What database are you using? In particular, is it one that supports the INTERSECTION keyword? If so, it would likely be faster to do two separate queries and use that operator.

Also, are you calling this query via JDBC. If so, you could just run the two queries (matching SQL book or matching Java book) separately and do the intersection yourself in Java.

The best way to find out where the time is going is really to run a database "explain" command on your query. If it is doing a nested loop, my suggestions will help. If it is spending the time doing a table scan (which sounds more likely from your description), the only thing that will speed things up is an index on boughtBooks. If you can't change the database, you'll need to either talk to the people who can change it or the people who gave you the performance requirement.


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Joachim Rohde
Ranch Hand

Joined: Nov 27, 2006
Posts: 423

Thanks for your answer Jeanne. The database is a MySql database which does not seem to support the intersect keyword.

I executed the query directly against the database (with the MySql Monitor), so no JDBC or anything.

I indexed the field on my local database and the query finished within less than one second. I guess that should be good enough to demonstrate the responsible person that an index is needed ;-)

Thanks again.
Paul Campbell
Ranch Hand

Joined: Oct 06, 2007
Posts: 338
This will perform better for you.



The reason why is your query:



Requires a full table scan for each row (the inner query must return a result set that reads all rows in the table before the outer query's where clause can be evaluated)... unless boughtBooks is an indexed column... which is not very likely.

[ November 14, 2008: Message edited by: Paul Campbell ]
[ November 14, 2008: Message edited by: Paul Campbell ]
Joachim Rohde
Ranch Hand

Joined: Nov 27, 2006
Posts: 423

Thanks for your answer Paul. Your query is ca. 10% faster. In the meanwhile I talked to the responsible person and I was told it should not be a problem to set an index.
I've got a question though: what exactly does "select 1" means in your query?
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30752
    
156

Originally posted by Joachim Rohde:
I've got a question though: what exactly does "select 1" means in your query?

Don't look at any of the columns when returning rather than anything in the table. See this page for detailst
Joachim Rohde
Ranch Hand

Joined: Nov 27, 2006
Posts: 423

Thanks for the explanation.
Paul Campbell
Ranch Hand

Joined: Oct 06, 2007
Posts: 338
Originally posted by Joachim Rohde:
Thanks for your answer Paul. Your query is ca. 10% faster. In the meanwhile I talked to the responsible person and I was told it should not be a problem to set an index.
I've got a question though: what exactly does "select 1" means in your query?


Hi Joachim,

Well... there are two answers here. The short answer is the sub-query returns a value of 1 for each row that is true... you could really use anything you like... the general practice is to "Select 1" with an Exists clause.

The more important concept is that the EXISTS clause is a boolean True/False for the sub-query (you can also use NOT EXISTS in a sub-query). The sub-query will evaluate to TRUE if any rows are returned for the sub-query (and TRUE if no rows are returned for NOT EXISTS).

Using a correlated sub-query (a query that has reference to the outer query) is usually faster when you use it in this manner than a non-correlated sub-query. A non-correlated sub-query must also perform a sort when the comparison operator is IN, NOT IN, = ANY, <> ANY, = ALL, or <> ALL.

I can clarify if I rushed through this too quickly... its a nice diversion from what I'm working on right now.
[ November 15, 2008: Message edited by: Paul Campbell ]
Joachim Rohde
Ranch Hand

Joined: Nov 27, 2006
Posts: 423

Originally posted by Paul Campbell:
I can clarify if I rushed through this too quickly... its a nice diversion from what I'm working on right now.


No, your explanation was perfectly clear. So I'm sorry I didn't distract you any further from your work ;-)
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: SQL subquery performs too slowly