aspose file tools*
The moose likes JDBC and the fly likes join and in Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "join and in" Watch "join and in" New topic
Author

join and in

Mike Yu
Ranch Hand

Joined: Nov 17, 2001
Posts: 175
If I have two tables test1 and test2, test1 has column col1, col2, ..., test2 has column cola, colb,....
test1.col1 is related to test2.cola.
If I make two queries using join and using in:
select test1.col1, test1.col2 where test1.col1 = test2.cola;
select test1.col1, test1.col2 where test1.col1 in (select test2.cola).
I tried a few cases, they return the same results. I would like to know if this is true in general.


Thanks,<br />Mike
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1752
    
    2
Yep. And there's at least one more such option (without getting *too* crazy):
select test1.col1, test1.col2 from test1 where exists ( select 'x' from test2 where test2.cola = test1.col1 )
If you've got some sort of qualifications on your nondriving table (where test2.colz = 'ZZZ' etc.), there can be performance considerations. (In general the vast majority of SQL I do is ad hoc, so as a rule I almost never worry about SQL performance. And also I don't know much about SQL performance optimization, so I'm the wrong person to ask there. (At my work there's a performance optimization group that will tweak SQL for you.))
So heavy disclaimers on the following:
If you expect your query to return a large percentage of the rows on your driving table, favor joins.
If you expect your query to return a small percentage of the rows on your driving table, favor in/exists.


That said, I personally would favor the join over in/exists in these cases unless I had a good reason not to. Anybody looking at your query would understand the intent of the join.
Mike Yu
Ranch Hand

Joined: Nov 17, 2001
Posts: 175
Hi Michael,
Thank you for your reply.
Can you explain a bit more about "qualifications" and "nondriving table"?
Michael Zalewski
Ranch Hand

Joined: Apr 23, 2002
Posts: 168
In my experience, FWIW, a JOIN is almost always better than a IN, even if there are qualifications on the driving table, and especially if the query returns only a few rows.
A lot faster. In instances where it matters, I see speed increases of 100x for the JOIN (This is on Oracle 8i. I have seen similar improvements in DB2).
Paul Misoni
Greenhorn

Joined: Jan 02, 2001
Posts: 16
Given SQL-92,
select
test1.col1, test1.col2
from
test1,test2
where test1.col1 = test2.cola;
is NOT the same as
select
test1.col1, test1.col2
from test1,test2
where test1.col1 in (select test2.cola from test2);
As a test case, suppose test1 has 7 rows with col1 having values 1 thru 7 and test2 has values in cola of 1,3,5,7,9.
The first case (equijoin) returns 4 rows (1,3,5,7).
The subselect returns 20 rows.
Syntax is VERY important when coding SQL statements. You need to realize what it is exactly that you are asking for.
Otherwise the results may not be quite what you expected.
COL1 COL2
------- ------------------------------
1 one
3 three
5 five
7 seven

4 Row(s) affected

COL1 COL2
------- ------------------------------
1 one
1 one
1 one
1 one
1 one
3 three
3 three
3 three
3 three
3 three
5 five
5 five
5 five
5 five
5 five
7 seven
7 seven
7 seven
7 seven
7 seven

20 Row(s) affected
Michael Zalewski
Ranch Hand

Joined: Apr 23, 2002
Posts: 168
Originally posted by Paul Misoni:
Given SQL-92,
select
test1.col1, test1.col2
from
test1,test2
where test1.col1 = test2.cola;
is NOT the same as
select
test1.col1, test1.col2
from test1,test2
where test1.col1 in (select test2.cola from test2);

But it's a lot closer to

Which is what I believe Mike Yu meant when he said
"they return the same results".
You are right when you say the INNER JOIN and the correlated IN do not always produce identical results. But I think for two tables without using NULLs, and if SELECT [primary-key] FROM Table2 is in the IN clause, that they are in fact identical.
Someone might be able to provide a counter-example.
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1752
    
    2
Let me try to clarify what I was trying to say.
It was my contention that the SQL queries (1), (2), and (3) below produce identical results.
If that's not true, I'd glady stand corrected.
Queries (1) and (2) are based on Mike Yu's original two queries. Neither of Mike Yu's original versions had FROM clauses and therefore weren't valid SQL. It was my assumption -- and I guess I should have stated this in my first post -- that the FROM clause in the first query would be from test1 , test2 and the FROM clause in the second query would be from test1.
(1)

(2)


(3)

Paul Misoni responds with two queries, the first of which is identical to (1) above, and here's the second (which I'll call (4)). Paul's queries are also based on Mike Yu's, but I believe Paul and I have made different assumptions about the FROM clauses.
(4)

Paul provides sample data and sample results for his two queries. I have no issues with Paul's results given his queries.
Paul points out that queries (1) and (4) are not the same. I completely agree. They're not the same. Whoever said they are?
The difference between query (4) and query (2) is the FROM clause. (2)'s from clause only has one table -- test1. But (4)'s from clause has two tables -- test1 and test2 -- and there's no join condition on these two tables in the where clause, so the query produces the Cartesian product of the two tables (then limits the results based on the where clause).
(If you don't believe me that a Cartesian product is being generated, add columns from test2 to the select list of (4).)
As far as I can tell, SQL-92 vs. non-SQL-92 has nothing to do with any of this. (But would gladly be corrected if I'm mistaken.) The difference is in what tables we all envision in the FROM clause. Based on Michael Zalewski's post:

code:
--------------------------------------------------------------------------------
SELECT Test1.col1, Test1.col2 FROM Test1 WHERE Test1.col1 IN ( SELECT Test2.cola FROM Test2 )
--------------------------------------------------------------------------------
Which is what I believe Mike Yu meant when he said
"they return the same results".

I think all three Michaels taking part in this thread were thinking of the same query and results.
By the way, Paul, if you're going to bring up SQL-92, shouldn't you have used INNER JOIN and CROSS JOIN syntax in the where clause in your examples?
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1752
    
    2
Originally posted by Michael Zalewski:
You are right when you say the INNER JOIN and the correlated IN do not always produce identical results.

None of the INs that have been posted have correlated subqueries.
But I think for two tables without using NULLs, and if SELECT [primary-key] FROM Table2 is in the IN clause, that they are in fact identical.

Could you provide more detail on how NULLs or nonkey fields would affect the results?
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1752
    
    2
Can you explain a bit more about "qualifications" and "nondriving table"?
By "qualifications" I meant, roughly, conditions in the where clause that serve to limit the number of rows returned (as opposed to conditions in the where clause that specify which tables are to be joined and/or how).
By "nondriving table" I meant, roughly, any table in the query that is not the most important (driving) table.
Mike Yu
Ranch Hand

Joined: Nov 17, 2001
Posts: 175
Thank you all for your help.
Sorry that the from clause is missing in my original question. The queries should be like:
select test1.col1, test1.col2 from test1, test2 where test1.col1 = test2.cola;
select test1.col1, test1.col2 from test1 where test1.col1 in (select test2.cola from test2).
[ November 16, 2002: Message edited by: Mike Yu ]
 
It is sorta covered in the JavaRanch Style Guide.
 
subject: join and in