File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes JDBC and the fly likes Multiple Sub-queries. efficient or not ? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of EJB 3 in Action this week in the EJB and other Java EE Technologies forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Multiple Sub-queries. efficient or not ?" Watch "Multiple Sub-queries. efficient or not ?" New topic
Author

Multiple Sub-queries. efficient or not ?

Sharath Upadhya
Greenhorn

Joined: Dec 01, 2009
Posts: 9
I am having a table called discussions. it contains all the topics as well as comments.

I want to fetch all the topics and their latest comments at once. if there is no comment for a topic, then topic is fetched or else the latest comment.

for example, if there are 19 topics, and 15 topics have comments, then the latest comments of those 15 topics and the remaining 4 un-commented topics are fetched.

I am using multiple sub-queries to fetch them at once. like the one below.

select * from discussions where id in (select id from discussions where parent_discussion is not null group by parent_discussion desc) or id in (select id from discussions where parent_discuss
ion is null and id not in (select distinct parent_discussion from discussions where parent_discussion is not null)) order by created_on desc;

here the first sub-query fetches all the latest comments and the second one fetches all the topics that have no comments.

here's my question to you guys. "How much efficient is this query ? do you have any other way around." Your suggestion would be of a great help. thanks in advance.
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 29219
    
135

This is not an efficient query. I also think it doesn't do what you want. From your description of the requirements for the query, you want to select all the discussions plus some other data from another table.

Let's break down your query:

This returns the ids of all discussions with comments.


This returns the ids of all discussions without comments


This turns the ids of all discussions that that are not in the set of discussions with comments. In other words that don't have comments. However, the previous subquery already does that so this is redundant.


This returns all columns from all rows with matching ids. Since your query amounts to "all with comments or all without comments", this returns the whole table. So you could have just written select * from discussions.

It's difficult to talk about performance until you get the functionality right so maybe you can clarify the purpose of the subqueries?


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Sharath Upadhya
Greenhorn

Joined: Dec 01, 2009
Posts: 9
Hi Jeanne Boyarsky, thanks for the reply.

by the way, the query I posted above fulfills my requirement. I assume you had trouble understanding my requirement.

let me break down the query.

This line returns all the records from the discussions table. where.. the condition goes...

whose id includes those records, which have parent discussion(i.e. these are comments). group by desc ensures that only the latest comment per discussion is fetched. and this sub-query fetches only the comments.

the 'OR' specifies that, record can be either a topic or a discussion.

This sub-query fetches all the records, that are topics. (i.e they do not have parent). and also ensures that they do not include any of those topics that are having comments and were fetched in the previous sub-query.

This line sorts all the records by their posting/commenting date with the latest on top.

If you are still not sure about this, take a look at the list of records after I ran this query.


We have 19 topics here, and only 15 topics have comments.

I hope this clarifies your doubts.
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 29219
    
135

Let me ask this another way. Does the discussion table not have 19 records? If it has more than that, what isn't being returned?
Sharath Upadhya
Greenhorn

Joined: Dec 01, 2009
Posts: 9
Let me remind you that i am using a single table `discussions` to store both topics as well as their comments.

This `discussions` table contains about 190 records. out of which, 19 records are topics(they do not have parent_discussion). remaining 171 records are comments. we are leaving out all the old comments and fetching only the latest comment per discussion.(the group by clause). and topics themselves if they do not have comments.

so it makes sense to fetch only 19 records.
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 29219
    
135

Got it. The query isn't efficient as it involves multiple table scans. It would likely be more efficient to normalize your table structure and not store different types of things in the same table.

Another possible improvement is to split it into multiple queries and get the keys of the records you care about. Then you could limit it to two passes. This may not help enough to be worth it though.
Scott Selikoff
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3697
    
    5

Jeanne's right, you are describing one table to hold topics and comments and this sounds a bit like single table anti-pattern. Add more tables, normalize the data, and the structure will make querying it a lot easier.


My Blog: Down Home Country Coding with Scott Selikoff
Sharath Upadhya
Greenhorn

Joined: Dec 01, 2009
Posts: 9
Thanks a lot `Jeanne Boyarsky` and `Scott Selikoff` for the suggestion.

Suppose, I have made it into 2 tables, one for topics and another for comments, then how would you order them by posted date.

If your answer is 'fetch both comments and topics separately, and then order them both by putting in the list', then wouldn't that be inefficient ?

or is there any way in mysql to merge both columns for the ordering purpose ?

help me people......

thanks
Scott Selikoff
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3697
    
    5

Are you familiar with joins? Why fetch both separately when the database will let you fetch them simultaneously. Let it do the heavy lifting as they say.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Multiple Sub-queries. efficient or not ?
 
Similar Threads
Help with a Hibernate Query
Group by desc in jpql
Statement.setMaxRows
Regarding fetching the records based on a fetchsize limit
Pagination using the database(DB2)