aspose file tools*
The moose likes JDBC and the fly likes select vs count(*) - Performance Comparison Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "select vs count(*) - Performance Comparison" Watch "select vs count(*) - Performance Comparison" New topic
Author

select vs count(*) - Performance Comparison

Jaikiran Pai
Marshal

Joined: Jul 20, 2005
Posts: 10102
    
165

Hi,
We have an application which involves querying the database regularly. Currently we are looking at improving the performance of the application. We have certain scenario where we have check on a table to see whether it contains a row for a given criteria. Consider the following table(just an example. The real table may be far more complex):

Person
---------
Id - Integer
First Name - Varchar
Last Name - Varchar
Occupation - Varchar

Consider the following scenario, we want to check whether there is a row for a given combination of firstname and lastname. If not present then, insert a row. Our current logic involves:

- Retrieve the resultset by firing: select * from Person where firstname='abc' and lastname='xyz'
- Check the size of the resultset. If size is zero then insert the row, else do nothing.

The number of rows in the table may be huge. Instead of this query, we are planning to use the following query, *to improve the performance*:

- select count(*) from Person where firstname='abc' and lastname='xyz'

Is this going to help in improving the performance. Is there any better way of handling this?
We are currently using MSSQL Server(But, i dont think it should matter which server we are using)


[My Blog] [JavaRanch Journal]
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    
    8

If you expect count(*) to be approximately 1 then I wouldn't think there would be any difference at all. But the answer to all questions of the form "I have a complex system that I don't understand and I want to know if A is faster than B" is always "Try it and see what happens".
Jaikiran Pai
Marshal

Joined: Jul 20, 2005
Posts: 10102
    
165

If you expect count(*) to be approximately 1 then I wouldn't think there would be any difference at all.

Count can reach upto 100 records.

But the answer to all questions of the form "I have a complex system that I don't understand and I want to know if A is faster than B" is always "Try it and see what happens".


Surely, we are going to do that

To do that we need to change quite a bit of code, so just wanted to make sure that we are atleast trying out a right approach
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    
    8

Originally posted by jaikiran pai:
To do that we need to change quite a bit of code, so just wanted to make sure that we are atleast trying out a right approach
Really? Wouldn't it be easier to just write some small programs specifically for testing the options? I don't see them requiring more than about 25 lines of code with all exception-handling included.
steve souza
Ranch Hand

Joined: Jun 26, 2002
Posts: 861
Both approaches would have to run the query for all the records even though you only care about existance. I would do the query as an existance check.

It depends on your server how to do that. What rdbms are you using?

Also, if you use the jamon proxy jdbc driver all of this will be timed automatically and you can compare perforamnce of the approaches. You can use the jamon jdbc proxy simply by chaging your jdbc url (i.e. no code changes). A link follows:

http://jamonapi.sourceforge.net/#WhatsNew22


http://www.jamonapi.com/ - a fast, free open source performance tuning api.
JavaRanch Performance FAQ
Jaikiran Pai
Marshal

Joined: Jul 20, 2005
Posts: 10102
    
165

Paul, i will certainly try it out with a sample piece of code. Steve, thanks for that link. Will certainly have a look at it.
Kiran Joshi
Ranch Hand

Joined: Sep 04, 2005
Posts: 54
Probably you will get more benefit from something like the following

select * from Person
where firstname='abc' and lastname='xyz'
and rownum =1

This will return only the first row.
You can try it out.
Jaikiran Pai
Marshal

Joined: Jul 20, 2005
Posts: 10102
    
165

The rowcount is available only in Oracle.
Peter Rooke
Ranch Hand

Joined: Oct 21, 2004
Posts: 803

Could you use a stored procedure, do you have a clustered index (on First Name and Last Name). Which database are you using?


Regards Pete
jay ceebee
Greenhorn

Joined: Sep 01, 2006
Posts: 3
will there be instances when you will allow duplicates for firstname lastname combination? from what you mentioned, seems like you won't be allowing any duplicated firstname lastname combi. if this is the case, wouldn't it be simple to just add a unique constraint on both firstname and lastname combi, then simply let your program handle the exception thrown when inserting a duplicate entry.
Jason Liao
Ranch Hand

Joined: Jun 17, 2005
Posts: 59
Originally posted by jay ceebee:
if this is the case, wouldn't it be simple to just add a unique constraint on both firstname and lastname combi, then simply let your program handle the exception thrown when inserting a duplicate entry.


This is a better solution than stored procedure and works for most RDBMS.

Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30537
    
150

count(*) is almost always faster than select * as there is less network traffic involved. For one row the difference wouldn't be significant unless you have LOB fields in the table.

Currently we are looking at improving the performance of the application.

Before optimizing this SQL query, do you know that it is the bottleneck. I would think it is more likely that something else is. It's a pretty simple query that tweaking the result type is unlikely to help. If the actual query takes long, look into adding an index on the name fields.


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Jaikiran Pai
Marshal

Joined: Jul 20, 2005
Posts: 10102
    
165

count(*) is almost always faster than select * as there is less network traffic involved. For one row the difference wouldn't be significant unless you have LOB fields in the table.


We carried out a small test on a sample database and the count(*) proved to be a bit more efficient than select *, even though the difference was not very much. We had around 200 rows in that table.

Before optimizing this SQL query, do you know that it is the bottleneck. I would think it is more likely that something else is. It's a pretty simple query that tweaking the result type is unlikely to help.


The reason why we are trying to do something about this query is that we have the following logic in place:



As can be seen, we have loop which is executed around 60000 times(this is a variable value and 60000 is about the maximum). In this loop the logic implemented is to check for the existence of a row and if the row exists then nothing is done else some operation is carried out.

The result set size most of the times is around 100 and get retrieved as part of the above logic even though they never get used. So we thought it would be better to change the query to something else as we are only interested in knowing the existence of the row.

We have heard about IF EXISTS query in MSSQL but we would like to have it to be as generic as possible so that it does not fail on other databases. Thats the reason we thought about count(*) query.
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30537
    
150

Jaikiran,
That makes sense. Since it is executed so many times, even a small query can become the bottleneck easily. And yes, it is good to prefer count(*) over isExists.
Jaikiran Pai
Marshal

Joined: Jul 20, 2005
Posts: 10102
    
165

Thanks Jeanne Boyarsky, and all others for sharing your thoughts on this.

-Jaikiran
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: select vs count(*) - Performance Comparison