Hi, We have an application which involves querying the database regularly. Currently we are looking at improving the performance of the application. We have certain scenario where we have check on a table to see whether it contains a row for a given criteria. Consider the following table(just an example. The real table may be far more complex):
Person --------- Id - Integer First Name - Varchar Last Name - Varchar Occupation - Varchar
Consider the following scenario, we want to check whether there is a row for a given combination of firstname and lastname. If not present then, insert a row. Our current logic involves:
- Retrieve the resultset by firing: select * from Person where firstname='abc' and lastname='xyz' - Check the size of the resultset. If size is zero then insert the row, else do nothing.
The number of rows in the table may be huge. Instead of this query, we are planning to use the following query, *to improve the performance*:
- select count(*) from Person where firstname='abc' and lastname='xyz'
Is this going to help in improving the performance. Is there any better way of handling this? We are currently using MSSQL Server(But, i dont think it should matter which server we are using)
If you expect count(*) to be approximately 1 then I wouldn't think there would be any difference at all. But the answer to all questions of the form "I have a complex system that I don't understand and I want to know if A is faster than B" is always "Try it and see what happens".
If you expect count(*) to be approximately 1 then I wouldn't think there would be any difference at all.
Count can reach upto 100 records.
But the answer to all questions of the form "I have a complex system that I don't understand and I want to know if A is faster than B" is always "Try it and see what happens".
Surely, we are going to do that
To do that we need to change quite a bit of code, so just wanted to make sure that we are atleast trying out a right approach
Originally posted by jaikiran pai: To do that we need to change quite a bit of code, so just wanted to make sure that we are atleast trying out a right approach
Really? Wouldn't it be easier to just write some small programs specifically for testing the options? I don't see them requiring more than about 25 lines of code with all exception-handling included.
Both approaches would have to run the query for all the records even though you only care about existance. I would do the query as an existance check.
It depends on your server how to do that. What rdbms are you using?
Also, if you use the jamon proxy jdbc driver all of this will be timed automatically and you can compare perforamnce of the approaches. You can use the jamon jdbc proxy simply by chaging your jdbc url (i.e. no code changes). A link follows:
will there be instances when you will allow duplicates for firstname lastname combination? from what you mentioned, seems like you won't be allowing any duplicated firstname lastname combi. if this is the case, wouldn't it be simple to just add a unique constraint on both firstname and lastname combi, then simply let your program handle the exception thrown when inserting a duplicate entry.
Originally posted by jay ceebee: if this is the case, wouldn't it be simple to just add a unique constraint on both firstname and lastname combi, then simply let your program handle the exception thrown when inserting a duplicate entry.
This is a better solution than stored procedure and works for most RDBMS.
count(*) is almost always faster than select * as there is less network traffic involved. For one row the difference wouldn't be significant unless you have LOB fields in the table.
Currently we are looking at improving the performance of the application.
Before optimizing this SQL query, do you know that it is the bottleneck. I would think it is more likely that something else is. It's a pretty simple query that tweaking the result type is unlikely to help. If the actual query takes long, look into adding an index on the name fields.
count(*) is almost always faster than select * as there is less network traffic involved. For one row the difference wouldn't be significant unless you have LOB fields in the table.
We carried out a small test on a sample database and the count(*) proved to be a bit more efficient than select *, even though the difference was not very much. We had around 200 rows in that table.
Before optimizing this SQL query, do you know that it is the bottleneck. I would think it is more likely that something else is. It's a pretty simple query that tweaking the result type is unlikely to help.
The reason why we are trying to do something about this query is that we have the following logic in place:
As can be seen, we have loop which is executed around 60000 times(this is a variable value and 60000 is about the maximum). In this loop the logic implemented is to check for the existence of a row and if the row exists then nothing is done else some operation is carried out.
The result set size most of the times is around 100 and get retrieved as part of the above logic even though they never get used. So we thought it would be better to change the query to something else as we are only interested in knowing the existence of the row.
We have heard about IF EXISTS query in MSSQL but we would like to have it to be as generic as possible so that it does not fail on other databases. Thats the reason we thought about count(*) query.
Jaikiran, That makes sense. Since it is executed so many times, even a small query can become the bottleneck easily. And yes, it is good to prefer count(*) over isExists.