aspose file tools*
The moose likes JDBC and the fly likes Performance hit using Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Performance hit using "where" clause in the query" Watch "Performance hit using "where" clause in the query" New topic
Author

Performance hit using "where" clause in the query

Raj Murthi
Greenhorn

Joined: Aug 29, 2005
Posts: 14
Hi All,
I am facing a huge performance hit in the java code when using "where" clause in queries. Following are the details:

1. SELECT * FROM Employee

2. SELECT * FROM Employee where employeeid in (26,200,330,571,618,945)

There is no difference in Query Execution Time for both queries.
Business Logic Time is huge in second case as compared to first one (ratio - 1:20).
Rows returned are more in first case as compared to second case.(ratio - 1:4)

Business Logic is same for both the cases where I iterate through the ResultSet, get the objects and set them in a data structure.

Does anybody know the reason of unexpected time difference for the business logic in the second case?

[ February 27, 2006: Message edited by: Raj Murthi ]
[ February 27, 2006: Message edited by: Raj Murthi ]
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

Show us the business logic.


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
stu derby
Ranch Hand

Joined: Dec 15, 2005
Posts: 333
Query times are dominated by the amount of data searched through to find your rows.

If you don't have any indexes on your EMPLOYEE table, then find 1 particular row is going to be about as much work for the databse as finding all rows, because the databsae has to look through all rows to find the 1. You can reduce the amount of work the database does by adding appropriate indexes, for example:


Appropriate indexes can dramatically speed up query times. However, they are not free. Besides taking up additional storage space in the database, which can sometimes be a very significant fraction, they will also slightly slow down INSERT and sometimes UPDATE statements.
Raj Murthi
Greenhorn

Joined: Aug 29, 2005
Posts: 14
Thanks for replies.

Here is the sample code, this is not the actual buisness logic but it goes on same lines. [I am giving sample tables and columns here]

long t1 = System.currentTimeMillis();
ResultSet rs = stmtqry.executeQuery(SQL_QUERY);
System.out.println("Executing time for query: "+ ((System.currentTimeMillis()-t1))/1000.0);

t1 = System.currentTimeMillis();
while(rs.next())
{
for(int i=1;i<=NO_COLUMNS;i++)
STORE_ARRAY[ROW_COUNT][COLUMN_COUNT]=rs.getObject(i);
ROW_COUNT++;
}
System.out.println("buisness logic Execution time: "+ ((System.currentTimeMillis()-t1))/1000.0);

employeeid has non-clustered index, but its having composite index
create index employeeid1 on employee(employeeid,city)

So is the composite index the culprit, and should i go ahead and create one more index on employeeid. Also should it be clustered or non-clustered?
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30764
    
156

Raj,
Yes, you should add another index. The current one isn't being used since your query doesn't search by city too.

If I'm remembering right, you can only have one clustered index per table.


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Jon Egan
Ranch Hand

Joined: Mar 24, 2004
Posts: 83
A few thoughts about the query execution time:

1. It seems to me, only a really dumb DB engine should care that the index has a secondary column... the column specified in the WHERE clause is first in the index, which means the index (an ordered set of row pointers) is ordered first on that column, then (where the first column gives duplicates) on another column.

2. If there are not an appreciable number of rows in the database, the engine may (rightly) not bother with the index.... it is sometimes faster to do a sequential table scan than to hop back and forth from a small index to a small table.

3. The same point applies as in #2 if the database thinks there are not many rows in the table... in order for it to make an informed decision about the use (or not) of the index, it has to have up-to-date statistics. For Oracle, that means you need to analyze the table. For Informix, it's "update statistics". For MS SQL, it's probably control-F1-right-click-the-pretty-button.

4. Sometimes, the "value in (val1, val2, val3)" can be enough to convince an engine to skip the index - I'm not clear as to why, if the list is small, the engine doesn't just translate it to the equivalent "(value = val1 or value = val2 or value = val3)", which has sometimes (anecdotaly) seemed to use an index when the IN clause has not.

5. Some (most? I don't know...) DBs have an option that will give you some debugging info about how the DB engine will process the statement, along with an estimated "cost". The decision for a certain approach over another is based on the cost, and the cost is based on the accuracy of the statistics.... see #3.


Then there's the other point, based on the original post:


There is no difference in Query Execution Time for both queries.


... then, why are we all discussing query execution time?


Business Logic Time is huge in second case as compared to first one (ratio - 1:20).
Rows returned are more in first case as compared to second case.(ratio - 1:4)


6. Unless the problem is truly in the business logic (whatever the real code is in place of STORE_ARRAY[x][y] = rs.getObject(i)), I can't see any reason why returning fewer rows should result in longer operation iterating through the ResultSet. If anything, the query execution time should differ, and the larger dataset should result in more "pages" from the DB, but the smaller should run faster in the java code. Can you verify that is really what you meant to say? That the smaller result set is actually taking 20 times as long to be processed in the java code?


Hope any part of this helps...
-- Jon
Raj Murthi
Greenhorn

Joined: Aug 29, 2005
Posts: 14
Thanks Jeanne and Jon. This is really becoming interesting.
Jon - I found your reply very useful.
Jeanne - I haven't created the single column index yet, just waiting on conclusion, which I am sure I will get.

Let me ask one more question which can make the issue crystal clear.
Once the query gets executed with statement.executQuery() and Resultset object is returned.

1. Then what happens when Resultset.next() is executed. [Can anyone point me
some documents,links]
First I thought that rs.next is taking lot of time but that is not the
case.
2. Now, what I found is rs.getObject(i) is taking lot of time in "in" clause
query(2nd case) whereas in first case _rs.getObject(i)_ is not time
consuming.
[No of columns are 25]
[I am using _jconn2_ driver]
Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2578

Try analyzing a bit with query optimizer for RAW sql, if things are acceptable there look in the Java code. These things can get really interesting...and I think John's reply almost covers it all.

- Manish
Raj Murthi
Greenhorn

Joined: Aug 29, 2005
Posts: 14
I think in my previous reply I tried to explain where the bottleneck is.
Just to update I am using sybase database.I am trying to work with query optimizer.
Raj Murthi
Greenhorn

Joined: Aug 29, 2005
Posts: 14
The getObject method gives me different timings. There are 25 columns in total and 10 rows for "where" clause query[2nd case].
getObject for one column takes more time in first iteration and then in second iteration some other column is taking more time which took less time in 1st iteration. [Best of 3 executions]

Also the queries which I have given are really simple, since I can't give the actual ones. But one thing I can say is that I am executing some view and then using the "where" clause on the total results that are returned from the view.
This may also have some effect. I am thinking on all possible things which can turn out as bottleneck. Thats why you all must be thinking that why this guy doesn't ask all questions at one go. But its getting difficult to find actual bottleneck.

[ March 02, 2006: Message edited by: Raj Murthi ]
[ March 02, 2006: Message edited by: Raj Murthi ]
Raj Murthi
Greenhorn

Joined: Aug 29, 2005
Posts: 14
Any Updates on this front...
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Performance hit using "where" clause in the query