wood burning stoves 2.0*
The moose likes JDBC and the fly likes SQL Query Optimization for ORDER BY and LIMIT Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "SQL Query Optimization for ORDER BY and LIMIT" Watch "SQL Query Optimization for ORDER BY and LIMIT" New topic
Author

SQL Query Optimization for ORDER BY and LIMIT

Vishal Baid
Ranch Hand

Joined: Jul 18, 2012
Posts: 42
    
    1
Hi All.

I want to optimize a query

SELECT * FROM EMPLOYEE ORDER BY EMPID ASC LIMIT 9,1.

This Query will give me the 10th LOWEST EMPID in the table.
But if the number of rows is abut 100 million the query have a bad performance.

So please help me on this.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    
  60

First of all, how often is the query run, how long it takes now and how much do you need it to improve?

If there isn't an index on the EMPID column, create one. Since you're selecting just a few rows, the database should use the index and it should be quite fast.

The query itself is so simple that there isn't much to optimize. However, you should explicitly name columns instead of specifying SELECT *. It won't help much, but (depending on the database, your indexes and network) it might be a bit faster, especially if you select more than just a few rows and/or there is an index containing all columns you're selecting.
Vishal Baid
Ranch Hand

Joined: Jul 18, 2012
Posts: 42
    
    1
Martin Vajsar wrote:First of all, how often is the query run, how long it takes now and how much do you need it to improve?

If there isn't an index on the EMPID column, create one. Since you're selecting just a few rows, the database should use the index and it should be quite fast.

The query itself is so simple that there isn't much to optimize. However, you should explicitly name columns instead of specifying SELECT *. It won't help much, but (depending on the database, your indexes and network) it might be a bit faster, especially if you select more than just a few rows and/or there is an index containing all columns you're selecting.


Hi Thanks for your reply.

Their EMPID is a primary key. The index is not defined properly. First Employee ID is 1000 the n may be next one is 10010. So no proper indexing.
Problem is that for finding the 10th lowest EMPID, ORDER BY do the ordering of whole table first then only give the 10th row. If the number of rows are 100 Million then it is taking abt 1 mins.
The process takes lot of time. So I want someone give me some idea who to make the query work fast.

Thanks in advance.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1671
    
  14

Firstly, if there is no index on your EMPID and you are sorting 100 million rows by EMPID, then you need to create an index on EMPID. Basically, if you are going to do a lot of sorting on a column (or columns) then you need to index the column(s). However, if the EMPID has been declared with a primary key constraint, then it should already be indexed - the index is how the database enforces the PK constraint.

After you've done this, there are different ways to achieve your goal of finding the Nth record in a sorted list of records. You could use a Top-N query and just discard the first N-1 rows. Alternatively, if you are on Oracle, you could use an analytic function like RANK or DENSE_RANK to calculate a ranking for each record (based on your sorting criteria), then wrap this query in a simple outer SELECT that just fetches the record where the ranking = N.

But as Martin says, you should avoid doing SELECT *. One option might be to find your chosen EMPID first, without fetching any of the other values for the Employee record, then wrap this SELECT EMPID... in another SELECT that fetches the rest of the Employee data for the single EMPID you are interested in. This should allow your "find the Nth record" query to return the required EMPID using just the index, then the final SELECT to get the rest of the Employee data for the chosen EMPID will be very quick.


No more Blub for me, thank you, Vicar.
James Boswell
Bartender

Joined: Nov 09, 2011
Posts: 1012
    
    5

Their EMPID is a primary key. The index is not defined properly. First Employee ID is 1000 the n may be next one is 10010. So no proper indexing

I think you may be confusing indexes with sequences here. This post may help:
http://www.coderanch.com/t/453208/Oracle-OAS/Difference-betwwn-Indexes-Sequences
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    
  60

What database are you using, Vishal? (It certainly isn't Oracle, Oracle doesn't have the LIMIT clause).

It looks like there's a bit of confusion about database indexes, as James already noted. A database index is similar to an index in a book - the index keeps all indexed keys in a sorted structure (corresponding to the keywords in a book index sorted alphabetically), together with an "address" of the row containing the key (a page number in the book index). So to find the 10th lowest key, the database has to find 10th entry in the index (it does so by reading the first ten entries) and locate corresponding row. This should under all circumstances be much quicker than to read (and sort) hundred million rows.

If the EMPID column is a primary key, in most databases it means that it is indexed. I'd also expect the LIMIT clause to use an index. Perhaps someone who knows the database you're using will be able to help.
Vishal Baid
Ranch Hand

Joined: Jul 18, 2012
Posts: 42
    
    1
I am using MYSQL.

EMPID is not auto-incremant. In ofbiz framework, the primary key Empid is random number.
Not in an order that is why i am facing this issue.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1671
    
  14

Vishal Baid wrote:I am using MYSQL. EMPID is not auto-incremant. In ofbiz framework, the primary key Empid is random number. Not in an order that is why i am facing this issue.

It doesn't matter how the EMPID is generated, what's important is that it is correctly defined as a primary key in MySQL. If this is the case then EMPID should already have a unique index, which means you can use this for your query. Check the MySQL online docs:
MySQL manual wrote:The primary key for a table represents the column or set of columns that you use in your most vital queries. It has an associated index, for fast query performance.

If possible, I would suggest you use a Top-N query with LIMIT and only fetch the EMPIDs, because this can be achieved using just the index, which will be fast. I don't use MySQL, but I think you could get just the record you want by using the offset (N-1) in the LIMIT clause e.g. if you want just the 10th record try using:You'll need to play around with this to make sure you get the record you want. Then wrap this in another SELECT that fetches the Employee details for the single EMPID you are interested in.
 
wood burning stoves
 
subject: SQL Query Optimization for ORDER BY and LIMIT