Win a copy of Mesos in Action this week in the Cloud/Virtualizaton forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Query Optimization

 
Vaibhav G Garg
Ranch Hand
Posts: 143
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

I have two tables Employee and Department with the following columns:

Employee => EmpId, EmpName, EmpSalary, DeptId
Department => DeptId, DeptName

I want to find out the deptid and average salary of the employees in the department. The employee table has 1 million rows or more. So, need best approach to get the desired result.

Below is the query given:

select deptId, avg(EmpSalary) from Employee group by DeptId;

Can we update this query or use some other approach to handle 1 million rows.

Thanks,
Vaibhav Garg
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34408
346
Eclipse IDE Java VI Editor
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Vaibhav,
The first question is whether the query is really too slow. It might not be. While a million rows sounds like a lot, computers are really fast.

Unless you are oversimplifying the problem/table, there isn't going to be anything you can do to the query to make it faster. It is already a very simple query and there aren't a lot of extra fields in the table.

Which brings us back to whether the query is really too slow, how often you run it and how fast you need it to be. Options include caching the results in Java, running it for only one dept id (if you aren't using them all) or creating an aggregation table that stores the average that you update every time any employee has a salary change. Which increases the cost of those updates.
 
pramod mhatre
Greenhorn
Posts: 11
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dear Vaibhav,

Is it really slow? If yes then please check whether index created on column from where clause. If its not then create it. It will reduce processing time more than 50%. Moreover this is simple query and if you still want to optimize it then check whether you can use temp where you can get salary and department id in temp and then use avg function on it.

Hope it will work ...
 
Ivan Jozsef Balazs
Rancher
Posts: 979
5
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
pramod mhatre wrote:whether index created on column from where clause.


The cited very simple (straightforward) select statement does not include a "where" clause.

It is however an interesting question whether the presence or absence of an index on the column deptId has an effect on the execution speed of the query.
 
Martin Vajsar
Sheriff
Posts: 3752
62
Chrome Netbeans IDE Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ivan Jozsef Balazs wrote:It is however an interesting question whether the presence or absence of an index on the column deptId has an effect on the execution speed of the query.

It is, of course, database dependent, but I'd say it doesn't. Generally, the database has to visit all rows to answer the query, and therefore it won't use an index.

What might help in some databases (eg. Oracle) would be a composite index on the deptId and EmpSalary columns. If the table contained significantly more columns than these two, the index might take less space than the whole table, and since all the data needed to answer the query are contained in the index, the DB might read the whole index instead of the table and save some IO operations, thus possibly speeding things up.

Also, this statement is a bit problematic:
pramod mhatre wrote:If yes then please check whether index created on column from where clause. If its not then create it. It will reduce processing time more than 50%.

(Emphasis added)

Even in simple queries with just a WHERE clause, how much an index will help depends on a lot of things, such as selectivity and the total number of rows in the table. I believe that in most databases it is possible to create a scenario in which adding an index makes even such a simple query perform worse.

Indexes are not magical devices that will instantly speed everything up by X percent. Without understanding at least the very basics of how indexes and databases work (and knowing the structure and quantity of the data), it is not possible to say with any certainty whether an index will help or not. I know this very well - at the beginning of my career, I made exactly this mistake, throwing indexes randomly around and being surprised they didn't help.
 
Winston Gutkowski
Bartender
Pie
Posts: 10417
63
Eclipse IDE Hibernate Ubuntu
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Vaibhav G Garg wrote:The employee table has 1 million rows or more...

Really? That would seem to narrow it down to only two possibilities: Walmart, or the National Grid Corporation of China. Which one do you work for?

Winston
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic