• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Query Optimization

 
Ranch Hand
Posts: 143
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,

I have two tables Employee and Department with the following columns:

Employee => EmpId, EmpName, EmpSalary, DeptId
Department => DeptId, DeptName

I want to find out the deptid and average salary of the employees in the department. The employee table has 1 million rows or more. So, need best approach to get the desired result.

Below is the query given:

select deptId, avg(EmpSalary) from Employee group by DeptId;

Can we update this query or use some other approach to handle 1 million rows.

Thanks,
Vaibhav Garg
 
author & internet detective
Posts: 41860
908
Eclipse IDE VI Editor Java
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Vaibhav,
The first question is whether the query is really too slow. It might not be. While a million rows sounds like a lot, computers are really fast.

Unless you are oversimplifying the problem/table, there isn't going to be anything you can do to the query to make it faster. It is already a very simple query and there aren't a lot of extra fields in the table.

Which brings us back to whether the query is really too slow, how often you run it and how fast you need it to be. Options include caching the results in Java, running it for only one dept id (if you aren't using them all) or creating an aggregation table that stores the average that you update every time any employee has a salary change. Which increases the cost of those updates.
 
Greenhorn
Posts: 11
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Dear Vaibhav,

Is it really slow? If yes then please check whether index created on column from where clause. If its not then create it. It will reduce processing time more than 50%. Moreover this is simple query and if you still want to optimize it then check whether you can use temp where you can get salary and department id in temp and then use avg function on it.

Hope it will work ...
 
Rancher
Posts: 1044
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

pramod mhatre wrote:whether index created on column from where clause.



The cited very simple (straightforward) select statement does not include a "where" clause.

It is however an interesting question whether the presence or absence of an index on the column deptId has an effect on the execution speed of the query.
 
Sheriff
Posts: 3837
66
Netbeans IDE Oracle Firefox Browser
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Ivan Jozsef Balazs wrote:It is however an interesting question whether the presence or absence of an index on the column deptId has an effect on the execution speed of the query.


It is, of course, database dependent, but I'd say it doesn't. Generally, the database has to visit all rows to answer the query, and therefore it won't use an index.

What might help in some databases (eg. Oracle) would be a composite index on the deptId and EmpSalary columns. If the table contained significantly more columns than these two, the index might take less space than the whole table, and since all the data needed to answer the query are contained in the index, the DB might read the whole index instead of the table and save some IO operations, thus possibly speeding things up.

Also, this statement is a bit problematic:

pramod mhatre wrote:If yes then please check whether index created on column from where clause. If its not then create it. It will reduce processing time more than 50%.


(Emphasis added)

Even in simple queries with just a WHERE clause, how much an index will help depends on a lot of things, such as selectivity and the total number of rows in the table. I believe that in most databases it is possible to create a scenario in which adding an index makes even such a simple query perform worse.

Indexes are not magical devices that will instantly speed everything up by X percent. Without understanding at least the very basics of how indexes and databases work (and knowing the structure and quantity of the data), it is not possible to say with any certainty whether an index will help or not. I know this very well - at the beginning of my career, I made exactly this mistake, throwing indexes randomly around and being surprised they didn't help.
 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Vaibhav G Garg wrote:The employee table has 1 million rows or more...


Really? That would seem to narrow it down to only two possibilities: Walmart, or the National Grid Corporation of China. Which one do you work for?

Winston
 
Please do not shoot the fish in this barrel. But you can shoot at this tiny ad:
a bit of art, as a gift, that will fit in a stocking
https://gardener-gift.com
reply
    Bookmark Topic Watch Topic
  • New Topic