aspose file tools*
The moose likes JDBC and the fly likes problem with Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "problem with "group by"" Watch "problem with "group by"" New topic
Author

problem with "group by"

Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9047
    
  10
select ph.problem_id as problem_id, substr(ph.entry, 1, 254) as last_add_note
from pfxtsd.problem_history ph
group by ph.problem_id having ph.entry_type = 'ADD NOTE'
order by problem_id

I need to get the second item in the select, but I only want to group by the problem_id. How can I get around this?


JavaBeginnersFaq
"Yesterday is history, tomorrow is a mystery, and today is a gift; that's why they call it the present." Eleanor Roosevelt
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1749
    
    2
I'm not clear why you're using group by here at all. Won't a simpler

where ph.entry_type = 'ADD NOTE'

give the results you're looking for?

(Group by is used most often when there is an aggregating (grouping) function in your select list. (I tend to use group by without an associated aggregating function only in subqueries.))

I'm not clear on what exactly you mean by

MdQ: I need to get the second item in the select, but I only want to group by the problem_id.

Could you maybe give some sample rows of table data and what you want your query results to look like?
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9047
    
  10
because I want to get the most recent "ADD NOTE" for each ticket, so I want to group by ticket.
Purushoth Thambu
Ranch Hand

Joined: May 24, 2003
Posts: 425
Why just second post alone? What if there is 3 or 'n' ADD NOTES for the same problem id? Don't you have timestamp of each entry?
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9047
    
  10
Originally posted by Marilyn de Queiroz:
select ph.problem_id as problem_id, substr(ph.entry, 1, 254) as last_add_note
from pfxtsd.problem_history ph
group by ph.problem_id having ph.entry_type = 'ADD NOTE'
order by problem_id


Originally posted by Purushothaman Thambu:
Why just second post alone? What if there is 3 or 'n' ADD NOTES for the same problem id? Don't you have timestamp of each entry?

Not the second ADD NOTES, but I want only the problem_id, the most current ADD NOTE, and the number of "Call Attached" entries for each problem_id. So I think I need to group by problem_id. Perhaps it doesn't hurt to also group by "entry" and "count" as well? But it seems to me that I only want to group by problem_id so that I get the latest ADD NOTE for each problem_id and the total count of Call Attached for each problem_id.
Purushoth Thambu
Ranch Hand

Joined: May 24, 2003
Posts: 425
select ph.problem_id as problem_id, substr(ph.entry, 1, 254) as last_add_note
from pfxtsd.problem_history ph
group by ph.problem_id having ph.entry_type = 'ADD NOTE'
order by problem_id


The above sql will never work for one thing. Group By doesn't include ph.entry. My question remains same it's not clear from your post how the notes are distinguished among themselves. Finding count is easier but finding last ADD_NOTE requires record to be distinguished. It will help if you post your table structure with some records if possible.
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1749
    
    2
Making some assumptions based on the other thread:



Not a group by in sight!
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9047
    
  10
Almost. When I run this query:
I get two rows:


maybe because both notes were at the same time, but I just want the ADD NOTE one.
[ June 07, 2006: Message edited by: Marilyn de Queiroz ]
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9047
    
  10
resolved. I just added another and ph.entry_type = 'ADD NOTE' at the end and now it works. Thank you soooo much.
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1749
    
    2
maybe because both notes were at the same time, but I just want the ADD NOTE one.

Yes, exactly. If the two timestamps are identical down to the smallest unit, the original query had no way of chosing between.

I just added another and ph.entry_type = 'ADD NOTE' at the end and now it works.

Always be careful doing that sort of thing. I want to make sure you get the distinction between adding the [some table].entry_type = 'ADD NOTE' into the correlated subqueries or into the main select.

By "at the end" I'm assuming you mean your query now looks like this (adjusting for variations in table and field names):



(1) If you had qualified on entry_type = 'ADD NOTE' *only at (B)* (the main, driving select), your query would have dropped those problem_ids do have an entry_type = 'ADD NOTE' somewhere in the history, but not the most recent row.

(2) If you had qualified on entry_type = 'ADD NOTE' *only at (A)* (within the subqueries), your query would have indeed included all problem_ids that have an entry_type somewhere in the history. (And if all was right in the universe, the timestamps would have been granular enough that you wouldn't have ended up with two "most recent" rows and had to resort to variation (3).

(3) Qualifying on entry_type = 'ADD NOTE' in both places seems to do the trick for you.

Case closed? Almost. Back in the thread next door you did say that you wanted to include in the results those problem_ids that *have never had a row with entry_type = 'ADD NOTE'*. And due to all the qualifications on entry_type = 'ADD NOTE', such problem_ids are being dropped from your query.

I tried several different ways of doing this, and the only thing I could get to work was a union query. Take a deep breath. (Counting logic from other thread has been added back in, for dramatic effect.)



If anything's unclear, just let me know.

Thank you soooo much.

You're welcome. Gotta earn my keep around here, you know.
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9047
    
  10
That wasn't good enough.... Now they want all the tickets in the table, even the ones that don't have an entry_type of 'Call Attached'. included in the results, so I need to ask --

In the very last section:

from problem_history ph
where ph.entry_type != 'ADD NOTE'
and not exists ( select 'dummy'
from problem_history phd
where phd.problem_id = ph.problem_id
and phd.entry_type = 'ADD NOTE' )

what does 'dummy' do?
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1749
    
    2
Sorry for not responding sooner.

That wasn't good enough.... Now they want all the tickets in the table, even the ones that don't have an entry_type of 'Call Attached'. included in the results, so I need to ask --

In the syntax I proposed, in which the count(*) for entry_type = 'Call Attached' is done as inline views in the select lists, the tickets that don't have an entry_type of 'Call Attached" are not being dropped from the results. But you weren't able to get that syntax to work.

So, I'm guessing in the syntax you did get to work, in which the count(*) for entry_type = 'Call Attached' is done as inline views in the from section, those rows are being dropped.

Inconviently, I can't get your syntax to work in my playground database to try some things out.

Here's a stab in the dark: try an outer join. So instead of



try:




In the very last section:

from problem_history ph
where ph.entry_type != 'ADD NOTE'
and not exists ( select 'dummy'
from problem_history phd
where phd.problem_id = ph.problem_id
and phd.entry_type = 'ADD NOTE' )

what does 'dummy' do?


Well, it's just a dummy value, of course!

When you're using NOT EXISTS and the subquery like we are here -- just checking for the nonexistence of a row that meets certain criteria -- you don't really care what field(s) you select from the table in that subquery because they don't get used for anything. The idea of the 'dummy' constant (a common convention is "select 'x'...") is to make it easy on the database engine. It's not very costly for the database engine to process a throwaway constant. If you had a "select problem_id" I'm guessing it wouldn't be very costly either. But if you had "select *" there, it would have to pull in and buffer all the fields on the table, only to throw them away. If it's a table with only a few fields, no big performance hit. If it has many fields, maybe. Perhaps this is premature optimization, but it is a common SQL convention for some subqueries. (Who knows, maybe the major databases now optimize away the "select *" anyway.)
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9047
    
  10
I think I found a key statement in O'Reilly's SQL Pocket Guide. "Recent database releases allow you to embed a subquery in a SELECT list." I think that I'm using a non-recent version (DB2 OS/390 7.1.1) when the current version is 8.x where 'x' is 2 or higher.

MM: "checking for the nonexistence of a row that meets certain criteria"

I think this is what I'm having a hard time wrapping my head around. I keep thinking in terms of "checking for the existance of a row that excludes certain criteria" which is probably slightly different.
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1749
    
    2
MM: "checking for the nonexistence of a row that meets certain criteria"

I think this is what I'm having a hard time wrapping my head around. I keep thinking in terms of "checking for the existance of a row that excludes certain criteria" which is probably slightly different.


Just the other day at work I helped a colleague with a query that might make my approach a little clearer.

He was writing a query for some end users so they could identify some problem cases then fix them. However, the users didn't want to see the already fixed cases. They only wanted the new problems.

For our purposes, tableA.status tells us who the problems are and tableB.status tells us who the fixed ones are. (Fixing the problem consists of inserting the correct status on tableB. The data on tableA does NOT get updated.) (In reality "tableA" was 4 tables with some nasty joins, 3 inners and 1 outer, and the statuses in both cases were really several status flags with NOTs and ANDs and ORs, and several tables had date logic too.)

Anyhow, the solution ended up looking like this:



In other words, give me all the ids who have a status of 'bad bad bad' on tableA and who do NOT ALSO have a status of 'already fixed' on tableB.

In other words, as part of defining what my population of interest IS, I'm also saying what the population IS NOT.

How's this related to your data and query?

Suppose you have the following data:



In the version of the query I posted above on June 7, problem_ids 001, 002, and 003 are getting selected into the results by the first half of the union, and 004 is getting selected by the second half.

What happens if you delete

and not exists ( select 'dummy'
from problem_history phd
where phd.problem_id = ph.problem_id
and phd.entry_type = 'ADD NOTE' )

from the second half of the union? Problem_ids 002 and 003 will get selected twice, once by each half of the union.

Remember, before we created the second half of the union, problem_ids 001, 002, and 003 were being handled correctly by the original query because they all had an entry type of ADD NOTE somewhere in their history (the query was essentially treating all rows for a given problem_id as a group). Then we created the second half of the union to include cases like 004, who never had any ADD NOTEs in their history.

Note that in the second half of the union we couldn't just say select rows where entry_type doesn't equal ADD NOTE (because we'd pull in problem_ids 002 and 003 in again -- they indeed have rows where entry_type doesn't eqal ADD NOTE). We had to say select problem_ids where entry_type doesn't equal ADD NOTE and where that problem_ID doesn't also happen to have rows that *do* equal ADD NOTE.

Read "and not exists ( <subquery> )" as "and doesn't also happen to have."

(There are simple scenarios you can handle with outer joins, but sometimes reversing things with a not exists makes things clearer.)
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: problem with "group by"