aspose file tools*
The moose likes JDBC and the fly likes need 2 rows rather than 5 from this query Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "need 2 rows rather than 5 from this query" Watch "need 2 rows rather than 5 from this query" New topic
Author

need 2 rows rather than 5 from this query

Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9046
    
  10
When I run this query,

I get 5 rows -- 2 rows for 7/21/2007 reportingPeriod and 3 rows for 7/28/2007 reportingPeriod for the same machineName.

What I want is 2 rows,
1) one row for 7/21/2007 (the row where lastBackupStatus = Completed)
2) one row for 7/28/2007 (the row where lastBackupStatus = Completed)

I'm thinking I need some sort of subquery, but I'm having trouble coming up with the algorithm that will get me what I want.

Can you help?
[ July 25, 2007: Message edited by: Marilyn de Queiroz ]

JavaBeginnersFaq
"Yesterday is history, tomorrow is a mystery, and today is a gift; that's why they call it the present." Eleanor Roosevelt
Abhishek Asthana
Ranch Hand

Joined: Sep 08, 2004
Posts: 146
On what criteria you want those two rows?
Status being 'COMPLETED'? If yes then add this condition in your query's where clause.
Otherwise, please mention the requirement in more detail.
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9046
    
  10
In this case each of the reportingPeriods for this machineName has at least one case where status is "Completed", but that will not always be the case. I don't want to exclude those which have only other statuses, but I only want one instance of machine for each reportingPeriod.

If more than one is status Completed, I want the most recent of those.

A status of Completed within a reportingPeriod will override any possible failures later in the same reportingPeriod.

If none of the rows for that machineName for that reportingPeriod has a status of Completed, I still want (only) one instance for that reportingPeriod, maybe the most recent backup time.
[ July 26, 2007: Message edited by: Marilyn de Queiroz ]
subodh gupta
Ranch Hand

Joined: Jul 23, 2007
Posts: 203
try distinct function on date.


http://subodh-gupta.blogspot.com
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1746
    
    2
but I only want one instance of machine for each reportingPeriod.

Aha! That tells me essentially that you want to write a query that operates on some "groups" that you create -- groups consisting of a machineName/reportingPeriod combination. Generally, there are (at least) two ways of doing this: GROUP BY or correlated subqueries. I'll take the correlated subquery approach. With correlated subqueries you essentially create your groups by carefully choosing which fields in the subquery you tie back to the main query. In this case it will be machineName and reportingPeriod.

If more than one is status Completed, I want the most recent of those.
A status of Completed within a reportingPeriod will override any possible failures later in the same reportingPeriod.


"Most recent" is generally easiest to do with some max date logic and a correlated subquery. "Most recent completed" tells me that the qualification "lastBackupStatus = Completed" must be inside the subquery. (In this case you can have it both inside and outside, but I don't think it changes the results.)

First shot:

SELECT v.*
FROM veritas AS v
WHERE v.lastBackupTime = ( select max( v2.lastBackupTime)
from veritas v2
where v2.machineName = v.machineName
and v2.reportingPeriod = v.reportingPeriod
and v2.lastBackupStatus = 'Completed')

But wait! You said: In this case each of the reportingPeriods for this machineName has at least one case where status is "Completed", but that will not always be the case. <...> I don't want to exclude those which have only other statuses <...>If none of the rows for that machineName for that reportingPeriod has a status of Completed, I still want (only) one instance for that reportingPeriod, maybe the most recent backup time.

Since you didn't provide data for this scenario, I made up my own. There are some problems with that, though. I don't know when in your data lastBackupTime is populated or not. Your "Canceled by admin" that has a lastBackupTime suggest that the field gets populated if a backup is at least attempted. But "Never" backed up suggests it can also happily be null. So I see three flavors where machineName/reportingPeriod combos do not have at least one status of completed:

(1) (my 7/15 data) -- lastBackupTime is nonnull for all rows
(2) (my 7/12 data) -- lastBackupTime is null for all rows
(3) (my 7/10 data) -- lastBackupTime is null for some, nonnull for others



The first case (7/15 data) is easy. We can use your requirement "maybe the most recent backup time".

Second shot:

SELECT v.*
FROM veritas AS v
WHERE v.lastBackupTime = ( select max( v2.lastBackupTime)
from veritas v2
where v2.machineName = v.machineName
and v2.reportingPeriod = v.reportingPeriod
and v2.lastBackupStatus = 'Completed')
UNION SELECT v3.*
FROM veritas AS v3
WHERE v3.lastBackupTime = ( select max (v4.lastBackupTime)
from veritas v4
where v4.machineName = v3.machineName
and v4.reportingPeriod = v3.reportingPeriod)
and not exists ( select 'x' from veritas v5 where v5.machineName = v3.machineName
and v5.reportingPeriod = v3.reportingPeriod
and v5.lastBackupStatus = 'Completed')
ORDER BY v.machineName, v.reportingPeriod;

So I unioned in a second select that handles the (7/15) scenario. It selects the most recent row for those machineName/reporting combinations that do not happen to have any completed backups.

(If it bothers you that the first select and second select *may* not be mutually exclusive, you can add a condition v.lastBackupStatus = 'Completed' to the outer select in the first half of the union.)

But this query doesn't select in any of the 7/12 data. (How could we ever know the most recent if all dates are null?) And for the 7/10 data it selects in the 7/10 lastBackupTime because the max() function considers any date greater than null.

I don't even know if my 7/10 and 7/12 mockup data is realistic, so I won't try to handle it.

This, at least, gives you a starting point, I think.

Something about the data you provide though leaves me kind of squirmy. What does it mean to have a populated "lastCompletedBackupTime" and a lastBackupStatus of "Client computer not responding"? Why do you even need both a "lastBackupTime" and "lastCompletedBackupTime" if the status tells you whether it's completed or not? How many "Never backed up" rows get stacked up? Etc.
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9046
    
  10
try distinct function on date.
That won't work in this case because I can't distinct on only one item without using distinct on the group. Since all rows are different, I'd still get 5 rows.
Marc Peabody
pie sneak
Sheriff

Joined: Feb 05, 2003
Posts: 4727

My solution looks like Michael's "second shot".


A good workman is known by his tools.
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9046
    
  10
[MM] Aha! That tells me essentially that you want to write a query that operates on some "groups" that you create -- groups consisting of a machineName/reportingPeriod combination. Generally, there are (at least) two ways of doing this: GROUP BY or correlated subqueries. I'll take the correlated subquery approach. With correlated subqueries you essentially create your groups by carefully choosing which fields in the subquery you tie back to the main query. In this case it will be machineName and reportingPeriod.

This is the line of reasoning I was trying to follow.

Since you didn't provide data for this scenario,



(1) (my 7/15 data) -- lastBackupTime is nonnull for all rows

This is probably true for most machineNames.

(2) (my 7/12 data) -- lastBackupTime is null for all rows

Hmmm. I guess that might be possible. Maybe they just got the machine the day the reporting period ends, so they only have one row and lastBackupTime is null.

More research seems to indicate that every time lastBackupTime is null, the status is "Never backed up"

(3) (my 7/10 data) -- lastBackupTime is null for some, nonnull for others

This is quite probable.

But this query doesn't select in any of the 7/12 data. (How could we ever know the most recent if all dates are null?)

True, but we still need to list those machines (once each) in the resultset.

And for the 7/10 data it selects in the 7/10 lastBackupTime because the max() function considers any date greater than null.

This, at least, gives you a starting point, I think.


Yep.

What does it mean to have a populated "lastCompletedBackupTime" and a lastBackupStatus of "Client computer not responding"?

I think it just copies the "lastCompletedBackupTime" from the previous time the backup completed successfully even tho this one didn't.

Why do you even need both a "lastBackupTime" and "lastCompletedBackupTime" if the status tells you whether it's completed or not?

Good question! I probably really don't care about the "lastBackupTime" at all. What I really care about from this resultset is the lastCompletedBackupTime, the status, and the reportPeriod, but if there is no lastCompletedBackupTime, I still need to list the machine (once). It doesn't really matter which instance of the machine within the reportPeriod I choose when the status is not Complete.

How many "Never backed up" rows get stacked up? Etc.

I'm not 100% on this, but there appear to be only one per reportingPeriod per machineName.
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17250
    
    6

Michael definitely is up to the issue and solutions.

I do propose that there seems to be "rules" on which one record you want for the single record on that date. Sometimes, and just sometimes, it is actually easier to create a Stored Procedure to do the work, so you can split it up into multiple queries and piece it together through code that run the rules.

Mark


Perfect World Programming, LLC - Two Laptop Bag - Tube Organizer
How to Ask Questions the Smart Way FAQ
Michael Matola
whippersnapper
Ranch Hand

Joined: Mar 25, 2001
Posts: 1746
    
    2
Mark Spritzler:

Sometimes, and just sometimes, it is actually easier to create a Stored Procedure to do the work, so you can split it up into multiple queries and piece it together through code that run the rules.


Definitely. If not a stored procedure, then just select more data than you actually need and do the further processing or weeding out in whatever language you're working in.

Marilyn's posted some queries here that I believe have had the requirement "do it all in one SQL statement" (hope I'm not misrepresenting Marilyn) and that have had embedded in them one special case after another. (If the data's like this, present it this way, if it's like that, do this instead, and oh, if it's like this, transform it it to be like this, etc.) In my response, I was just running with that approach. But I'm also in agreement with Mark; I'd be the person asking "do I *really* have to do this in one single query?"
[ July 27, 2007: Message edited by: Michael Matola ]
Marilyn de Queiroz
Sheriff

Joined: Jul 22, 2000
Posts: 9046
    
  10
Thanks, everyone, for your input. You are correct, Mike. I don't have access to stored procedures in the present circumstances, so I need to handle stuff in a complex sql statement. I think I have this resolved. Thank you again to everyone.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: need 2 rows rather than 5 from this query