• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

retrieve data from huge result set

 
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have a result set with 207,000 rows. Each ticket may be associated with one or more rows. I need to count the number of rows for each ticket and find the row for each ticket which has the most recent timestamp.

I'm thinking that I should let the database do the sorting (use "order by") because it seems like that result set is gonna take a lot of memory, but I'm wondering if I might be able to do something in addition to speed up the process.
[ June 03, 2006: Message edited by: Marilyn de Queiroz ]
 
Ranch Hand
Posts: 1923
Scala Postgres Database Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I guess you know database-indicees?
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I know that database indices exist and that a runstat/reorg will speed up the sql call, but how will that help me find the last row for each ticket.

Oh, I just had a thought, what if I compare each row's ticket number to the next row's ticket number. If they're different, the current row must be the last row for that ticket.

Or is there a better way?
 
author & internet detective
Posts: 41878
909
Eclipse IDE VI Editor Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Marilyn,
If you use a desc (descending) index, the query for finding the last call will go much faster.

If your database supports a mechansim to only get the first (really the last since the sort is descending) row, the time will go down a lot for reading the data and transporting it across the network.

For one ticket, it would be something like:
select a.* from table a where exists (

For one ticket # it would be:
select *
from table
for numRows = 1 <-- syntax here varies greatly by database

For multiple tickets, you would need to put this in a subquery. I'm not sure how to do that part. Feel free to move this to JDBC to get more help
 
Jeanne Boyarsky
author & internet detective
Posts: 41878
909
Eclipse IDE VI Editor Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Marilyn de Queiroz:
Oh, I just had a thought, what if I compare each row's ticket number to the next row's ticket number. If they're different, the current row must be the last row for that ticket.


I'm not sure how that helps you. You still need the sort. And even if you realize you can stop going though records, you need to read the whole result set for the next ticket. I may be missing the point on this thought though.

I had another thought too. You may well get faster response time if you separate the two problems into two queries. You can efficiently get the # records of each ticket in one query. Trying to get the count and max time is going to be one monster query which will be hard for the database to optimize.
 
Stefan Wagner
Ranch Hand
Posts: 1923
Scala Postgres Database Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
not doing much sql the last years, but in former times, I would have tried:

Isn't that correct?
Or:

after already modifying and correctin the sql - doesn't this work too:


[ June 03, 2006: Message edited by: Stefan Wagner ]
[ June 03, 2006: Message edited by: Stefan Wagner ]
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Originally posted by Jeanne Boyarsky:
I'm not sure how that helps you. You still need the sort. And even if you realize you can stop going though records, you need to read the whole result set for the next ticket. I may be missing the point on this thought though.


Yes I was thinking of doing that after the sort. However, the descending sort and getting the first line sounds even better.

I had another thought too. You may well get faster response time if you separate the two problems into two queries. You can efficiently get the # records of each ticket in one query. Trying to get the count and max time is going to be one monster query which will be hard for the database to optimize.

I think using 2 queries is not an option for me at this time.
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
For the other part of the query (besides the count part), let me try to clarify a little bit.

I need to do something like:
for each PROBLEM_ID
select
the first 500 chars from most recent ENTRY of the current PROBLEM_ID where ENTRY_TYPE='ADD NOTE'
(each problem_id can have 0 to an infinity 'ADD NOTE' entries)

I'm using a DB2 database. DB2 OS/390 7.1.1
[ June 03, 2006: Message edited by: Marilyn de Queiroz ]
 
Jeanne Boyarsky
author & internet detective
Posts: 41878
909
Eclipse IDE VI Editor Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Stefan,
Your query is perfect for getting the maximum timestamp. Since Marilyn needs other columns in the row with the maximum timestamp, the query gets a lot more complicated.
 
Jeanne Boyarsky
author & internet detective
Posts: 41878
909
Eclipse IDE VI Editor Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Marilyn,
db2 is good about letting you nest queries which helps in general (not necessarily here.)

The syntax for getting the first row is "for fetch first 1 rows only". However, I believe this can only be used at the end of a query, not in a subquery. I'm not positive though and don't have access to db2 at the moment to try it out.

Another alternative is to use a stored procedure if this is allowed in your architecture.
 
Ranch Hand
Posts: 67
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
" because it seems like that result set is gonna take a lot of memory, but I'm wondering if I might be able to do something in addition to speed up the process."

Why do you think so?

ResultSet doesn't bring all datas at once.It stores with fetch size amount of data.aAd with every next() it takes one from fetched datas.If it comes to the end of the fetch,then it goes to db and fetches next.

Good news is that you dont have to worry about the memory usage

Bad news is ResultSet doesnt know how many records the sql brought.Because of its nature i told you above.
Therefore you can not know how many rows brought.
[ June 05, 2006: Message edited by: sinasi susam ]
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Any way I can know what the fetch size is? When I run it in a tool, I can set it to only display the first 100 rows or whatever, but when I run it from my app, how does that work?

Jeanne, thank you. I'm going to try that today.
 
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
but how will that help me find the last row for each ticket.

Marilyn,

I've fielded questions a couple times here about this kind of date logic in SQL. Take a look at

this

and this.

(I did a post once too giving several reasons why the correlated subquery approach is superior to a different approach, but I can't see to find it.)

Anyhow, if you can't get the date logic to work after reading those old posts, post your table structure (including keys) and some sample data and results. That'll make it easier to write the query.

[Edited to fix broken links. -MM]
[ June 05, 2006: Message edited by: Michael Matola ]
 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
And here's the post where I defend the correlated subquery approach.
 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Trying to get the count and max time

(and retrieve the fields in the row with the max timestamp)

is going to be one monster query which will be hard for the database to optimize.

It's nicer to have this split into two queries (one for the count another for the max row), but it can be done in one.



As for performance, I'd say test it and see.
[ June 05, 2006: Message edited by: Michael Matola ]
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
In table named problem_history, I have
problem_id, entry_date, entry_time, entry_type, entry
for input.

I need a table consisting of only three columns:
problem_id,
entry (if entry_type is 'ADD NOTE', get the most recent entry),
count of rows where entry_type is 'Call Attached'

I can't have more than one row per problem_id and I can't have more than these three columns in my final table.

"fetch first 1 rows only" doesn't seem to work because it doesn't do it on a "per problem_id" basis. I was already working on the subqueries, but I'm not getting very far very fast.
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Here is one problem_id



Each row is another problem_id.
[ June 05, 2006: Message edited by: Marilyn de Queiroz ]
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
This query works (sort of)


I get


It looks like I got the correct number of "Call Attached" calls, but I'm still getting too many "ADD NOTE" entries, one row for each entry rather than just the most recent.
[ June 14, 2006: Message edited by: Marilyn de Queiroz ]
 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
OK, you've given a lot more to go on, but still not everything.

In table named problem_history, I have
problem_id, entry_date, entry_time, entry_type, entry
for input.


OK. (It's real important to know that you've have entry_date and entry_time as two separate fields and not a single date/time field. That affects the date logic.)

I need a table consisting of only three columns:
problem_id,
entry (if entry_type is 'ADD NOTE', get the most recent entry),
count of rows where entry_type is 'Call Attached'


OK, but a table? I thought you were just writing a query here? Are you working with temporary tables? Shuttling some data around?

You say "if entry_type is 'ADD NOTE', get the most recent entry". What to do if entry_type is not 'ADD NOTE'? This is really important to know.

I can't have more than one row per problem_id

OK.

Here is one problem_id

Very good.

These 14 rows need to be converted to one row with three columns which would look something like this:

Converted. So if you're writing some conversion process is it really necessary to do all this with a single SQL statement?

I'm not sure what you mean by "keys".

Which field(s) in the table have been set up to uniquely identify a row. It's my assumption based on the table structure and data that you list that the primary key is the combination of three fields: problem_id, entry_date, and entry_time.

...

You've said

entry (if entry_type is 'ADD NOTE', get the most recent entry)

but elsewhere you've said

the first 500 chars from most recent ENTRY of the current PROBLEM_ID where ENTRY_TYPE='ADD NOTE'

Do you mean for each problem_id, find all rows where entry_type = 'ADD NOTE', then find the most recent ENTRY from among that subset?

Or do you mean for each problem_id, find the most recent ENTRY, then do one thing if the ENTRY_TYPE = 'ADD NOTE' and do something else if the ENTRY_TYPE doesn't equal 'ADD NOTE'?

In other words, what to do if this 15th row of data is the absolute most recent:



...

Anyhow, here's something a lot closer given the new info. But I didn't know what to do about the entry_type = 'ADD NOTE', so for now I didn't do anything.




(If the date/time was a single field, we would only have to take the max on that single field, instead of what I've done here: first taking the max date at (A), then taking the max time at (B) for the already-determined max date at (C).)
 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
This query works (sort of)

...from pfxtsd.problems p

:shocked:

Where'd the table "problems" come from? What's the story on that?

but I'm still getting too many "ADD NOTE" entries, one row for each entry rather than just the most recent.

You haven't qualified on dates anywhere.
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Michael Matola:
I need a table consisting of only three columns:
problem_id,
entry (if entry_type is 'ADD NOTE', get the most recent entry),
count of rows where entry_type is 'Call Attached'


OK, but a table? I thought you were just writing a query here? Are you working with temporary tables? Shuttling some data around?

Ok, not a table, exactly. When I run the query and get the results in a ResultSet, I (actually the code I'm supposed to use) just turn around and write to a csv file (with no other manipulation), but that file looks just like a table with headers, rows and columns. So I'm thinking in terms of writing from one table to another. Not too much of a stretch. I could override that method if I need to and make it have two queries, but I'd rather not if I don't have to.

You say "if entry_type is 'ADD NOTE', get the most recent entry". What to do if entry_type is not 'ADD NOTE'? This is really important to know.

If entry_type is not 'ADD NOTE' and entry_type is not "Call Attached", we can ignore that record. If it is "Call Attached", we count it. If it is 'ADD NOTE', we care only if it is the most recent. If there is no 'ADD NOTE' entry_type for that problem_id, the result is null. Clear as mud?

These 14 rows need to be converted to one row with three columns which would look something like this:

Converted. So if you're writing some conversion process is it really necessary to do all this with a single SQL statement?

Sorry, bad choice of words. The code in the class I'm using only makes one sql query and writes the file based on that query. I'll think about how easy it would be to change that.

I'm not sure what you mean by "keys".

Which field(s) in the table have been set up to uniquely identify a row. It's my assumption based on the table structure and data that you list that the primary key is the combination of three fields: problem_id, entry_date, and entry_time.

I believe that's true although I have been unable to get my describe statement to work on this particular table (appears to be a permissions problem). There is also a column titled probhist_id which is assigned which could also be the primary key. It's a number which means nothing to me.

...

You've said

entry (if entry_type is 'ADD NOTE', get the most recent entry)

but elsewhere you've said

the first 500 chars from most recent ENTRY of the current PROBLEM_ID where ENTRY_TYPE='ADD NOTE'

Do you mean for each problem_id, find all rows where entry_type = 'ADD NOTE', then find the most recent ENTRY from among that subset?


yes.

[B][I]In other words, what to do if this 15th row of data is the absolute most recent:


[/I][/B]

Just add it to the total number of Call_Attached_Totals for this problem_id.

...

[B][I]Anyhow, here's something a lot closer given the new info. But I didn't know what to do about the entry_type = 'ADD NOTE', so for now I didn't do anything.




(If the date/time was a single field, we would only have to take the max on that single field, instead of what I've done here: first taking the max date at (A), then taking the max time at (B) for the already-determined max date at (C).) [/I][/B]

Thank you so much, Mike, I'll try this first thing in the morning.
[ June 06, 2006: Message edited by: Marilyn de Queiroz ]

 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Michael Matola:
This query works (sort of)

...from pfxtsd.problems p



Where'd the table "problems" come from? What's the story on that?

I need the join on the problems table to get the "first_contact_id" and the "close_date" and the "open_date" because this table is shared by clients other than FTT and open_date and close_date are limiting dates for the problem_id's.



LFMDate = LastFullMonth

 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I'm still confused by some of this, so I'll try to be more clear about what I'm asking.

Let's go ahead and assume my proposed 15th row of data *is* part of the data you're operating on.

MM: You say "if entry_type is 'ADD NOTE', get the most recent entry". What to do if entry_type is not 'ADD NOTE'? This is really important to know.

MdQ: If entry_type is not 'ADD NOTE' and entry_type is not "Call Attached", we can ignore that record. If it is "Call Attached", we count it. If it is 'ADD NOTE', we care only if it is the most recent. If there is no 'ADD NOTE' entry_type for that problem_id, the result is null. Clear as mud?

When you say the result is null what exactly do you have in mind? In databases you can have a column/field value (something stored on a table) that's null. Or you can introduce a null as a column/field value as a result of a query, which is fine. [Results (2) below.]

Or do you mean that if there's an 'ADD NOTE' entry_type for that problem_id, but it's not the most recent overall, no row at all for that problem_id should show up in the results. [Results (3) below.]

MM: Do you mean for each problem_id, find all rows where entry_type = 'ADD NOTE', then find the most recent ENTRY from among that subset?

MdQ: yes.

MM: In other words, what to do if this 15th row of data is the absolute most recent:

MdQ: Just add it to the total number of Call_Attached_Totals for this problem_id.

Again, assuming my proposed 15th row is part of the data, should the results in this case be this:

(1) 6 Call Attached rows for problem_id 12295318. Last_Add_Note is string 14 because it's the most recent row across the set of rows where entry_type is 'ADD NOTE'.



Or this:

(2) 6 Call Attached rows for problem_id 12295318. Last_Add_Note is blank/null/empty (I used <> as a placeholder; we can decide the exact value later) because the most recent (overall) row (15) does not happen to have an entry_type = 'ADD NOTE'. We still want problem_id 12295318 to show up in the results.



Or this:

(3) Problem_id doesn't show up in the results because it's most recent (overall) row (15) does not happen to have an entry_type = 'ADD NOTE'.



Or something else I haven't thought of.
...

Similarly, if there's data like this,



Do you want that problem_id in the results? If so, like this?



Or not in the results at all?

Or is there some guarantee in the data that for every problem_id there is at least one row that has an entry_type = 'ADD NOTE'?
 
Ranch Hand
Posts: 72
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
you can limit the size of featching the records from the table,
below is the link
http://java.sun.com/j2se/1.4.2/docs/guide/jdbc/getstart/resultset.html
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
[B]Again, assuming my proposed 15th row is part of the data, should the results in this case be this:

(1) 6 Call Attached rows for problem_id 12295318. Last_Add_Note is string 14 because it's the most recent row across the set of rows where entry_type is 'ADD NOTE'.

[/B]

I believe this is the one they want.

...

[B]Similarly, if there's data like this,



Do you want that problem_id in the results? If so, like this?

[/B]

That's my understanding

Or is there some guarantee in the data that for every problem_id there is at least one row that has an entry_type = 'ADD NOTE'?

Definitely no guarantee.
[ June 06, 2006: Message edited by: Marilyn de Queiroz ]
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thank you, Badrinath. I had never noticed that ResultSet method before.
 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I think we got the query part worked out in the other thread dealing with this.
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I couldn't get the
select a, b, ( select c ) stuff to work. This is what I finally ended up with.


[ June 14, 2006: Message edited by: Marilyn de Queiroz ]
 
Tongue wrestling. It's not what you think. And here, take this tiny ad. You'll need it.
a bit of art, as a gift, the permaculture playing cards
https://gardener-gift.com
reply
    Bookmark Topic Watch Topic
  • New Topic