File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes JDBC and the fly likes Retrieve huge result set from DB in chunks Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Retrieve huge result set from DB in chunks" Watch "Retrieve huge result set from DB in chunks" New topic
Author

Retrieve huge result set from DB in chunks

Santosh Ramachandrula
Ranch Hand

Joined: Apr 04, 2004
Posts: 252
Folks,
We have few SQL queries that run/retrieve for 5-8 mins and retrive 1000000 rows+. We have tried to optimize on the DB side as much as possible. Finally I think that we need to get the data in chunks, with the same query get the first 2000 rows, then with a click of next button go to the DB and get the next 2000 rows and so on. Is there a way to do it this way? We are using JDK 1.4.2 and Oracle 8.1

Setting the fetch size would still execture the query completely and drop the rows later on which is not efficient in my case.

Any suggestions/ideas would be of great help.

Thanks,


Thanks,
Santosh
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18655
    
    8

Make sure you use a forward-only result set. And set the fetch size to a reasonable number. Then read the result set and process each record. You don't need any button-clicking at all. I don't use Oracle myself but I expect that simple procedure should stream the rows through your program without overflowing its memory.

You say "not efficient in my case" but I have no idea what that means. You haven't said what problems you are encountering, so I can only guess that you are running out of memory.
Santosh Ramachandrula
Ranch Hand

Joined: Apr 04, 2004
Posts: 252
Paul,
Thanks for the quick reply.

1. "not efficient in my case" --> the queries that I am using they run for really long time and by setting the fetch size to a number I am dropping the rows after executing the query completely (is it not?).

2. Problems that I am encountering are

Users are doing an open ended search and it takes really long time before they see the response page with results (10-15 mins). What I want to do is to execute the query to get the first 2000 rows, get back to the user with results and on demand get the next 2000 rows for that query from DB and so on.

Basically I am trying to improve the response time to user instead of waiting 10-15 mins.

Please let me know if anything is unclear.
Roger Chung-Wee
Ranch Hand

Joined: Sep 29, 2002
Posts: 1683
Are you able to prevent these open ended searches? Most apps don't allow them, prefering to do things like client-side filtering in order to restrict the amount of data requested.


SCJP 1.4, SCWCD 1.3, SCBCD 1.3
stu derby
Ranch Hand

Joined: Dec 15, 2005
Posts: 333
There are really 2 practical approaches I know of to working with very very large result sets and getting through all of them, although they're not appropriate or practical for all scenarios.

The approach I generally prefer is to make use of what we used to call a producer/consumer paradigm (I don't know what the analogous "Design Pattern" is called these days, maybe "filter") to "stream" the data through your process. In this paradigm, you initialize your top-level producer object and pass it to your ultimate consumer, the thing that's spitting out your result. The top-level producer may also be a consumer from lower-level producers; you can chain them up as a sequence of "filters". Initializing the top-level producer object may require you to pass it a lower-level initialized producer, or it may already know everything it needs to in order to initialize any lower-level producers it references; that's a design decision. Either way, you'll ultimately come down to initializing your JDBC stuff: preparing your statement and possibly executing (personally, I prefer delaying execution until the very very last possible moment, at the first ).

Then, once everything is set up, your top consumer just starts iterating through the objects produced by the top producer (although if you have a long chain, you don't want to have both a hasNext() and next() taking you to the bottom twice. Instead, either use an exception on next() or have next() produce a testable object that will signal the end).

The other approach I know, is what you've suggested, the "chunking" approach. In general this doesn't work that well with a relational database for fundamental reasons; RDBMSs are really designed for working with unordered sets in a stateless way and you really need it to be ordered and stateful. With Oracle, you can get the chunks like this:

However, this has 2 problems. The first is that this will sort the full set for each chunk. It has to, because the database is stateless and the data is unordered, but for large data sets, sorting can be a huge amount of work, repeated over and over. The second is that, internally, Oracle is iterating over the first rows of the result to get to the chunk you want; as you go deeper and deeper into the ordered set, it takes longer and longer. (However, it's usually much much faster to discard the rows in the DB than to transfer them to Java and discard them there.)
stu derby
Ranch Hand

Joined: Dec 15, 2005
Posts: 333
Originally posted by Santosh Ram:


Users are doing an open ended search and it takes really long time before they see the response page with results (10-15 mins). What I want to do is to execute the query to get the first 2000 rows, get back to the user with results and on demand get the next 2000 rows for that query from DB and so on.


Oh... You're trying to do pagination. Your original post made it sound like you where wanting to process an entire large ResultSet. That makes an incredibly huge difference in the approach you take...

First, try to design your application so that users can get to only the filtered data of interest, instead of wading through thousands of rows to find what they want.

Second, if the results still need pagination, see this:
http://asktom.oracle.com/pls/ask/f?p=4950:8:7822331362748496905::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:76812348057
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
FWIW: Here are some notes I kept from prior threads on this topic. I just added IV to summarize the "Ask Tom" link above.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Santosh Ramachandrula
Ranch Hand

Joined: Apr 04, 2004
Posts: 252
Roger
Unfortunately in our situtation we can't do much of filtering on the client side and we are being forced to go with open ended searches and it is effecting the response time.

Stu and Stan thanks for your response, am going through the contents of your posting.

Meantime any other ideas/suggestions would be great as "Food for thought"
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18655
    
    8

I didn't realize you were trying to do pagination either. I think it was the claim that you were returning one million rows that led me away from this idea.

So, if you really really need to do pagination of a million records, then the suggestions already provided should help. But I would suggest that if you have a system that forces users to scan through one million records, and there is no chance of filtering those records before the users have to look at them, then you have a design problem.

Are your users really going to look at page 1 of 500, then click "Next" to look at page 2 of 500, then click "Next"... 500 times? If they have to do that then I would say your system is not user-friendly, to put it nicely. But if you still insist there is no way to filter the data and it absolutely requires a human being to look at one million rows of data, then perhaps some other solution might be more practical.

Like for example producing a PDF of the data and allowing the user to download it. That simplifies your job (no ugly trade-offs for pagination) and (more importantly) it simplifies the user's job. The user can then load the PDF and read back and forward as required. They can scan (Control-F?) for the data they are looking for without having to click, click, click to page through the data. Or something like that.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Retrieve huge result set from DB in chunks