It seems that most people are interested in paging for one of two reasons:
1) They want to show only the first N results to the user, and not bother retrieving rows N+1 through 2N until the user asks for them.
2) They are worried about running out of memory if they process all the rows at once.
Neither of these applies to me. My situation is that I have a query that may take a long time to execute but doesn't return all that many rows. Right now I'm testing with a query that takes about 20 minutes and returns maybe 1000 rows. Everything works fine except that the user gets no feedback (other than an indeterminate progress bar swishing back and forth) during the 20-minute wait.
If I didn't have to wait the 20 minutes to get the result set, but could instead either
A) ask Statement.execute() to return immediately and behave asynchronously, or
B) ask Statement.execute() to return a ResultSet when a small number of rows (say 1 or 2) are available and then call something like getMoreResults() to get subsequent ResultSets (blocking until a small number of new rows are available).
then I could show some decent feedback. Are either of these possible with recent versions of JDBC?
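In case it helps frame the question: option A can at least be approximated outside of JDBC by running the blocking execute() call on a worker thread, leaving the calling (UI) thread free to update the progress display. A minimal sketch, where slowQuery() is a hypothetical stand-in for the real 20-minute JDBC call:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncQueryDemo {
    // Stand-in for the real long-running JDBC call; in practice this would
    // be statement.executeQuery(...) on its own connection.
    static List<String> slowQuery() throws InterruptedException {
        Thread.sleep(200); // pretend this takes 20 minutes
        return List.of("row1", "row2", "row3");
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<List<String>> result = pool.submit(AsyncQueryDemo::slowQuery);

        // The calling thread stays free to give feedback while we wait.
        while (!result.isDone()) {
            System.out.println("still waiting...");
            Thread.sleep(50);
        }
        System.out.println("got " + result.get().size() + " rows");
        pool.shutdown();
    }
}
```

This gives activity feedback, but of course not row-by-row progress, which is what option B was after.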
I did see some posts where people suggested calling either Statement.setFetchSize() or Statement.setMaxRows() and then calling execute() in a loop, but my understanding is that this shouldn't work. setMaxRows(N) will return a ResultSet when N rows are available, but calling execute() twice is likely to return an identical set of rows both times, right? And setFetchSize() isn't supposed to change the semantics of anything; it simply determines how many rows of data are cached locally, right?
If anyone out there has comments or insights, I'd love to hear them.
Interesting question. You are correct that those techniques won't work.
The reason they won't work (which you already know, but I'm elaborating for those following this thread):
When a database executes a query, there are three phases. First is to create an execution plan. Second is to execute the query using any indexes and tables identified in step one. Third is to return the matching rows to the user. Paging typically addresses the third step, but for Brian's query the bulk of the time is being spent in step two.
I'm going to assume that you've already tuned the query as much as possible. The next step is to bring down the 20 minutes. One way to do this is to break up the query. Some ideas:
1) Is a query to return just the matching keys faster? If so, you could run that query first and then fetch the individual rows one at a time. I suspect this isn't the case here, but it can be for some database designs.
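If the keys-first query does happen to be fast, the two-step fetch might look like the sketch below. fetchKeys() and fetchRow() are hypothetical placeholders: the first stands in for "SELECT key FROM t WHERE <criteria>", the second for an indexed "SELECT * FROM t WHERE key = ?", which is where you get per-row progress for free:

```java
import java.util.List;
import java.util.Map;

public class KeysFirstDemo {
    // Placeholder for the (hopefully fast) keys-only query.
    static List<Integer> fetchKeys() {
        return List.of(1, 2, 3);
    }

    // Placeholder data; fetchRow stands in for a cheap indexed lookup.
    static final Map<Integer, String> ROWS =
        Map.of(1, "alpha", 2, "beta", 3, "gamma");

    static String fetchRow(int key) {
        return ROWS.get(key);
    }

    public static void main(String[] args) {
        List<Integer> keys = fetchKeys();
        int done = 0;
        for (int key : keys) {
            String row = fetchRow(key);
            done++;
            // Feedback after every row instead of one 20-minute blank stare.
            System.out.printf("%d/%d: %s%n", done, keys.size(), row);
        }
    }
}
```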
2) Is the query looking at data that can be partitioned in any way? If you partition (physically, or even logically by having a partition-type field in the index), you can query the partitions one at a time, making each query fast. For example, suppose I have no physical partitions but do have a date column. I can then query one year at a time along with my real criteria, provided I add the date column to my index.
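The logical-partition idea reduces to a loop of short queries, one per partition, with a chance to report progress between each. A sketch, where queryYear() is a placeholder for the real per-year query ("SELECT ... WHERE year = ? AND <criteria>"):

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionLoopDemo {
    // Placeholder for one per-year query; with the date column in the
    // index, each call should be quick.
    static List<String> queryYear(int year) {
        return List.of(year + "-a", year + "-b");
    }

    public static void main(String[] args) {
        int[] years = {2003, 2004, 2005};
        List<String> all = new ArrayList<>();
        for (int year : years) {
            List<String> part = queryYear(year);
            all.addAll(part);
            // Natural feedback point between partitions.
            System.out.println(year + " done, " + all.size() + " rows so far");
        }
    }
}
```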
2b) If you don't have physical partitions, consider adding some. For a query that takes 20 minutes to run, those tables must be huge (or missing an index). Physical partitions let you parallelize the work. With enough CPUs, this could bring down the 20 minutes by quite a bit.
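The parallelization could be driven from the client side too, by running one per-partition query per thread and combining the results. A hedged sketch, with queryPartition() a placeholder for one per-partition query (in real life each call would hit its own partition on its own connection):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPartitionsDemo {
    // Placeholder for one per-partition query.
    static List<String> queryPartition(int p) throws InterruptedException {
        Thread.sleep(100); // pretend each partition takes a while
        return List.of("p" + p + "-row1", "p" + p + "-row2");
    }

    public static void main(String[] args) throws Exception {
        int partitions = 4;
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        List<Callable<List<String>>> jobs = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            final int part = p;
            jobs.add(() -> queryPartition(part));
        }
        int total = 0;
        for (Future<List<String>> f : pool.invokeAll(jobs)) {
            total += f.get().size();
        }
        // Four 100ms "partitions" on four threads finish in roughly the
        // time of one, not four.
        System.out.println(total + " rows from " + partitions + " partitions");
        pool.shutdown();
    }
}
```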