JavaRanch » Java Forums » Java » Performance

JDBC - Cursors vs client-side ResultSets

Mike Fourier
Greenhorn

Joined: Apr 02, 2008
Posts: 25
I'm looking at code that has about 9 million rows to process, and I see no attempts at optimization in the code: nothing sets a fetch size or uses cursors. Perhaps that is because the defaults are already the most performant.

There seem to be two approaches:
1) retrieve the entire resultset to the client
2) use server-side cursors, and a fetch size
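For reference, approach 2 in plain JDBC might look roughly like the sketch below. It is a minimal illustration, not tuned code: it assumes you already have a `Connection` from your driver, and the class and method names (`CursorFetch`, `processAll`) are hypothetical. Whether leaving out the `setFetchSize` call gives you approach 1 depends entirely on the driver's defaults.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of approach 2: a forward-only, read-only query with a fetch-size hint.
final class CursorFetch {

    /** Rejects negative fetch sizes; 0 tells the driver to use its own default. */
    static int validateFetchSize(int fetchSize) {
        if (fetchSize < 0) {
            throw new IllegalArgumentException("fetchSize must be >= 0, got " + fetchSize);
        }
        return fetchSize;
    }

    /** Runs sql and processes every row, asking the driver for fetchSize rows per round trip. */
    static long processAll(Connection conn, String sql, int fetchSize) throws SQLException {
        long rows = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // setFetchSize is only a hint; a driver may ignore it.
            ps.setFetchSize(validateFetchSize(fetchSize));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Handle one row here. A driver that honors the hint
                    // requests the next batch when its local buffer runs out.
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

Per the `Statement.setFetchSize` Javadoc, a value of 0 means the hint is ignored and the driver falls back to its own default.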

Is this statement true?

1) If the client has unlimited (i.e., "enough") memory, fetching everything "all at once" to the client outperforms a solution where cursors are used and multiple fetches are performed.

I would tend to think so, because you're hitting the database and network just once. Granted, the ResultSet will be massive on the client, but assuming you have the memory and CPU to handle it...

I suppose that when you really get *right* down to measuring seconds in an hours-long process, there may be a performance benefit (or only a perceived one?) to doing fetches: you can start processing the result set much sooner (after the first fetch) rather than waiting for all of it to come across. But it's not as if the driver is doing a background fetch, right? So it becomes a choice between "wait a long while once, then never again" and "wait a shorter time, but many times over".

My database is Sybase, and from googling, I gather the database vendor tends to matter here.

For example, here's an Oracle post making it clear that anyone who doesn't set their fetch (batch) size is asking for sucky performance:
http://blog.lishman.com/2008/03/jdbc-fetch-size.html

But jTDS seems to indicate that for Sybase it fetches everything by default anyway (note #4):
http://jtds.sourceforge.net/resultSets.html

So I'm thinking the Oracle speed boost really only applies *IF* you are using cursors, in which case you should set your batch size correctly. But if I'm not using cursors, then I'm already using the fastest ResultSet possible...

Do I have that sort of correct?
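If you want to compare the two modes with jTDS directly, my reading of the jTDS documentation is that the switch is the `useCursors` connection property (default `false`, meaning direct "firehose" selects for forward-only, read-only result sets). Below is a sketch of building both URLs; the host, port and database name are placeholders and the `JtdsUrls` helper is hypothetical, so check the property against the jTDS FAQ for your driver version.

```java
// Hypothetical helper: builds a jTDS connection URL for a Sybase server.
// Host, port and database are placeholders; useCursors is a jTDS URL property.
final class JtdsUrls {

    static String sybaseUrl(String host, int port, String database, boolean useCursors) {
        StringBuilder url = new StringBuilder("jdbc:jtds:sybase://")
                .append(host).append(':').append(port)
                .append('/').append(database);
        if (useCursors) {
            // Ask jTDS to use server-side cursors for forward-only, read-only
            // result sets instead of its default direct ("firehose") selects.
            url.append(";useCursors=true");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(sybaseUrl("dbhost", 5000, "mydb", false));
        System.out.println(sybaseUrl("dbhost", 5000, "mydb", true));
    }
}
```

Running `main` prints the two URLs, one plain and one ending in `;useCursors=true`; you would then open a connection with each and time the same query.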

edit: changed title from:
JDBC - Resultsets - What is the right 'size' for best performance?
[ May 26, 2008: Message edited by: Mike Fourier ]
Steve McLeod
Greenhorn

Joined: May 26, 2008
Posts: 11
The answer to all performance questions: it depends.

Write code that uses the various approaches. Run the code. Time the code. That will give you a better answer than any amount of theorizing.

Caveat: your result will be valid for your OS, JVM version, database vendor and version, and JDBC driver. Don't extrapolate your result to all other environments.
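A minimal harness for "write it, run it, time it" could look like this. It is a sketch assuming an already-open `Connection` and a query of your choosing; `QueryTimer`, `SqlTask` and `compare` are hypothetical names. It times a single run per fetch size, so in practice you would repeat each run and ignore the first while database and OS caches warm up.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class QueryTimer {

    /** A unit of JDBC work that may throw SQLException. */
    interface SqlTask<T> {
        T run() throws SQLException;
    }

    /** Value produced by a task plus the wall-clock time it took. */
    static final class Timed<T> {
        final T value;
        final long elapsedNanos;

        Timed(T value, long elapsedNanos) {
            this.value = value;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Runs the task once and measures it with System.nanoTime(). */
    static <T> Timed<T> time(SqlTask<T> task) throws SQLException {
        long start = System.nanoTime();
        T value = task.run();
        return new Timed<>(value, System.nanoTime() - start);
    }

    /** Reads every row of sql with the given fetch-size hint and returns the row count. */
    static long countRows(Connection conn, String sql, int fetchSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setFetchSize(fetchSize);
            long rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows++;
                }
            }
            return rows;
        }
    }

    /** Times the same query once per fetch size and prints the results. */
    static void compare(Connection conn, String sql, int... fetchSizes) throws SQLException {
        for (int fetchSize : fetchSizes) {
            Timed<Long> t = time(() -> countRows(conn, sql, fetchSize));
            System.out.printf("fetchSize=%d rows=%d elapsed=%d ms%n",
                    fetchSize, t.value, t.elapsedNanos / 1_000_000);
        }
    }
}
```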


http://www.solidsimplesafe.com/
Mike Fourier
Greenhorn

Joined: Apr 02, 2008
Posts: 25
Sorry, this will be a bit ranty.

I'm torn between replying "most unhelpful reply ever" and "ask a stupid question..." (but mostly that second thing). So all of this is written with a certain amount of chagrin. Perhaps there's just no "smart" way to ask this question...

The title of my post was totally misleading. I was never actually asking what the batch size should be; I already know the answer to that ("it depends"), and anyone who asks "what should my batch size be?" does deserve "it depends". Even so, the *generic* advice seems to be "between 1/4 and 1/2 of the expected result size", so clearly (to some people) there is some general advice one can give. And of course all the usual caveats apply: it is *general* advice, and one's mileage will vary.

Another example of general advice that holds even though everyone's exact experience will differ: for an expected result set in the millions of rows, someone saying "in general, a batch size of 100 will perform better than a batch size of 10" would not be wrong. They would, in general, be giving good advice.
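The "100 beats 10" intuition is mostly about round trips, which is easy to put into numbers. This sketch uses the 9 million rows from the original post and a hypothetical 1 ms round-trip time; real latency, and how much each fetch costs on the server, will differ. `FetchMath` is a made-up name.

```java
// Back-of-the-envelope arithmetic: how many fetch round trips a result set needs,
// and how much time pure round-trip latency could add (latency is an assumed input).
final class FetchMath {

    /** Ceiling of totalRows / fetchSize: the number of fetches needed to read every row. */
    static long roundTrips(long totalRows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (totalRows + fetchSize - 1) / fetchSize;
    }

    /** Seconds spent purely waiting on round trips, given an assumed round-trip time. */
    static double latencySeconds(long roundTrips, double roundTripMillis) {
        return roundTrips * roundTripMillis / 1000.0;
    }

    public static void main(String[] args) {
        long rows = 9_000_000L;   // row count from the original post
        double rttMillis = 1.0;   // hypothetical network round-trip time
        for (int fetchSize : new int[] {10, 100, 1_000}) {
            long trips = roundTrips(rows, fetchSize);
            System.out.printf("fetchSize=%d -> %d round trips, ~%.0f s of latency%n",
                    fetchSize, trips, latencySeconds(trips, rttMillis));
        }
    }
}
```

With those assumptions, `main` reports 900000 round trips (about 900 s, or 15 minutes, of pure waiting) at a fetch size of 10, 90000 (about 90 s) at 100, and 9000 (about 9 s) at 1000. It ignores transfer time, which is roughly the same whichever fetch size you pick.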

So:

Is there not some similar statement that one can make that answers the question: "do server side cursors generally underperform client-side resultsets?" (which is what I was, in my befuddled post, asking).

As for trying it out myself: yes, I'll be doing that. And if I switch to a cursor and a batch size, I'll test which size is 'best' for my hardware, schema, network, etc. But what I was looking for was an experienced person to say something like:

a) "know what? forget it. you're already using the best way"
or
b) "I've had mixed results, you'll really have to try both"
or
c) "batches are, in my experience, always faster for large resultsets"

Something to let me know whether the time involved is even worthwhile. (How much time could it be?) Well, the current operation takes longer than my work day, so testing several scenarios (batch sizes) would take several days. It also does a number on our test server, which other people share.
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30136

Mike,
I choose "c". However, note that the default is not one big batch. It is database dependent. I think it is 20 or 100 rows at a time on one database.

You can test out what will help with way less than 9 million rows. Try to come up with an example of 10,000 rows and try some different scenarios. This is big enough to give you some basic data without taking all day to run.
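A sketch of Jeanne's suggestion: cap each trial with the standard `Statement.setMaxRows`, try a few fetch sizes, and print the driver's initial `getFetchSize()` so you can see what the default actually is. `SampleRun` and its methods are hypothetical names, and the linear extrapolation to the full row count is a rough assumption (caching, locking and server load rarely scale linearly), useful only for deciding whether a full run is worth doing.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class SampleRun {

    /** Rough linear extrapolation from a sample run to the full row count (an assumption). */
    static double extrapolateSeconds(double sampleSeconds, long sampleRows, long totalRows) {
        if (sampleRows <= 0) {
            throw new IllegalArgumentException("sampleRows must be positive");
        }
        return sampleSeconds * totalRows / sampleRows;
    }

    /** Runs sql capped at sampleRows for each fetch size and reports timings. */
    static void run(Connection conn, String sql, int sampleRows, int... fetchSizes) throws SQLException {
        for (int fetchSize : fetchSizes) {
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                // Before any setFetchSize call this value is implementation-specific.
                int driverDefault = ps.getFetchSize();
                ps.setMaxRows(sampleRows);   // excess rows are silently dropped
                // Keep fetch sizes <= sampleRows: older JDBC specs required
                // fetchSize <= maxRows, and some drivers may still enforce it.
                ps.setFetchSize(fetchSize);
                long start = System.nanoTime();
                long rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows++;
                    }
                }
                double seconds = (System.nanoTime() - start) / 1e9;
                System.out.printf("default=%d fetchSize=%d rows=%d %.3f s%n",
                        driverDefault, fetchSize, rows, seconds);
            }
        }
    }
}
```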

Also, have you confirmed that everything else is optimized? For example, returning even one unused column, repeated across millions of rows, results in a tremendous amount of network traffic.
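To put a number on the unused-column point: payload grows linearly with row count, so even a small column adds up. The sketch below assumes an 8-byte value (for example a BIGINT) and ignores protocol overhead; `ColumnCost`, the `orders` table and its columns are hypothetical.

```java
// Rough estimate of what one unused column costs over a large result, plus a
// helper that builds an explicit column list instead of SELECT *.
final class ColumnCost {

    /** Payload bytes for one extra column, ignoring protocol and metadata overhead. */
    static long extraBytes(long rows, int avgBytesPerValue) {
        return rows * avgBytesPerValue;
    }

    /** Builds "SELECT a, b FROM t"; table and column names must be trusted identifiers. */
    static String selectColumns(String table, String... columns) {
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    public static void main(String[] args) {
        // 9 million rows from the original post; 8 bytes is an assumed value width.
        System.out.println(extraBytes(9_000_000L, 8) + " bytes");
        System.out.println(selectColumns("orders", "order_id", "amount"));
    }
}
```

For 9 million rows that is 72,000,000 bytes (about 72 MB) of payload for a single 8-byte column, and more for wide VARCHAR columns.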


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
 