File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes JDBC and the fly likes Batching Select Statements in JDBC Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Batching Select Statements in JDBC" Watch "Batching Select Statements in JDBC" New topic
Author

Batching Select Statements in JDBC

Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24187
    
  34

The October issue of the JavaRanch Journal has just been posted, and our own Jeanne Boyarsky's written a a rootin' tootin' new article! If'n you've got somethin' to say about it, this here's the place!


[Jess in Action][AskingGoodQuestions]
Andreas Ecker
Greenhorn

Joined: Jun 13, 2002
Posts: 4
Hi,

just curiousity: why is it good to use batch sizes, that are relativley prime?
Why will this give you less server roundtrips?

Andreas
Amy Smith
Greenhorn

Joined: Jul 26, 2001
Posts: 24
Great algorithm, thanks.

If you have the luxury of knowing values that are out of range for the column in the where, use the out of range values for the extra parameter values. This would let you handle the stragglers in one query.

IE. selecting by an id column that is always a positive number.
select ... from ... where ... in (?,?,?,?,?) and you are looking for "1,2,3". Use -1 for the other 2 parameters.

--Amy


Amy Smith<br />Java Developer<br />Haworth, Inc.<br />amy.smith@haworth.com
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Originally posted by Andreas Ecker:
Hi,

just curiousity: why is it good to use batch sizes, that are relativley prime?
Why will this give you less server roundtrips?

Andreas

The short answer is mathematically it works out that way. Consider two extreme examples.

1) batch sizes of 1, 4 and 8:
If user wants 8 values, we use 8. - # queries: 1
If user wants 9 values, we use 8/1. - # queries: 2
If user wants 10 values, we use 8/1/1. - # queries: 3
If user wants 11 values, we use 8/1/1/1. - # queries: 4
If user wants 12 values, we use 8/4. - # queries: 2
If user wants 13 values, we use 8/4/1. - # queries: 3

2) batch sizes of 1, 4 and 9:
If user wants 8 values, we use 4/4. - # queries: 2
If user wants 9 values, we use 9. - # queries: 1
If user wants 10 values, we use 9/1. - # queries: 2
If user wants 11 values, we use 9/1/1. - # queries: 3
If user wants 12 values, we use 9/1/1/1. - # queries: 4
If user wants 13 values, we use 9/4. - # queries: 2

If you try this out with a lot of different numbers, it works out that there are more values that result in less queries with relatively prime numbers. This is mainly due to the effect shown for 8 values. We can use 4/4 instead of 8 - which gets us to higher values with less queries.

I'm sure this can be more mathematically explained. I used a combination of math and experimentation to originally figure it out.


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Originally posted by Amy Smith:
Great algorithm, thanks.

If you have the luxury of knowing values that are out of range for the column in the where, use the out of range values for the extra parameter values. This would let you handle the stragglers in one query.

IE. selecting by an id column that is always a positive number.
select ... from ... where ... in (?,?,?,?,?) and you are looking for "1,2,3". Use -1 for the other 2 parameters.

--Amy
]
This is a very good and interesting point. You wouldn't actually need to know what values are out of range. You could just fill it in with the first value:
select ... from ... where ... in (1,2,3,1,1)

Just like everything else in the article, you would have to verify that it was indeed faster to do it this way. I suspect it would be though unless the query involved a lot of processing (subqueries and the like.) The column involved in the in clause is highly likely to have an index. So it would be comparing an index scan and disk accesses to the saved network roundtrips. The database is smart enough to only return the rows once. Of course if you did know of values that are out of range, it would be faster because you would just have the index scan.
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8919

stmt.setInt(1);


Is the above a typo ? The method signature should have a parameter index followed by the integer value.


Groovy
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8919



I haven't understood this. In the above example there are 3 statements and setxxx methods invoked 3 times. If there 100 id are you suggesting that the "select" statement appear 100 times. May me I am missing something.
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Originally posted by Pradip Bhat:
Is the above a typo ? The method signature should have a parameter index followed by the integer value.

Oops. Yes that is a typo. And I seem to have done that in a few places. That's what I get for coding without a compiler around...
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Pradip,
Yes, if there 100 ids, the select statement would appear 100 times. This could also be written as

The idea is that you have the single statement "select id, name from users where id = ?;" that gets sent to the database multiple times in the same server roundtrip. It's only marginally better than doing the 100 statements in separate trips though. And logically more confusing!
Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8919

Thanks Jeanne. Why don't you write an article onn JDBCRowSet et al
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Pradip,
Thomas Paul actually wrote an article about the RowSet. It's an excellent read!
Frank Brown
Greenhorn

Joined: Jan 22, 2004
Posts: 6
Nitpicking really, but... I think you meant:

for (int i=0; i < batchSize; i++) {
if ( firstValue ) {
firstValue = false;
} else {
inClause.append(',');
}
inClause.append('?');
}
... not this:
for (int i=0; i < batchSize; i++) {
inClause.append('?');
if ( firstValue ) {
firstValue = false;
} else {
inClause.append(',');
}
}
( so you get "?,?,?,...?,?" not "??,?,...?,?,")

... but, since you know batchSize, why not do this:
for (int i=0; i < batchSize - 1; i++) { inClause.append('?,') }
inClause.append('?');

But as Utah Phillips would say, "... it's good though!"


"Make things as simple as possible, but no simpler." - A.Einstein
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Frank,
Yes. Good catch and thanks for the code review!
Reid M. Pinchback
Ranch Hand

Joined: Jan 25, 2002
Posts: 775
Interesting technique and worth having in the toolkit. Goes beyond just "here is a neat idea" and has a sound implementation strategy that should work in production. Databases like Oracle cache the execution plans for the statements they've recently processed. I like that this technique sticks with bound parameters instead of literal values, and has a way of minimizing how many different types of statements it creates. The end result is that it looks less likely to cause Oracle to flush out all its good execution plans for a temporary glut of dynamic queries.

There is another technique that would be even faster. Definitely works on Oracle, would require a bit of adaptation on other database. If you create a temporary table, you can do a batch update of all your values (ids in this example) to the temporary table. Then all you need to do is a simple join in your query between the table of interest and the temporary table. End result; one batch of data goes in, one batch of data goes out.


Reid - SCJP2 (April 2002)
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Reid,
The technique was actually developed on db2 and migrated to Oracle. We knew that we would be migrating so cross-database compatibility was essential. And we are using it in production
David O'Meara
Rancher

Joined: Mar 06, 2001
Posts: 13459

I know it's been a while, but I'm late to the party.

I must admit that I didn't fully understand this article when I first read it, but now I'm writing a routine to migrate table data between databases (it's a long story) and this came to mind.

After a re-read it's just what I want
Dan Geck
Greenhorn

Joined: May 21, 2008
Posts: 2
hi guys i found a simple way to do it I dont know if this is helpful enough.

first create string of SQL
privae String SQL = "SELECT * FROM table WHERE column_name IN ($)";

second create a method that will modify the string SQL and convert the "dollar sign" into strings or ids

private String fillINStatement(String sql, int[] ids){

String IN = "";
String comma = ",";
String newSql;
int i;

for(i=0; i < ids.length; i++){

if((ids.length - 1) == i){
comma = "";
}
IN+=ids[i]+comma;
}
// this will replace $ into string of ids
newSql = sql.replace("$", IN);

return newSql;
}

third call the method and pass the values

int[] ids = {1,2,3,4};
String myNewSql = this.fillINStatement(this.SQL, ids);

so the result of your myNewSql will be something like this.
SELECT * FROM table WHERE column_name IN (1,2,3,4);

i dont know if this make sense. but it works fine for me hope in some ways this script could help.
David O'Meara
Rancher

Joined: Mar 06, 2001
Posts: 13459

Nearly, but not quite.
You still need to use a PreparedStatement, which means you populate '?'s and then bind your values to them. Each time you call it with the same number of parameters, the database is able to reuse the compiled statement, saving work and time on each run.

Thanks for the code though, I plan on using it
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Dan,
That works if you don't mind using up up prepared statement cache entries on the entry. If you only have 1 - 4 values (or a like small number), this isn't a problem. In my applications, I have an arbitrary number between 1 and (potentially) a few thousand. This means the other queries would wind up being booted from the cache. It also means I pass the maximum number of bound variables for some databases.
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Upon re-reading your post, I realize the $ is converted into the string literals. Meaning it doesn't use a prepared statement binding variables at all. This opens up the possibility of SQL injection.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39791
    
  28
And welcome to JavaRanch, Dan
Dan Geck
Greenhorn

Joined: May 21, 2008
Posts: 2
Thanks for the warm welcome guys. Well I forgot to include that I'm using jdbcTemplate.query so yeah it's not applicable for PreparedStatement. I'm still a novice in Java though and still exploring things. Thanks for the comment guys..
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Originally posted by Dan Geck:
so yeah it's not applicable for PreparedStatement.

Ah! That was the premise of my article. Which explains the difference!
Bruce You
Greenhorn

Joined: Jun 11, 2009
Posts: 1
This is my Groovy implementation about this:

It's so good with Groovy!
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30917
    
158

Bruce: Very cool!
Leonardo Carreira
Ranch Hand

Joined: Apr 07, 2009
Posts: 489

Hi Friends..

iam just curious.. sorry if i was Out Of Topic...

How to implement Batching Select Statement in ORM like Hibernate?..

Thanks in advance..



Sorry, perhaps my english language isn't too good.. Prepare for SCJP 6, Please God help me.. ☼
References : [Java.Boot] [JavaChamp] [JavaPrepare]
Kunal Lakhani
Ranch Hand

Joined: Jun 05, 2010
Posts: 622
Using criteria and criterion. In Restrctions. class there is a in method which takes 2 args, string property and the collection.




kunal
Pierre Drapeau
Greenhorn

Joined: Oct 15, 2012
Posts: 1
Another alternative that I found applicable in my case is to use a batch size containing at least the number of values that I want to pass, up to the maximum large batch size, and filling unset values (the remainder) with dummy values. This way, most of the time only one query runs, unless I exceed the larger batch size. Here I look for at least one returned value. If not, I would simply have to pile up the returned values query.neMaintPartition.

for instance:




And then, filling the query:






Douglas Lancy
Greenhorn

Joined: Jun 24, 2014
Posts: 1
From both a performance perspective and for simplification, wouldn't using a temporary table make more sense? Create the temporary table with the values for the IN clause and then use that as part of your Prepared Statement? Are there any issues or major performance concerns with this approach?

Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3611
    
  60

Using temporary tables is certainly an option. Performancewise, I'd say it would greatly depend on the target database. Databases that use optimizer might need some prodding to generate optimal plans for these queries. I've chosen this approach in a project where the number of items in the IN clause varied greatly, with no reasonable upper bound on it.

... and welcome to the Ranch!
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Batching Select Statements in JDBC