Win a copy of Succeeding with AI this week in the Artificial Intelligence and Machine Learning forum!
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
• Campbell Ritchie
• Paul Clapham
• Ron McLeod
• Liutauras Vilda
• Junilu Lacar
Sheriffs:
• Tim Cooke
• Jeanne Boyarsky
• Knute Snortum
Saloon Keepers:
• Stephan van Hulst
• Tim Moores
• Tim Holloway
• Carey Brown
• Piet Souris
Bartenders:
• salvin francis
• fred rosenberger
• Frits Walraven

# Batching Select Statements in JDBC

author and iconoclast
Posts: 24203
43
The October issue of the JavaRanch Journal has just been posted, and our own Jeanne Boyarsky's written a a rootin' tootin' new article! If'n you've got somethin' to say about it, this here's the place!

Greenhorn
Posts: 4
Hi,

just curiousity: why is it good to use batch sizes, that are relativley prime?
Why will this give you less server roundtrips?

Andreas

Greenhorn
Posts: 24
Great algorithm, thanks.

If you have the luxury of knowing values that are out of range for the column in the where, use the out of range values for the extra parameter values. This would let you handle the stragglers in one query.

IE. selecting by an id column that is always a positive number.
select ... from ... where ... in (?,?,?,?,?) and you are looking for "1,2,3". Use -1 for the other 2 parameters.

--Amy

author & internet detective
Posts: 39959
804

Originally posted by Andreas Ecker:
Hi,

just curiousity: why is it good to use batch sizes, that are relativley prime?
Why will this give you less server roundtrips?

Andreas

The short answer is mathematically it works out that way. Consider two extreme examples.

1) batch sizes of 1, 4 and 8:
If user wants 8 values, we use 8. - # queries: 1
If user wants 9 values, we use 8/1. - # queries: 2
If user wants 10 values, we use 8/1/1. - # queries: 3
If user wants 11 values, we use 8/1/1/1. - # queries: 4
If user wants 12 values, we use 8/4. - # queries: 2
If user wants 13 values, we use 8/4/1. - # queries: 3

2) batch sizes of 1, 4 and 9:
If user wants 8 values, we use 4/4. - # queries: 2
If user wants 9 values, we use 9. - # queries: 1
If user wants 10 values, we use 9/1. - # queries: 2
If user wants 11 values, we use 9/1/1. - # queries: 3
If user wants 12 values, we use 9/1/1/1. - # queries: 4
If user wants 13 values, we use 9/4. - # queries: 2

If you try this out with a lot of different numbers, it works out that there are more values that result in less queries with relatively prime numbers. This is mainly due to the effect shown for 8 values. We can use 4/4 instead of 8 - which gets us to higher values with less queries.

I'm sure this can be more mathematically explained. I used a combination of math and experimentation to originally figure it out.

Jeanne Boyarsky
author & internet detective
Posts: 39959
804

Originally posted by Amy Smith:
Great algorithm, thanks.

If you have the luxury of knowing values that are out of range for the column in the where, use the out of range values for the extra parameter values. This would let you handle the stragglers in one query.

IE. selecting by an id column that is always a positive number.
select ... from ... where ... in (?,?,?,?,?) and you are looking for "1,2,3". Use -1 for the other 2 parameters.

--Amy

]
This is a very good and interesting point. You wouldn't actually need to know what values are out of range. You could just fill it in with the first value:
select ... from ... where ... in (1,2,3,1,1)

Just like everything else in the article, you would have to verify that it was indeed faster to do it this way. I suspect it would be though unless the query involved a lot of processing (subqueries and the like.) The column involved in the in clause is highly likely to have an index. So it would be comparing an index scan and disk accesses to the saved network roundtrips. The database is smart enough to only return the rows once. Of course if you did know of values that are out of range, it would be faster because you would just have the index scan.

Ranch Hand
Posts: 8943

stmt.setInt(1);

Is the above a typo ? The method signature should have a parameter index followed by the integer value.

Ranch Hand
Posts: 8943

I haven't understood this. In the above example there are 3 statements and setxxx methods invoked 3 times. If there 100 id are you suggesting that the "select" statement appear 100 times. May me I am missing something.

Jeanne Boyarsky
author & internet detective
Posts: 39959
804

Is the above a typo ? The method signature should have a parameter index followed by the integer value.

Oops. Yes that is a typo. And I seem to have done that in a few places. That's what I get for coding without a compiler around...

Jeanne Boyarsky
author & internet detective
Posts: 39959
804
Yes, if there 100 ids, the select statement would appear 100 times. This could also be written as

The idea is that you have the single statement "select id, name from users where id = ?;" that gets sent to the database multiple times in the same server roundtrip. It's only marginally better than doing the 100 statements in separate trips though. And logically more confusing!

Ranch Hand
Posts: 8943
Thanks Jeanne. Why don't you write an article onn JDBCRowSet et al

Jeanne Boyarsky
author & internet detective
Posts: 39959
804
Thomas Paul actually wrote an article about the RowSet. It's an excellent read!

Greenhorn
Posts: 6
Nitpicking really, but... I think you meant:

for (int i=0; i < batchSize; i++) {
if ( firstValue ) {
firstValue = false;
} else {
inClause.append(',');
}
inClause.append('?');
}
... not this:
for (int i=0; i < batchSize; i++) {
inClause.append('?');
if ( firstValue ) {
firstValue = false;
} else {
inClause.append(',');
}
}
( so you get "?,?,?,...?,?" not "??,?,...?,?,")

... but, since you know batchSize, why not do this:
for (int i=0; i < batchSize - 1; i++) { inClause.append('?,') }
inClause.append('?');

But as Utah Phillips would say, "... it's good though!"

Jeanne Boyarsky
author & internet detective
Posts: 39959
804
Frank,
Yes. Good catch and thanks for the code review!

Ranch Hand
Posts: 775
Interesting technique and worth having in the toolkit. Goes beyond just "here is a neat idea" and has a sound implementation strategy that should work in production. Databases like Oracle cache the execution plans for the statements they've recently processed. I like that this technique sticks with bound parameters instead of literal values, and has a way of minimizing how many different types of statements it creates. The end result is that it looks less likely to cause Oracle to flush out all its good execution plans for a temporary glut of dynamic queries.

There is another technique that would be even faster. Definitely works on Oracle, would require a bit of adaptation on other database. If you create a temporary table, you can do a batch update of all your values (ids in this example) to the temporary table. Then all you need to do is a simple join in your query between the table of interest and the temporary table. End result; one batch of data goes in, one batch of data goes out.

Jeanne Boyarsky
author & internet detective
Posts: 39959
804
Reid,
The technique was actually developed on db2 and migrated to Oracle. We knew that we would be migrating so cross-database compatibility was essential. And we are using it in production

Rancher
Posts: 13459
I know it's been a while, but I'm late to the party.

I must admit that I didn't fully understand this article when I first read it, but now I'm writing a routine to migrate table data between databases (it's a long story) and this came to mind.

After a re-read it's just what I want

Greenhorn
Posts: 2
hi guys i found a simple way to do it I dont know if this is helpful enough.

first create string of SQL
privae String SQL = "SELECT * FROM table WHERE column_name IN (\$)";

second create a method that will modify the string SQL and convert the "dollar sign" into strings or ids

private String fillINStatement(String sql, int[] ids){

String IN = "";
String comma = ",";
String newSql;
int i;

for(i=0; i < ids.length; i++){

if((ids.length - 1) == i){
comma = "";
}
IN+=ids[i]+comma;
}
// this will replace \$ into string of ids
newSql = sql.replace("\$", IN);

return newSql;
}

third call the method and pass the values

int[] ids = {1,2,3,4};
String myNewSql = this.fillINStatement(this.SQL, ids);

so the result of your myNewSql will be something like this.
SELECT * FROM table WHERE column_name IN (1,2,3,4);

i dont know if this make sense. but it works fine for me hope in some ways this script could help.

David O'Meara
Rancher
Posts: 13459
Nearly, but not quite.
You still need to use a PreparedStatement, which means you populate '?'s and then bind your values to them. Each time you call it with the same number of parameters, the database is able to reuse the compiled statement, saving work and time on each run.

Thanks for the code though, I plan on using it

Jeanne Boyarsky
author & internet detective
Posts: 39959
804
Dan,
That works if you don't mind using up up prepared statement cache entries on the entry. If you only have 1 - 4 values (or a like small number), this isn't a problem. In my applications, I have an arbitrary number between 1 and (potentially) a few thousand. This means the other queries would wind up being booted from the cache. It also means I pass the maximum number of bound variables for some databases.

Jeanne Boyarsky
author & internet detective
Posts: 39959
804
Upon re-reading your post, I realize the \$ is converted into the string literals. Meaning it doesn't use a prepared statement binding variables at all. This opens up the possibility of SQL injection.

Marshal
Posts: 68966
275
And welcome to JavaRanch, Dan

Dan Geck
Greenhorn
Posts: 2
Thanks for the warm welcome guys. Well I forgot to include that I'm using jdbcTemplate.query so yeah it's not applicable for PreparedStatement. I'm still a novice in Java though and still exploring things. Thanks for the comment guys..

Jeanne Boyarsky
author & internet detective
Posts: 39959
804

Originally posted by Dan Geck:
so yeah it's not applicable for PreparedStatement.

Ah! That was the premise of my article. Which explains the difference!

Greenhorn
Posts: 1

It's so good with Groovy!

Jeanne Boyarsky
author & internet detective
Posts: 39959
804

Ranch Hand
Posts: 494
Hi Friends..

iam just curious.. sorry if i was Out Of Topic...

How to implement Batching Select Statement in ORM like Hibernate?..

Ranch Hand
Posts: 622
Using criteria and criterion. In Restrctions. class there is a in method which takes 2 args, string property and the collection.

Greenhorn
Posts: 1
Another alternative that I found applicable in my case is to use a batch size containing at least the number of values that I want to pass, up to the maximum large batch size, and filling unset values (the remainder) with dummy values. This way, most of the time only one query runs, unless I exceed the larger batch size. Here I look for at least one returned value. If not, I would simply have to pile up the returned values query.neMaintPartition.

for instance:

And then, filling the query:

Greenhorn
Posts: 1
From both a performance perspective and for simplification, wouldn't using a temporary table make more sense? Create the temporary table with the values for the IN clause and then use that as part of your Prepared Statement? Are there any issues or major performance concerns with this approach?

Sheriff
Posts: 3837
66
Using temporary tables is certainly an option. Performancewise, I'd say it would greatly depend on the target database. Databases that use optimizer might need some prodding to generate optimal plans for these queries. I've chosen this approach in a project where the number of items in the IN clause varied greatly, with no reasonable upper bound on it.

... and welcome to the Ranch!

 There were millions of the little blood suckers. But thanks to this tiny ad, I wasn't bitten once. Two software engineers solve most of the world's problems in one K&R sized book https://coderanch.com/wiki/718759/books/Building-World-Backyard-Paul-Wheaton