After *much* investigation, I've found a long-standing (and annoying) bug in one of my projects.
Basically, we bulk insert millions of records into a DB each day through various processes. They all call a common DB layer, which uses addBatch to make it as efficient as possible.
So far, so good. Except that for one of the processes, addBatch fails with weird DB2 driver errors.
Writing them in batches of 1 cures this (same code, just one addBatch call per executeBatch call rather than the usual 100 or so addBatch calls per executeBatch). For ages I had no idea why, until we added another process that was getting a similar error, also fixed by writing batches of 1. It turns out null handling is what breaks it. It seems that if you have, say, an insert with 4 parameters like this:
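The original statement didn't survive the post, but going by the description it's a plain four-parameter insert along these lines (the table name is a placeholder; the column names match the "column b" mentioned below):

```sql
INSERT INTO mytable (a, b, c, d) VALUES (?, ?, ?, ?)
```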
and for row 1, you have values for each column, but row 2 has a null for column b, it will fail the batch. Run each row one at a time though, and it's all good.
As the column specs change for each process, I don't know beforehand the type of the column I'm adding to the batch, so I do this for each parameter:
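I don't have the original snippet to hand, but the per-parameter logic described usually looks something like this sketch (the helper name and the class-to-type mapping are my own, not from the post):

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

public class ParamGuesser {

    // Hypothetical mapping from a runtime Java class to a java.sql.Types constant.
    // The core problem: a null value carries no class, so there is nothing to map from.
    static int sqlTypeFor(Object value) {
        if (value == null)                         return Types.NULL; // no type info available
        if (value instanceof String)               return Types.VARCHAR;
        if (value instanceof Integer)              return Types.INTEGER;
        if (value instanceof Long)                 return Types.BIGINT;
        if (value instanceof java.math.BigDecimal) return Types.DECIMAL;
        if (value instanceof java.sql.Timestamp)   return Types.TIMESTAMP;
        return Types.OTHER;
    }

    static void setParam(PreparedStatement ps, int index, Object value) throws SQLException {
        if (value == null) {
            // Guessing the type here is presumably what upsets the driver in batches.
            ps.setNull(index, sqlTypeFor(value));
        } else {
            ps.setObject(index, value);
        }
    }
}
```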
Now, I'm assuming that if I set the correct null type with setNull, this will work for batches of multiple items; my problem is that I don't see how I can know the correct type.
What I have seen done in the past (I assume to get around the whole setNull thing) is to replace the ?'s in the original SQL with the literal null, *then* use that as the basis for the prepared statement and set the rest of the parameters appropriately. I don't see how I can use that in this case though, as that relies on the same PreparedStatement applying to all records, and each one may have nulls in different places.
Does anyone have any idea how to fix this? Is it specific to the DB2 driver? Has anyone seen it before?
Running it in batches of 1 kills the performance as it's having to do a commit each time, so I could really do with fixing this!
I believe using the *MetaData interfaces of JDBC will allow you to inspect the type of a column.
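For a PreparedStatement specifically, ParameterMetaData can tell you the SQL type of each ? before you bind anything, which gives you the right type to pass to setNull for every row. A rough sketch (statement, table, and column names are placeholders, and not every driver implements getParameterType, so treat it as something to try rather than a guaranteed fix):

```java
import java.sql.Connection;
import java.sql.ParameterMetaData;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class NullSafeBatch {

    static void insertRows(Connection con, java.util.List<Object[]> rows) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO mytable (a, b, c, d) VALUES (?, ?, ?, ?)")) {

            // Ask the driver once, up front, for each parameter's SQL type.
            ParameterMetaData pmd = ps.getParameterMetaData();
            int[] types = new int[pmd.getParameterCount()];
            for (int i = 0; i < types.length; i++) {
                types[i] = pmd.getParameterType(i + 1); // a java.sql.Types constant
            }

            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    if (row[i] == null) {
                        ps.setNull(i + 1, types[i]);   // correct type even for a null
                    } else {
                        ps.setObject(i + 1, row[i]);
                    }
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```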
Thanks, but I'm not sure how that will help?
The parameters I'm inserting come from an ArrayList of objects; if one of those objects is null, it's just null, there's no column information there.
Or do you mean I should inspect the destination table before each batch and determine the structure then? That could possibly work, but it would add runtime overhead to the situations where it's not needed.
I believe you have a separate component for each type of batch update. If so, you could associate a configurable table structure with each component. Load it once for each batch update and work from that. I'd guess you can afford a one-time load of the table structure for each batch run.
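If you do go the load-it-once route, DatabaseMetaData.getColumns gives you the column-name-to-SQL-type mapping in a single query per table. A sketch of the lookup (the schema/table arguments are placeholders):

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

public class TableTypes {

    // Load each column's java.sql.Types constant once per batch run,
    // keyed by column name, in table order.
    static Map<String, Integer> loadColumnTypes(Connection con, String schema, String table)
            throws SQLException {
        Map<String, Integer> types = new LinkedHashMap<>();
        DatabaseMetaData dbmd = con.getMetaData();
        try (ResultSet rs = dbmd.getColumns(null, schema, table, "%")) {
            while (rs.next()) {
                types.put(rs.getString("COLUMN_NAME"), rs.getInt("DATA_TYPE"));
            }
        }
        return types;
    }
}
```

You'd then pass the looked-up type for the relevant column to setNull whenever a value is null.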