Like Fred said, you need to get more deep into the problem, like getting some thread dumps and see which thread is using more CPU, how is your I/O, network and so on. After that you can do more effetive changes driven by the data you just collected, like maybe increase the batch size or change the transaction isolation and etc.
Naseem Afzal wrote:Important thing : Table does not have index.
In which case, insertion should be relatively quick, timewise - at least, close to linear with respect to rows added; unless there are foreign key dependencies that you haven't told us about - but I have to admit, I'm no expert on 'prepareStatement', or how it works with inserts.
The only thing I can suggest: Have you tested it with 500 similar inserts, with transactions every 100 rows, in the database itself?
Isn't it funny how there's always time and money enough to do it WRONG?
Artlicles by Winston can be found here
Luan Cestari wrote:Like Fred said, you need to get more deep into the problem, like getting some thread dumps and see which thread is using more CPU, how is your I/O, network and so on. After that you can do more effetive changes driven by the data you just collected, like maybe increase the batch size or change the transaction isolation and etc.
I believe, isolation levels have performance impact on select queries not on insert queries.
Can you please help me in understanding how isolation level impacts the insert query performance ?
Naseem : Oracle documentation mentions following thing for read-committed isolation -
Many high-performance environments with high transaction arrival rates require more throughput and faster response times than can be achieved with serializable isolation. Other environments that supports users with a very low transaction arrival rate also face very low risk of incorrect results due to phantoms and nonrepeatable reads. Read committed isolation is suitable for both of these environments.
Considering this, I think, this isolation level is worth trying to check for performance improvement.
Please let us know how your application responded after changing the batch size and isolation levels
This being Oracle, I'd suggest turning on sql trace with wait events to see where do the waits occur. It may be problematic in production environment, though. Also, make sure that you're using the latest Oracle JDBC driver, full support for JDBC batching was implemented only in the 11g version of the driver. Also trying other batch sizes (I'd even try smaller number) might be worth it.
In Oracle, isolation levels are not achieved by locking, and given that the original code contains INSERT INTO ... VALUES statement, not INSERT INTO ... SELECT one, I'd say that isolation level should not have any impact on its speed (writes are always done using the "current" view of the data, not the consistent view).
trying other batch sizes (I'd even try smaller number) might be worth it.
As for as I know performing operation in batch improves the performance minaly because of two reasons -
1- Connection creation is an expensive operation, application performance improves because of less number of times connection is created.
2- When we call execute batch, a set of operation are sent to database. As in most of the cases database use to be in seperate machine in network. The impact of network latency in complete opration gets subsided if operations are sent in batch rather than non-batch.
Considering these reasons, I believe, choosing a bigger batch size (not a smaller one) will improve the performance.
Can you please suggest, if you are referring to some specific scenario or some other reasoning where overall application performance would get improved by reducing the batch size ?
The first reason is not relevant. Connection pooling reduces the overhead of creating new connections, but in this case, we're working on a single connection already.
The second reason is relevant. We indeed want to avoid network latency. But there are other factors at play. If we hoard a big lump of data before sending them to the database, we might hit the network bandwidth instead. In an ideal case, we'd send the batch to the database every time it fills one packet of data (this would probably be optimal from the network perspective alone). I believe that in reality the optimal batch size will be one which will create several packets of data on every go, because sizing the batch so that it always precisely fills one network packet is not possible. However, some operations in the database (writing the data to redo logs is certainly relevant to this case) happen asynchronously; if you send too much data in one go, you might have to wait for these processes needlessly. You actually want to keep all parts - the application, the network, the database - busy all the time.
In any case, there's a law of diminishing returns clearly at play. Increasing batch size from 1 to 10 cuts down the time much more than increasing it from 10 to 20. And since I've experienced exceptions when setting the batch size too high, I'd rather settle for a smaller number that provides reasonable performance instead of trying to shave one more millisecond by increasing the batch size tenfold.
In this particular example, I don't think the problem lies in the batch size. Perhaps I shouldn't have mentioned it at all. At this point, using sqltrace is probably the easiest way to see what's really happening there.
Are you getting this performance issue in production or development environment too?
Are you noticing such performance all the times or some times, it much faster than you explained?
Do you have multiple database on the same machine in which your database resides ?
Do you see your database processor having almost 100% CPU utilization when such performance is noticed ?
I think, answers of these question might help you choosing the right direction to solve your problem.