I have to insert approximately 96,000 rows in a table based on certain calculations. The business requirements is to have this operation in a single transaction, i.e. either all the entries are made successful or none. I am using MySQL and I could see the operation takes much more time than expected.. I shut down the operation after 4 hours and by that time only 10000 rows have been inserted. I'mlooking for a better solution.
Can anyone throw some light on better ways of handling this?
96,000 rows in one transaction is going to require a lot of work on the part of the database to track what it needs to undo. It is fairly unusual to use transactions with a batch of any notable size. You can probably beef up your database and the machine it's own to speed this up, but ultimately this is likely to be slow. How about handling the transaction programatically? You could insert these rows without using a single transaction then, if all are not successful, delete manually. You could track the partial state of the data with some sort of status attribute (i.e. have an "unverified" or whatever attribute and change other business logic to ignore unverified data).
@Paul: What do you mean by "It is fairly unusual to use transactions with a batch of any notable size"?? Let me elaborate on 96K rows that I'm talking about. Each row is made up of 4 columns (2 Varchar and 2 Number), there is an Int PK in table for each row.
96,000 records really isn't that much. I don't think it should take anywhere close to 4 hours. I haven't done them in a transaction, but I have inserted about 1 million records on my local mysql database and it took seconds, not minutes, let alone hours. Just try to insert those same records via a mysql gui, and not through your code and see how long it takes.
steve souza wrote:96,000 records really isn't that much.
Yeah in the big scheme of things its not a massive amount of data, but it is not trivial if you do it in one transaction (though I agree, 4 hours seems like a very long time even for this size of transaction).
I'd be tempted to load the records into a temporary table (no transactions needed - just make sure the number of records is right at the end). Then you could copy them over to where you want them with a single INSERT INTO...SELECT statement. I'm not sure what the performance difference would be, but it wouldn't surprise me if it was a lot better.
If you do use INSERTs, don't use autocommit. The overhead of starting and committing a transaction for each INSERT is really expensive. Learn to use explicit BEGIN TRANSACTION and COMMIT statements. I usually commit after inserting every 1000 rows, just in case resources associated with uncommitted data grow.
You can also reduce overhead by using multi-row INSERT syntax:
Disable foreign key checks and unique checks (remember to re-enable afterwards!):
Disable keys (i.e. indexes) during a restore. This is how you do it:
If you use MyISAM tables, make sure your key_buffer_size is set as large as possible. The default is 8MB, and the max is 4GB. I'd try 1GB.
If you use InnoDB tables, make sure your innodb_buffer_pool_size is set as large as possible . The default is 8MB, and the max is 4GB. I'd try 1GB.
Set innodb_flush_log_at_trx_commit = 2 during the data-load if you use InnoDB tables (but understand that this setting is global and may reduce slightly the ability of the database to guarantee writes).
ap sharma wrote:I have to insert approximately 96,000 rows in a table based on certain calculations.
...do the calculations have any impact on the low performance?
"Eppur si muove!"
Joined: Jun 26, 2002
I just tested loading about 20,000 records into a mysql table that has 3 columns of ints. I was surprised at how shockingly slow it was! I had previously done a 'load' of about a million records which was quite fast. The insert command however took minutes to load a couple thousand rows, before I gave up and killed it. This was whether I used a transaction or not. That seems awful slow for any database. In the past (on Sybase ASE) I have even seen large transactions perform better than individual insert statements because it doesn't have to send status back to the client after each insert completes. I am not a mysql expert, so I really don't know what to say. Just wanted to correct my previous statement.
For anyone that wants to try here is the table I used. I inserted the values 1,2,3 in each column.
subject: Size of a Database transaction - Number of records