
Size of a Database transaction - Number of records

 
ap sharma
Greenhorn
Posts: 6
Hi,

I have to insert approximately 96,000 rows into a table based on certain calculations. The business requirement is that this operation happen in a single transaction, i.e. either all the entries are made successfully or none are. I am using MySQL, and the operation takes much longer than expected. I shut it down after 4 hours, and by that time only 10,000 rows had been inserted. I'm looking for a better solution.

Can anyone throw some light on better ways of handling this?

Thanks in advance,
 
Paul Sturrock
Bartender
Posts: 10336
96,000 rows in one transaction is going to require a lot of work on the part of the database to track what it needs to undo. It is fairly unusual to use a single transaction for a batch of any notable size. You can probably beef up your database and the machine it runs on to speed this up, but ultimately this is likely to be slow. How about handling the rollback programmatically? You could insert these rows without a single wrapping transaction and then, if they are not all successful, delete them manually. You could track the partial state of the data with some sort of status attribute (i.e. flag rows as "unverified" or whatever, and change the other business logic to ignore unverified data).
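A rough sketch of that status-flag idea, with made-up table and column names:

    -- hypothetical table; the status column marks rows from an in-flight batch
    ALTER TABLE mytable ADD COLUMN status VARCHAR(16) NOT NULL DEFAULT 'unverified';

    -- insert the batch without a wrapping transaction; rows start out unverified
    INSERT INTO mytable (col1, col2, status) VALUES ('a', 1, 'unverified');

    -- if the whole batch succeeded, promote the rows...
    UPDATE mytable SET status = 'verified' WHERE status = 'unverified';

    -- ...otherwise clean up by hand
    DELETE FROM mytable WHERE status = 'unverified';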

 
Scott Selikoff
author
Saloon Keeper
Posts: 4014
Also, you should disable indexes during the insertion. Active index building will slow down that kind of volume loading.
 
ap sharma
Greenhorn
Posts: 6
@Paul: What do you mean by "It is fairly unusual to use transactions with a batch of any notable size"? Let me elaborate on the 96K rows I'm talking about: each row is made up of 4 columns (2 varchar and 2 numeric), and the table has an integer primary key for each row.
 
steve souza
Ranch Hand
Posts: 862
96,000 records really isn't that much. I don't think it should take anywhere close to 4 hours. I haven't done them in a transaction, but I have inserted about 1 million records into my local MySQL database and it took seconds, not minutes, let alone hours. Try inserting those same records via a MySQL GUI rather than through your code, and see how long it takes.
 
Scott Selikoff
author
Saloon Keeper
Posts: 4014
1 million records in seconds? Really? As part of a single, rollback-enabled transaction?
 
Paul Sturrock
Bartender
Posts: 10336
steve souza wrote: 96,000 records really isn't that much.


Yeah, in the big scheme of things it's not a massive amount of data, but it is not trivial if you do it in one transaction (though I agree, 4 hours seems like a very long time even for a transaction of this size).
 
Matthew Brown
Bartender
Posts: 4567
I'd be tempted to load the records into a temporary table (no transactions needed - just make sure the number of records is right at the end). Then you could copy them over to where you want them with a single INSERT INTO...SELECT statement. I'm not sure what the performance difference would be, but it wouldn't surprise me if it was a lot better.
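Something like this, with illustrative table names:

    -- staging table with the same structure as the real one
    CREATE TEMPORARY TABLE staging LIKE mytable;

    -- ... bulk-insert the 96,000 rows into staging, no transaction needed ...

    -- then copy everything across in one atomic statement
    INSERT INTO mytable SELECT * FROM staging;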
 
Bill Karwin
author
Ranch Hand
Posts: 32
Here are some tips for speeding up bulk loading in MySQL:

You should use LOAD DATA INFILE to bulk-load a large quantity of records quickly. This usually runs 10 or 20 times faster than using INSERT.
See http://dev.mysql.com/doc/refman/5.1/en/load-data.html
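For example (file path, table, and column names are made up):

    LOAD DATA INFILE '/tmp/rows.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (col1, col2, col3, col4);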

If you do use INSERTs, don't use autocommit. The overhead of starting and committing a transaction for each INSERT is really expensive. Learn to use explicit START TRANSACTION and COMMIT statements. I usually commit after inserting every 1000 rows, just in case the resources associated with uncommitted data grow.
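For example:

    SET autocommit = 0;

    START TRANSACTION;
    -- ... INSERT the first 1000 rows ...
    COMMIT;

    START TRANSACTION;
    -- ... INSERT the next 1000 rows ...
    COMMIT;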

You can also reduce overhead by using multi-row INSERT syntax:
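    -- one statement inserting several rows (table and columns are illustrative)
    INSERT INTO mytable (col1, col2, col3, col4)
    VALUES ('a', 'b', 1, 2),
           ('c', 'd', 3, 4),
           ('e', 'f', 5, 6);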

Disable foreign key checks and unique checks (remember to re-enable afterwards!):
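    SET foreign_key_checks = 0;
    SET unique_checks = 0;
    -- ... perform the bulk load ...
    SET unique_checks = 1;
    SET foreign_key_checks = 1;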

Disable keys (i.e. indexes) during a restore. This is how you do it:
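    -- works on MyISAM tables; table name is illustrative
    ALTER TABLE mytable DISABLE KEYS;
    -- ... perform the bulk load ...
    ALTER TABLE mytable ENABLE KEYS;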

If you use MyISAM tables, make sure your key_buffer_size is set as large as possible. The default is 8MB, and the max is 4GB. I'd try 1GB.

If you use InnoDB tables, make sure your innodb_buffer_pool_size is set as large as possible. The default is 8MB, and the max is 4GB. I'd try 1GB.
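For example, in my.cnf, using the sizes suggested above:

    [mysqld]
    key_buffer_size = 1G
    innodb_buffer_pool_size = 1G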

Set innodb_flush_log_at_trx_commit = 2 during the data load if you use InnoDB tables (but understand that this setting is global and may slightly reduce the database's ability to guarantee writes).
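For example:

    -- global setting: the log is written at commit but flushed to disk
    -- only about once per second
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;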

 
Gian Franco
Ranch Hand
Posts: 979
ap sharma wrote: I have to insert approximately 96,000 rows in a table based on certain calculations.


...do the calculations have any impact on the poor performance?
 
steve souza
Ranch Hand
Posts: 862
I just tested loading about 20,000 records into a MySQL table that has 3 columns of ints. I was surprised at how shockingly slow it was! I had previously done a 'load' of about a million records, which was quite fast. The insert command, however, took minutes to load a couple of thousand rows before I gave up and killed it, whether I used a transaction or not. That seems awfully slow for any database. In the past (on Sybase ASE) I have even seen large transactions perform better than individual insert statements, because the server doesn't have to send status back to the client after each insert completes. I am not a MySQL expert, so I really don't know what to say. Just wanted to correct my previous statement.

For anyone who wants to try, here is the table I used. I inserted the values 1, 2, 3 in each column.
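    -- a 3-column INT table as described above; names are a guess
    CREATE TABLE test_inserts (
        col1 INT,
        col2 INT,
        col3 INT
    );

    -- each inserted row held the values 1, 2, 3
    INSERT INTO test_inserts VALUES (1, 2, 3);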

 