My problem: I have a table with approximately 5 million rows. I need to grab the data in a text column in this table, tokenize it, write it back out to the table, and write the first word of that data to another column in the same table.
My solution: This only has to be run once, so I'm not as concerned with elegance and speed as long as it works. My plan was to retrieve data into a result set, process it, write the data back to the DB, then get more data into the result set and keep going until I've processed the whole table. I have to use a third-party tokenizer that is written in Java, so doing this on the server with stored procedures and SQL isn't an option.
I get the same error on the same row number every time, which leads me to believe it's a memory problem, but I'm not sure what it is or how to fix it. I can run this code on a table with only 500 rows and it executes fine. This is the error I get:
MESSAGE: Can not read response from server. Expected to read 65 bytes, read 47 bytes before connection was unexpectedly lost.
I use two Connection objects, one for reading and one for writing. My understanding is that I have to read all of the rows completely before I can use that connection for any other result set, since I'm streaming a row at a time.
This is how I get the resultset. It is within a try/catch block:
This is how I process it:
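The actual code didn't come through with the post, but for reference, a minimal sketch of the read-process-write loop described above might look like the following. The table and column names (`mytable`, `id`, `rawtext`, `tokens`, `firstword`), connection details, and the `tokenize` method are placeholders, not the real tokenizer or schema:

```java
import java.sql.*;

public class Retokenize {
    // Stand-in for the third-party tokenizer -- placeholder logic only.
    static String tokenize(String text) {
        return String.join(" ", text.trim().split("\\s+"));
    }

    // First word of the tokenized text, destined for the second column.
    static String firstWord(String tokens) {
        int sp = tokens.indexOf(' ');
        return sp < 0 ? tokens : tokens.substring(0, sp);
    }

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost/mydb"; // hypothetical
        try (Connection readConn = DriverManager.getConnection(url, "user", "pass");
             Connection writeConn = DriverManager.getConnection(url, "user", "pass")) {

            // With the MySQL Connector/J driver, this exact combination
            // (forward-only, read-only, fetch size Integer.MIN_VALUE) streams
            // rows one at a time instead of buffering all 5 million in memory.
            // While this streaming result set is open, readConn cannot run any
            // other statement -- hence the second connection for writes.
            Statement read = readConn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            read.setFetchSize(Integer.MIN_VALUE);

            // One PreparedStatement, reused for every row.
            PreparedStatement write = writeConn.prepareStatement(
                    "UPDATE mytable SET tokens = ?, firstword = ? WHERE id = ?");

            try (ResultSet rs = read.executeQuery("SELECT id, rawtext FROM mytable")) {
                while (rs.next()) {
                    String tokens = tokenize(rs.getString("rawtext"));
                    write.setString(1, tokens);
                    write.setString(2, firstWord(tokens));
                    write.setLong(3, rs.getLong("id"));
                    write.executeUpdate();
                }
            }
        }
    }
}
```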
Do I need to instantiate a new PreparedStatement every time I write to the table? I'm using the same statement and simply setting the values in it every time.
I'm the only person using the DB so it isn't that someone else is trying to access the same data I'm accessing. The Java code is executing from my desktop. I do not have access to an application server.
I'm hoping someone can see my error or has a better solution to my problem.
Take a look at preparedStatement.executeBatch(). It is designed to do bulk updates better. You also want to commit periodically. When I did the data migration for JavaRanch, I committed every 1000-10000 rows depending on the size of the row. (posts are bigger than usernames.) This gave me fallback points if things went wrong and prevented connection timeouts.
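For what it's worth, the batching-plus-periodic-commit pattern might look roughly like this. The batch size, SQL, and table/column names (`mytable`, `tokens`, `id`) are illustrative, not taken from the original code:

```java
import java.sql.*;

public class BatchUpdate {
    static final int BATCH_SIZE = 1000; // tune per row size, as suggested above

    // True when a full batch has been queued and should be flushed/committed.
    static boolean timeToFlush(long rowsQueued) {
        return rowsQueued % BATCH_SIZE == 0;
    }

    static void run(Connection writeConn, ResultSet rs) throws SQLException {
        writeConn.setAutoCommit(false); // commit manually, in chunks
        PreparedStatement ps = writeConn.prepareStatement(
                "UPDATE mytable SET tokens = ? WHERE id = ?");
        long rows = 0;
        while (rs.next()) {
            ps.setString(1, rs.getString("rawtext").trim());
            ps.setLong(2, rs.getLong("id"));
            ps.addBatch();              // queue instead of executeUpdate()
            if (timeToFlush(++rows)) {
                ps.executeBatch();      // send the whole chunk in one round trip
                writeConn.commit();     // fallback point every BATCH_SIZE rows
            }
        }
        ps.executeBatch();              // flush any partial final batch
        writeConn.commit();
    }
}
```

Committing every batch is what creates the fallback points: if the run dies at row 3 million, everything committed so far is safe and the job can resume from there.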
I'm a little confused. According to the documentation, a Connection object is in auto-commit mode by default. So every time I issue an executeUpdate(), it should be doing a commit on its own. Is this not correct? If so, did you turn off auto-commit and then issue your own commit every 1000-10000 rows?
R Jarman wrote: According to the documentation, a Connection object is in AutoCommit mode by default. So every time I issue a executeUpdate(), it should be doing a commit on its own. Is this not correct?
It can be. Sometimes the commit is done at the end, if the bulk insert is part of a larger transaction. That doesn't sound like the case for you, in which case you can rely on autocommit for the batch.