I have a general question about reading large amounts of data from a database using Java.
Basically, I am trying to convert a very large amount of data from one data base to another. The data formats are somewhat non-standard, so I have to write a program to do the data transformation.
In the past, I have follwed this basic alogrithm when converting data:
1. Read all of the data from the source into RAM (into some sort of data structure). 2. Validate/manipulate data in the data structure. 3. Read each element in the data structure and write it to the new data source.
Step 1 is no longer a feasible option. Unfortunately, I really don't know of any other way of doing this. Could someone please point me in the direction of a resource or give me a decent idea?
No, the transformation isn't dependent on having all of the data in memory. My program currently transforms one record at a time. It converts the ResultSet records into an object first to do some validation and manipulation.
My problem is that I need to "chunk" my data retrieval from the database. I can't possibly read all of the data at once and store it into a single ResultSet. However, I've never had to do something like this before and I'm looking for some best practices.
Has anyone seen anything like this on a website or in a book?
Tom, I had worked on something of a similar nature. We were evaluating some tools that could do the job for us and also write our custom code (java) to do the same.
We had adopted a mixture of approaches. We had to get data into a composite repository from different data sources (not necessarily RDBMS).
The following is the list of solutions that we had used 1. For smaller pieces of data from a file source, usage of JMS queues to convert the records into messages and have the DBRecordBuilder to convert these messages to SQL format and perform inserts (works well for small messages + its asynchronous) 2. Migration from exisiting database - We had written custom classes that read about 350 records in one go and work on the data (using a custom DataValidator class) and insert each record back into the target database/tables. This approach worked well, as we could perform action on each record and flag those records that could not be processed. But it took a while to process (this was ok with us as it was a one time port; we did this for about 1.5 million records -- we had this clocked under 2 hours)
3. This was an experimental strategy that we tried, but never got around doing it. The purpose was to create a flat file data for each of the table that we had wanted to insert into (using Stored procs for validation) and Perl to get the results to flat files and using SQLLoader (Oracle) to process the files and insert the information in the target database.-- we had some issues with SQLLoader which we could not get around fixing it in the short time...so we dropped this idea.
4. Using XML (I think u must have disucsses this solution) , but its painstaking and a waste of XML processing. use it onloy if you need to send the data to another source who does not have a defined format -- a definte overkill 5. If your taget and source database is Oracle, try using PRO*C for performance . If it is really necessary to use Java here, then try one of the 3 approaches... I am not sure if there is a pattern here for such large volume, am investigating it for another project that I am working on..will let you know if I come across something ... --Pradeep N Ram
You don't specify the backend database you are using. Sybase ASE lets you limit the number of rows you get with the set rowcount command (set rowcount 1000). If so you can always query anything greater than the greatest primary key value of the previous batch and get rows say 1000 at a time until you've processed them all.