File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
The moose likes JDBC and Relational Databases and the fly likes Data Loading Strategy Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC and Relational Databases
Bookmark "Data Loading Strategy" Watch "Data Loading Strategy" New topic

Data Loading Strategy

Tom Purl
Ranch Hand

Joined: May 24, 2002
Posts: 104
I have a general question about reading large amounts of data from a database using Java.

Basically, I am trying to convert a very large amount of data from one data base to another. The data formats are somewhat non-standard, so I have to write a program to do the data transformation.

In the past, I have follwed this basic alogrithm when converting data:

1. Read all of the data from the source into RAM (into some sort of data structure).
2. Validate/manipulate data in the data structure.
3. Read each element in the data structure and write it to the new data source.

Step 1 is no longer a feasible option. Unfortunately, I really don't know of any other way of doing this. Could someone please point me in the direction of a resource or give me a decent idea?

Thanks a ton!

Tom Purl

Tom Purl<br />SCJP 1.4
Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 63867

Is the data transformation dependent upon having all the data in memory? If not, why not just transform a record at a time?

[Asking smart questions] [About Bear] [Books by Bear]
Tom Purl
Ranch Hand

Joined: May 24, 2002
Posts: 104
No, the transformation isn't dependent on having all of the data in memory. My program currently transforms one record at a time. It converts the ResultSet records into an object first to do some validation and manipulation.

My problem is that I need to "chunk" my data retrieval from the database. I can't possibly read all of the data at once and store it into a single ResultSet. However, I've never had to do something like this before and I'm looking for some best practices.

Has anyone seen anything like this on a website or in a book?
Pradeep Ram

Joined: Sep 29, 2004
Posts: 18
I had worked on something of a similar nature. We were evaluating some tools that could do the job for us and also write our custom code (java) to do the same.

We had adopted a mixture of approaches. We had to get data into a composite repository from different data sources (not necessarily RDBMS).

The following is the list of solutions that we had used
1. For smaller pieces of data from a file source, usage of JMS queues to convert the records into messages and have the DBRecordBuilder to convert these messages to SQL format and perform inserts (works well for small messages + its asynchronous)
2. Migration from exisiting database - We had written custom classes that read about 350 records in one go and work on the data (using a custom DataValidator class) and insert each record back into the target database/tables. This approach worked well, as we could perform action on each record and flag those records that could not be processed. But it took a while to process (this was ok with us as it was a one time port; we did this for about 1.5 million records -- we had this clocked under 2 hours)

3. This was an experimental strategy that we tried, but never got around doing it. The purpose was to create a flat file data for each of the table that we had wanted to insert into (using Stored procs for validation) and Perl to get the results to flat files and using SQLLoader (Oracle) to process the files and insert the information in the target database.-- we had some issues with SQLLoader which we could not get around fixing it in the short we dropped this idea.

4. Using XML (I think u must have disucsses this solution) , but its painstaking and a waste of XML processing. use it onloy if you need to send the data to another source who does not have a defined format -- a definte overkill
5. If your taget and source database is Oracle, try using PRO*C for performance . If it is really necessary to use Java here, then try one of the 3 approaches... I am not sure if there is a pattern here for such large volume, am investigating it for another project that I am working on..will let you know if I come across something ...
--Pradeep N Ram
Tom Purl
Ranch Hand

Joined: May 24, 2002
Posts: 104
Thanks a ton for your help, Pradeep!
steve souza
Ranch Hand

Joined: Jun 26, 2002
Posts: 862
You don't specify the backend database you are using. Sybase ASE lets you limit the number of rows you get with the set rowcount command (set rowcount 1000). If so you can always query anything greater than the greatest primary key value of the previous batch and get rows say 1000 at a time until you've processed them all. - a fast, free open source performance tuning api.
JavaRanch Performance FAQ
I agree. Here's the link:
subject: Data Loading Strategy
It's not a secret anymore!