File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
The moose likes Java in General and the fly likes A Java program or an ETL? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "A Java program or an ETL?" Watch "A Java program or an ETL?" New topic

A Java program or an ETL?

Mithun Kulkarni

Joined: Jul 06, 2009
Posts: 9

In my current assignment we are planning to migrate data from DB2 to Oracle. Currently the data is stored in the form of a XML blob in DB2. However we have designed a relation data model in Oracle to store the same data (which was stored as a XML blob in DB2).

We are also introducing JPA layer for persistence to store and retrieve data to/from oracle.

I have around 50 million records in DB2 in the form of XML blob. Now I want to move this data to Oracle (in a relation db model- each entity stored in a seperate table).

I want to know which of the below approaches I should follow to do this data migration:

1. Use an ETL tool. My company has Ab Initio license, so I can use this ETL tool.[I do not have experience, but will be eager to learn].

2. Build a new tool using Java for this migration. As I had done a POC on using JPA for persisting and retrieving entities, and the JPA mappings is already there, I can reuse some of this code. And some additional processing which needs to be done in approach 1 can be avoided. [Additional processing : mapping entity to database, processing XML blob and reading its content]

So please help me in making this decision with supporting points.

Thanks in advance,

Jeanne Boyarsky
author & internet detective

Joined: May 26, 2003
Posts: 33117

I would use the ETL tool. It's a lot of data to move and an ETL tool is going to be better optimized than something you write.

[OCA 8 book] [Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Other Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, TOGAF part 1 and part 2
Mithun Kulkarni

Joined: Jul 06, 2009
Posts: 9
Jeanne Boyarsky wrote:Mithun,
I would use the ETL tool. It's a lot of data to move and an ETL tool is going to be better optimized than something you write.

Thanks for resply Jeanne,
But what I think is an ETL does simple things in a faster way. In my case, I need to read a row from DB2, then load the XML blob, then parse the XML to convert it into a simple XML, then map the xml elements to table columns. So when I have these steps in my migration, then I may need to build custome components for ETL.
So don't you think building a java tool will be better in this case?
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 15084

I don't know that ETL tool, so I can't give you a clear answer.

You'd have to look at the features of the ETL tool and see if it can do what you need. If it doesn't, then maybe it's better to write a Java program.

An advantage of the ETL tool is that it will probably run faster than a custom program. On the other hand, you'll have to learn how to use the tool, which will also take you time. You'll have to make an estimate of how long it will take you to learn the tool and configure it to do the conversion you need, versus how long it will take you to write a Java program to do the same.

Is the conversion of those 50 million records something that you're only going to ever run once, or is it something you'll have to do many times? If it's only once, then it might not be such a problem if your Java program isn't superfast. You'd just let it run for one night and the next morning it will be done.

It sounds like an interesting project, and whatever solution you choose, it's very important that you really thorougly test it before running it on the real production system.

Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 8 API documentation
Mithun Kulkarni

Joined: Jul 06, 2009
Posts: 9
The migration will be run couple of times. Once to migrate all data once to Oracle and then to migrate any differential data for a period of couple of weeks.
I agree. Here's the link:
subject: A Java program or an ETL?
jQuery in Action, 3rd edition