File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
Win a copy of Clojure in Action this week in the Clojure forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

A Java program or an ETL?

 
Mithun Kulkarni
Greenhorn
Posts: 9
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

In my current assignment we are planning to migrate data from DB2 to Oracle. Currently the data is stored in the form of a XML blob in DB2. However we have designed a relation data model in Oracle to store the same data (which was stored as a XML blob in DB2).

We are also introducing JPA layer for persistence to store and retrieve data to/from oracle.

I have around 50 million records in DB2 in the form of XML blob. Now I want to move this data to Oracle (in a relation db model- each entity stored in a seperate table).

I want to know which of the below approaches I should follow to do this data migration:

1. Use an ETL tool. My company has Ab Initio license, so I can use this ETL tool.[I do not have experience, but will be eager to learn].

2. Build a new tool using Java for this migration. As I had done a POC on using JPA for persisting and retrieving entities, and the JPA mappings is already there, I can reuse some of this code. And some additional processing which needs to be done in approach 1 can be avoided. [Additional processing : mapping entity to database, processing XML blob and reading its content]

So please help me in making this decision with supporting points.

Thanks in advance,
MK


 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 33697
316
Eclipse IDE Java VI Editor
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Mithun,
I would use the ETL tool. It's a lot of data to move and an ETL tool is going to be better optimized than something you write.
 
Mithun Kulkarni
Greenhorn
Posts: 9
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jeanne Boyarsky wrote:Mithun,
I would use the ETL tool. It's a lot of data to move and an ETL tool is going to be better optimized than something you write.


Thanks for resply Jeanne,
But what I think is an ETL does simple things in a faster way. In my case, I need to read a row from DB2, then load the XML blob, then parse the XML to convert it into a simple XML, then map the xml elements to table columns. So when I have these steps in my migration, then I may need to build custome components for ETL.
So don't you think building a java tool will be better in this case?
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Pie
Posts: 15150
31
Android IntelliJ IDE Java Scala Spring
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I don't know that ETL tool, so I can't give you a clear answer.

You'd have to look at the features of the ETL tool and see if it can do what you need. If it doesn't, then maybe it's better to write a Java program.

An advantage of the ETL tool is that it will probably run faster than a custom program. On the other hand, you'll have to learn how to use the tool, which will also take you time. You'll have to make an estimate of how long it will take you to learn the tool and configure it to do the conversion you need, versus how long it will take you to write a Java program to do the same.

Is the conversion of those 50 million records something that you're only going to ever run once, or is it something you'll have to do many times? If it's only once, then it might not be such a problem if your Java program isn't superfast. You'd just let it run for one night and the next morning it will be done.

It sounds like an interesting project, and whatever solution you choose, it's very important that you really thorougly test it before running it on the real production system.
 
Mithun Kulkarni
Greenhorn
Posts: 9
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The migration will be run couple of times. Once to migrate all data once to Oracle and then to migrate any differential data for a period of couple of weeks.
 
I agree. Here's the link: http://aspose.com/file-tools
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic