aspose file tools*
The moose likes OO, Patterns, UML and Refactoring and the fly likes Data Conversion Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Engineering » OO, Patterns, UML and Refactoring
Bookmark "Data Conversion" Watch "Data Conversion" New topic
Author

Data Conversion

Susan
Ranch Hand

Joined: Apr 26, 2005
Posts: 54
Hi,

we have a large data conversion project looming - whereby we need to convert production data to a different relational database with completely different schema's.

I'd appreciate it if anybody can point me to any reference material that would be a good starting point for the design of this.


thanks,

Susan
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17250
    
    6

This is really an ETL kind of question. ETL = Extract, Transform, and Load.

I think the best and quickest way to move data from one database to another is through database tools. For instance, Oracle has SQLLoader to load data into Oracle, and they also have an ETL tool. But you usually write something in the old database to extract the data, transform it and save it in another format that the other database knows how to load. As an example.

Mark


Perfect World Programming, LLC - Two Laptop Bag - Tube Organizer
How to Ask Questions the Smart Way FAQ
Scott Ambler
author
Ranch Hand

Joined: Dec 12, 2003
Posts: 608
I've written a few things that might help:
1. The Joy of Legacy Data
2. Agile Legacy Integration Modeling
3. The Retirement Phase
4. Contract Models

- Scott


<a href="http://www-306.ibm.com/software/rational/bios/ambler.html" target="_blank" rel="nofollow">Scott W. Ambler</a><br />Practice Leader Agile Development, IBM Rational<br /> <br />Now available: <a href="http://www.ambysoft.com/books/refactoringDatabases.html" target="_blank" rel="nofollow">Refactoring Databases: Evolutionary Database Design</a>
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17250
    
    6

Hye Scott, I got a quick question for you.

I think I noticed that you are speaking at this year's Software Development Conference.

For their BOFs, do they usually have a submission period for submitting talk ideas? And if so, about what time period of the year do they have this?

Thanks

Mark
Scott Ambler
author
Ranch Hand

Joined: Dec 12, 2003
Posts: 608
You need to submit an idea for a BoF when the call for presentations goes out. For the conference in March, the call would have gone out in August or September. There will likely be a call sent out in February or March for the conference in Boston in September. Keep an eye out at www.sdexpo.com.

- Scott
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17250
    
    6

Thanks Scott, we now return the hijacked thread back to the original poster.

Mark
Susan
Ranch Hand

Joined: Apr 26, 2005
Posts: 54
Mark,

ETL sounds good, but database specific. We have a lot of legacy installations some with different schemas, and we need to be able to support different databases, ie Oracles, db2, etc in the new version of the product. So I'm looking for approaches that are possibly more generic?


thanks,

Susan
Scott Ambler
author
Ranch Hand

Joined: Dec 12, 2003
Posts: 608
Most ETL tools support a wide range of data sources, including relational, XML, flat files, ...

From what you've described, it sounds like a classic ETL project to me. Can you provide any more details?

- Scott
Susan
Ranch Hand

Joined: Apr 26, 2005
Posts: 54
Scott,

heres an outline of the data conversion:
1) legacy application is installed in hundresds of installations
2) legacy installations can have different schemas, but generally 'flavours' of the same schema.
3) legacy application schema is not normalized
4) legacy application schema is not well understood
5) the new application has a completely different schema, which is normalized
6) the new application must be able to support any JDBC complaint database, initially starting with Oracle.

They are the basics, unfortunately I do not know database sizes, how long we have to perform conversion, etc.


thanks,

Susan
Fintan Conway
Ranch Hand

Joined: Apr 03, 2002
Posts: 141
Originally posted by Susan Oso:
Scott,

heres an outline of the data conversion:
1) legacy application is installed in hundresds of installations
2) legacy installations can have different schemas, but generally 'flavours' of the same schema.
3) legacy application schema is not normalized
4) legacy application schema is not well understood
5) the new application has a completely different schema, which is normalized
6) the new application must be able to support any JDBC complaint database, initially starting with Oracle.

They are the basics, unfortunately I do not know database sizes, how long we have to perform conversion, etc.


thanks,

Susan

Hi Susan,

One approach would be to develop a database mapping layer between your Java objects and your legacy systems (as described in Martin Fowler's Patterns of Application Architecture). You can then develop the application on top of the legacy database.

The second step would be to use JDO to automatically create the schema in the database of your choice (Oracle, DB2, mySQL, etc.) JDO automatically takes the objects of your application and creates a DB schema for you. This is normally a table in the database per Java object.

The mapping layer you created originally will show you how to take the data from your legacy app and convert it to the structure of your Java objects - which will also be the structure of your new tables.

You can then perform the extraction of data from the legacy apps into the format for your Java objects/tables, and then load the extracted data into your database.

With JDO Changing database e.g. Oracle to DB2 is simply a case of modifying a config file to point to the new DB and its JDBC driver.

This is the approach that I would take.

HTH,

Fintan
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17250
    
    6

Hi Fintan, unfortunately, I think Susan need to migrate the data out of the legacy system. They are creating a new system and need to get all the data that is at all the installations and port the data into another database.

The biggest problem is that the installations all have different data models for their data.

Mark
Susan
Ranch Hand

Joined: Apr 26, 2005
Posts: 54
Mark,

you are correct - we need to export the data from the legacy database into a new database. Hence, the mapping layer that Fintan described wouldnt work for us.


thanks,

Susan
Edwin Keeton
Ranch Hand

Joined: Jul 10, 2002
Posts: 214

Susan,

You have two distinct problems. The first is an ETL problem as Mr. Ambler and others have described, getting the data out of your legacy application database schemas into the schema for the new application. The second is an ORM problem, allowing the new application to use interchangeable databases. I don't think it would be wise (even if possible) to attempt to solve them both at the same time.

Not meaning to trivialize, but the ORM problem is �simple� in that you select an appropriate tool (Hibernate, JDO, iBatis for example) and build your application accordingly. (Leaving the actual details unsaid.)

Data conversion on the other hand, rapidly degenerates towards chaos as you discover specific peculiarities in the way legacy data was actually used versus the way someone thought it would be used, missing and inconsistent data, duplicated data, and other issues. Each installation conversion may possibly present unique challenges so there is probably little advantage in attempting an intermediate mapping layer.

The classic ETL problem doesn't necessarily have an elegant or even a clean solution. I would do the mapping from the legacy schema to the new schema with the tools provided by whichever RDBMS you're using, scrub data as required, and let the tool pump data from your legacy installations.


SCJP, SCWCD
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Data Conversion