Win a copy of The Java Performance Companion this week in the Performance forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Data Conversion

 
Susan
Ranch Hand
Posts: 54
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

we have a large data conversion project looming - whereby we need to convert production data to a different relational database with completely different schema's.

I'd appreciate it if anybody can point me to any reference material that would be a good starting point for the design of this.


thanks,

Susan
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
6
IntelliJ IDE Mac Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This is really an ETL kind of question. ETL = Extract, Transform, and Load.

I think the best and quickest way to move data from one database to another is through database tools. For instance, Oracle has SQLLoader to load data into Oracle, and they also have an ETL tool. But you usually write something in the old database to extract the data, transform it and save it in another format that the other database knows how to load. As an example.

Mark
 
Scott Ambler
author
Ranch Hand
Posts: 608
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I've written a few things that might help:
1. The Joy of Legacy Data
2. Agile Legacy Integration Modeling
3. The Retirement Phase
4. Contract Models

- Scott
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
6
IntelliJ IDE Mac Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hye Scott, I got a quick question for you.

I think I noticed that you are speaking at this year's Software Development Conference.

For their BOFs, do they usually have a submission period for submitting talk ideas? And if so, about what time period of the year do they have this?

Thanks

Mark
 
Scott Ambler
author
Ranch Hand
Posts: 608
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You need to submit an idea for a BoF when the call for presentations goes out. For the conference in March, the call would have gone out in August or September. There will likely be a call sent out in February or March for the conference in Boston in September. Keep an eye out at www.sdexpo.com.

- Scott
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
6
IntelliJ IDE Mac Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Scott, we now return the hijacked thread back to the original poster.

Mark
 
Susan
Ranch Hand
Posts: 54
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Mark,

ETL sounds good, but database specific. We have a lot of legacy installations some with different schemas, and we need to be able to support different databases, ie Oracles, db2, etc in the new version of the product. So I'm looking for approaches that are possibly more generic?


thanks,

Susan
 
Scott Ambler
author
Ranch Hand
Posts: 608
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Most ETL tools support a wide range of data sources, including relational, XML, flat files, ...

From what you've described, it sounds like a classic ETL project to me. Can you provide any more details?

- Scott
 
Susan
Ranch Hand
Posts: 54
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Scott,

heres an outline of the data conversion:
1) legacy application is installed in hundresds of installations
2) legacy installations can have different schemas, but generally 'flavours' of the same schema.
3) legacy application schema is not normalized
4) legacy application schema is not well understood
5) the new application has a completely different schema, which is normalized
6) the new application must be able to support any JDBC complaint database, initially starting with Oracle.

They are the basics, unfortunately I do not know database sizes, how long we have to perform conversion, etc.


thanks,

Susan
 
Fintan Conway
Ranch Hand
Posts: 142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Susan Oso:
Scott,

heres an outline of the data conversion:
1) legacy application is installed in hundresds of installations
2) legacy installations can have different schemas, but generally 'flavours' of the same schema.
3) legacy application schema is not normalized
4) legacy application schema is not well understood
5) the new application has a completely different schema, which is normalized
6) the new application must be able to support any JDBC complaint database, initially starting with Oracle.

They are the basics, unfortunately I do not know database sizes, how long we have to perform conversion, etc.


thanks,

Susan

Hi Susan,

One approach would be to develop a database mapping layer between your Java objects and your legacy systems (as described in Martin Fowler's Patterns of Application Architecture). You can then develop the application on top of the legacy database.

The second step would be to use JDO to automatically create the schema in the database of your choice (Oracle, DB2, mySQL, etc.) JDO automatically takes the objects of your application and creates a DB schema for you. This is normally a table in the database per Java object.

The mapping layer you created originally will show you how to take the data from your legacy app and convert it to the structure of your Java objects - which will also be the structure of your new tables.

You can then perform the extraction of data from the legacy apps into the format for your Java objects/tables, and then load the extracted data into your database.

With JDO Changing database e.g. Oracle to DB2 is simply a case of modifying a config file to point to the new DB and its JDBC driver.

This is the approach that I would take.

HTH,

Fintan
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
6
IntelliJ IDE Mac Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Fintan, unfortunately, I think Susan need to migrate the data out of the legacy system. They are creating a new system and need to get all the data that is at all the installations and port the data into another database.

The biggest problem is that the installations all have different data models for their data.

Mark
 
Susan
Ranch Hand
Posts: 54
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Mark,

you are correct - we need to export the data from the legacy database into a new database. Hence, the mapping layer that Fintan described wouldnt work for us.


thanks,

Susan
 
Edwin Keeton
Ranch Hand
Posts: 214
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Susan,

You have two distinct problems. The first is an ETL problem as Mr. Ambler and others have described, getting the data out of your legacy application database schemas into the schema for the new application. The second is an ORM problem, allowing the new application to use interchangeable databases. I don't think it would be wise (even if possible) to attempt to solve them both at the same time.

Not meaning to trivialize, but the ORM problem is �simple� in that you select an appropriate tool (Hibernate, JDO, iBatis for example) and build your application accordingly. (Leaving the actual details unsaid.)

Data conversion on the other hand, rapidly degenerates towards chaos as you discover specific peculiarities in the way legacy data was actually used versus the way someone thought it would be used, missing and inconsistent data, duplicated data, and other issues. Each installation conversion may possibly present unique challenges so there is probably little advantage in attempting an intermediate mapping layer.

The classic ETL problem doesn't necessarily have an elegant or even a clean solution. I would do the mapping from the legacy schema to the new schema with the tools provided by whichever RDBMS you're using, scrub data as required, and let the tool pump data from your legacy installations.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic