File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes JDBC and the fly likes Problem importing data from csv files into mysql database. Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Problem importing data from csv files into mysql database." Watch "Problem importing data from csv files into mysql database." New topic
Author

Problem importing data from csv files into mysql database.

chaitanya karthikk
Ranch Hand

Joined: Sep 15, 2009
Posts: 800

Hi all, I am facing a problem while importing data into tables from a CSV file in mysql. The problem is not related to some syntax, it a logical question.

The scenario is we have 2 tables with parent-child relationship.

Layer
^
|
LayerImageSet




Actually I have a spreadsheet in which we have the data to be inserted into the table. There were lot of records to be inserted. So I took all layers into a csv file like thisThen I imported the csv file in to appropriate table like this

mysql> load data local infile 'C:\\Documents and Settings\\Krishna.chaitanya\\test1.csv' into
table <<database>>.layer fields terminated by',' enclosed by '"' lines terminated by '\r\n' (layeruid, name);

Then for LayerImageSet I created another csv file and inserted the following data like thisNow the problem is I need to send the csv files to some other clients. They can easily import the first csv file say Layer.csv.

The problem arises when trying to import the second file LayerImageSet.csv because the <<id_generated_above_insertion___hardcoded>> may be different on their databases.

In my case there were 48 records in the Layer table and the new LayerUId-s were generated from 49. If there are 100 records in other clients database, LayerUID-s will start from 101. Then the second csv will fail.

Is there any alternate method for this? Is there any PL/SQL fix for this problem? Or I should write a java program to read csv files and add records to the appropriate table?



Thank you all in advance. Good day.


Love all, trust a few, do wrong to none.
Scott Selikoff
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3704
    
    5

It sounds like your trying to move data around from database to database while also using auto-generated ids. You could override the generated keys with a fixed set of keys, or have a secondary (external) key used to map one to the other.


My Blog: Down Home Country Coding with Scott Selikoff
chaitanya karthikk
Ranch Hand

Joined: Sep 15, 2009
Posts: 800

Scott Selikoff wrote:It sounds like your trying to move data around from database to database while also using auto-generated ids. You could override the generated keys with a fixed set of keys, or have a secondary (external) key used to map one to the other.

I am not actually trying to move data from one database to other. We have some data that has to be populated initially. There is lot of data actually.

Can you please tell me how can I override the generated keys with a fixed set of keys?

However I din't understand the second point you mentioned
or have a secondary (external) key used to map one to the other
Can you please explain this technique? If it has something to do with changing the definition of the table, I am not supposed to change the definition.

Thank you in advance and thanks for the reply Mr. Scott.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1670
    
  14

Not sure I understand what you are doing, but it sounds like you are assigning the wrong key to your child records?

One option might be to use external tables for your CSV files. I don't use MySQL, but you can do the same thing in Oracle, and it's a great way to get access to your CSV files from within the database using SQL. Then you can load your parent table first from the relevant external table, assigning primary keys as needed. Then load your child table, looking up the parent ID via a join to the parent table, using the natural key of the parent records, as you do so.

Alternatively, load your child data into a temporary table, then copy it over to the target child table via SQL using a join on the parent table to get the parent ID.

Either way, the easiest thing is to get the data into the database - in external or temporary tables as appropriate - then use SQL to do all the data manipulation: that's what SQL is for.


No more Blub for me, thank you, Vicar.
chaitanya karthikk
Ranch Hand

Joined: Sep 15, 2009
Posts: 800

Hi Mr. Chris. this is what I am trying to do.

Input a CSV file in which the records are there. Load the records into parent table (Layer). Since the Layer table primary key is a auto_increment, the keys generated will differ accross each client/machine/database. In my database the keys started at 49 since already 48 records are existing in it. There is no rule that on some others developer machine Layer table will have 48 entries and the key generation will start from 49 only. If other developer has 0 records in it, then the key generation will start from 1. This is not a problem.

The actual problem arises when I try o load the second/child CSV file into the child table. I have another CSV file where I put all the primary keys generated from the above insertion into this file. In my case it started from 49. So I have put 49, 50, and so on. Then I loaded the file. The insertion was successful.

Now if I transfer the files to the second developer, he can load the first CSV file successfully. Now suppose his primary key generation started from 1 and ended at say 20. Now when he tries to load the second CSV file into the child table errors arise. There are no records with primary keys 49 and so on in the parent table for this client. But I hard coded values in second CSV files as 49, 50 and so on. Here the problem arises.

How to fix this issue?

This is how the sample data will be in each CSV file


Thank you, good day.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1670
    
  14

Sorry, I don't have time to go into much detail here.

Your parent table must have some natural (business) key to uniquely identify each of the original records. If not, then you don't know how to match the children to the parents anyway, so you need to go back and figure out what your data means. And if you don't understand what a natural/primary/surrogate key is, then you probably need to go back and learn something about basic data modelling.

If I've understood you correctly, your auto-increment ID is simply a (local) surrogate key on the parent records, which will be different on other machines under your current approach.

1. Identify the natural unique key for your parent record.
2. Load the parents into their table, allocating the surrogate (auto-increment ID) key as you do so.
3. Now you need to match the children to their parents. There are different ways to do this e.g. I would put them in a temporary/external table and then do it all in one SQL INSERT statement using a join to the parent table. But the logic is like this:

This way, you set the parent surrogate keys locally on each machine, but you always use the natural key to identify the parent/child relationship, which gives you the correct parent ID to use for the children on each local instance.
chaitanya karthikk
Ranch Hand

Joined: Sep 15, 2009
Posts: 800

I understood this, but I am not supposed to change the database definition. The natural key you are talking about is related to do something with the table definition right?
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1670
    
  14

chaitanya karthikk wrote:I understood this, but I am not supposed to change the database definition. The natural key you are talking about is related to do something with the table definition right?


To be honest, it's starting to sound like nobody in your organisation has a clue about databases. Simply slapping a unique ID on a line of data does not mean you have identified the real key for the record, or that you have created a proper data model.

There must be some attribute or combination of attributes that uniquely identifies each record in your list of parent records. This is the natural primary key for the parent table. This cannot be the auto-increment ID column, because the ID is an artificial surrogate key that you are generating at runtime and you have already told us that each parent record may have a different ID on different machines. So you must have some way of knowing which parent is which, without using the ID.

So, imagine you are working on paper and you don't have an ID at all (because when you start this process you don't): how do you know how to match a child record to its parent? What makes one parent record different from the next? Whichever attribute you use to do this is the natural primary key for the parent.

Once you know how to uniquely identify a parent for the children (without using the ID), then you can solve your problem. If you don't know how to do this, either ask somebody who does (your DBA? the analyst?), or there is no way to uniquely identify a parent, in which case your data model is wrong anyway.
chaitanya karthikk
Ranch Hand

Joined: Sep 15, 2009
Posts: 800

chris webster wrote:
chaitanya karthikk wrote:I understood this, but I am not supposed to change the database definition. The natural key you are talking about is related to do something with the table definition right?


To be honest, it's starting to sound like nobody in your organisation has a clue about databases. Simply slapping a unique ID on a line of data does not mean you have identified the real key for the record, or that you have created a proper data model.
Yes I agree I am not a database guy, I only know few things about how to design a schema. All I know is, the tables must not have redundancy, tightly integrated using parent-child relationships. But I am always enthusiastic to learn. Can you please let me know the key principles in designing a database?

From you reply I am assuming that tables must have a natural primary key and a surrogate key(auto_increment). Is it? Can I create a table with two keys? So far as I know a table cannot contain 2 primary keys. It can contain a primary key and a unique key. Are you saying natural primary key is nothing but a unique key?

This is the procedure I followed to dump the excel data in to my table. There were almost 110 rows in the spreadsheet given with many columns each column referring different columns of a table.

There is a column called layerImage which holds the name of images. There will be duplicate names also. Each name should have an entry in the Layer table. So I created a CSV file and copied all the images into a column. Then in the same CSV file I added another column with default value as "default" because it is the value to be given to the primary key column in the Layer table. Then I used the LOAD statement and dumped all the records into Layer table.

Then I had to insert data into LayerImageSet table with column names as LayerImageSetUID, IsPrimary, LayerUID, LayerImageSetName. For each layer name in the Layer table there must be a record in the LayerImageSet table. So I created another CSV file and added columns as default, <<IsPrimary_Value>>, <<LayerUID_generated_from_above_insertion>>, <<layer_image_name>>. The problem raised with the third column.

I don't have any natural primary key to uniquely identify the records in Layer table. There are duplicate image names but with different column values. But no column is unique except the LayerUID column in Layer table.

I am not bothered about how are they going to do this. Because now they don't want me to solution for this problem. But I am enthusiastic to learn things. How can I fix this problem? How should I design a database? What are the key principles to keep in mind?

Thank you, good day Mr. Chris.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1670
    
  14

chaitanya karthikk wrote:Can you please let me know the key principles in designing a database?


Sorry, but I don't have time to run Databases 101 or do your project for you. Look for any introductory textbook on relational databases, any online course materials or tutorials you can find. It sounds like you need to know a bit about normalisation, whic is basically how you work out the key relationships in your data.

You need to find the (combination of) attributes that uniquely identify your "parent" records: this will be the natural unique key. You have said you want to allocate a unique ID to each record - this is a surrogate key. You can have more than one unique key on a table, but only one of them will be defined as the primary key - you can choose. But when you are matching children to parents, you will need to know how to find the parent for each child i.e. using the natural key.

If your data does not seem to have any unique keys, then you need to normalise it or find some other way to process the data into a database that is expecting unique keys etc. You've told us you can't change the data model, so you need to find out how to map your data into that model.

chaitanya karthikk
Ranch Hand

Joined: Sep 15, 2009
Posts: 800

Thank you Mr. Chris. If I have any other doubts I will post here. Thanks for letting me know. Thanks a lot
 
 
subject: Problem importing data from csv files into mysql database.