Two Laptop Bag*
The moose likes Groovy and the fly likes Groovy : Process CSV to SQL Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Languages » Groovy
Bookmark "Groovy : Process CSV to SQL" Watch "Groovy : Process CSV to SQL" New topic
Author

Groovy : Process CSV to SQL

Paul Ristevian
Greenhorn

Joined: Jun 24, 2009
Posts: 12
Hello there,
I need to pass data from a CSV file to a database, so hence I thought of CSV->SQL.
In my CSV file, it has about 70 fields per row of data. And there are other issues.
I found someone else online having the same problem with similar data issues of their CSV file.
I will put the other persons example, but illustrates the problem that I have very well:

*******************************************************************************
http://stackoverflow.com/questions/782353/sql-server-bulk-insert-of-csv-file-with-inconsistent-quotes

Is it possible to BULK INSERT (SQL Server) a CSV file in which the fields are only OCCASSIONALLY surrounded by quotes? Specifically, quotes only surround those fields that contain a ",".

In other words, I have data that looks like this (the first row contain headers):

id, company, rep, employees
729216,INGRAM MICRO INC.,"Stuart, Becky",523
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114
721177,GEORGE WESTON BAKERIES INC,"Hogan, Meg",253

Because the quotes aren't consistent, I can't use '","' as a delimiter, and I don't know how to create a format file that accounts for this.

I tried using ',' as a delimter and loading it into a temporary table where every column is a varchar, then using some kludgy processing to strip out the quotes, but that doesn't work either, because the fields that contain ',' are split into multiple columns.

*******************************************************************************


This is the type of problem I am having with fields such as this:
"Stuart, Becky".

I have some solutions, I could write a Groovy script to drop all quotes and then count the
field place of where Stuart and Becky are and concatenate them.

But then I run into another problem of how to construct the SQL string.
For example I would plan to write an SQL INSERT statement in a Multiline string using the operator """,
and do variable interpolation with the fields.
But I consider my approach kind of long.

For example I don't want to write:

INSERT INTO VALUES( $fields{1}, $fields{2), $fields{3},$fields{4},$fields{5},.......,$fields{70});

Is there a way I can maybe write a for-loop to create this INSERT statement.
Because I don't want to write /hardcode 70 fields into my code.

SO that is my approach.
If somebody knows a totally different easier approach, please let me know.

Sincerely,

Paul













Paul Ristevian
Greenhorn

Joined: Jun 24, 2009
Posts: 12
OK,

It seems my original question may be a difficult one.
Let me rephrase my question into a smaller chunk.

OK, I hope that someone can help me with this one.

As a first step, I need to be able to parse a CSV file.
And this parsing needs to take into account embedded comma in a text qualifier, specifically the double quotes.

So for example the line:
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114

Should be parsed into fields/elements:

(1) 729235
(2) GREAT PLAINS ENERGY, INC.
(3) Nelson, Beena
(4) 114


SO has anyone created a CSV parser in Groovy?

Anybody?


Sincerely,

Paul








Gregg Bolinger
GenRocket Founder
Ranch Hand

Joined: Jul 11, 2001
Posts: 15299
    
    6

One of the nice things about Groovy is that it is java and you can use java code directly in your groovy code. Instead of trying to figure out a way in groovy, why not just use what someone already wrote in Java?

http://opencsv.sourceforge.net/


GenRocket - Experts at Building Test Data
Paul Ristevian
Greenhorn

Joined: Jun 24, 2009
Posts: 12

Thanks for that.
I looked at the different Java parsers, but was hoping something like that was done
in Groovy, because the code in Groovy is usually shorter and more friendly to read in general.

P
Gregg Bolinger
GenRocket Founder
Ranch Hand

Joined: Jul 11, 2001
Posts: 15299
    
    6

Paul Ristevian wrote:
Thanks for that.
I looked at the different Java parsers, but was hoping something like that was done
in Groovy, because the code in Groovy is usually shorter and more friendly to read in general.

P


Yea, but when using an API, how cares?
Paul Ristevian
Greenhorn

Joined: Jun 24, 2009
Posts: 12
OK,
But when I looked at OpenCSV, I could not figure out how it would handle
embedded comma in the double quote text qualifier.
Does it do this automatically?

Here is some code below:
***********************************************************************
How do I read and parse a CSV file?
If you want to use an Iterator style pattern, you might do something like this:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[0] + nextLine[1] + "etc...");
}


OR

Can I use my own separators and quote characters?
Yes. There are constructors that cater for supplying your own separator and quote characters. Say you're using a tab for your separator, you can do something like this:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t');


And if you single quoted your escaped characters rather than double quote them, you can use the three arg constructor:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t', '\'');


You may also skip the first few lines of the file if you know that the content doesn't start till later in the file. So, for example, you can skip the first two lines by doing:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t', '\'', 2);


**************************************************************************
It talks about escaping characters and, skipping lines, but I don't see where it
handles my particular case.


If you can tell me how this handles the parsing case I mention please let me know.

Paul










Gregg Bolinger
GenRocket Founder
Ranch Hand

Joined: Jul 11, 2001
Posts: 15299
    
    6

Have you tried it to see what happens when you feed it your file?
Paul Ristevian
Greenhorn

Joined: Jun 24, 2009
Posts: 12

OK,

You know what I just found a description that describes what it can do on the opencsv site:
Somehow I missed this when I was reading thru it in the past.

*********************************************************************
What features does opencsv support?
opencsv supports all the basic csv-type things you're likely to want to do:

* Arbitrary numbers of values per line
* Ignoring commas in quoted elements
* Handling quoted entries with embedded carriage returns (ie entries that span multiple lines)
* Configurable separator and quote characters (or use sensible defaults)
* Read all the entries at once, or use an Iterator style model
* Creating csv files from String[] (ie. automatic escaping of embedded quote chars)
*********************************************************************

You see I like to understand things first before attempting any kind of trial and error approach. That is why I ask questions on forum sites.


Paul

Gregg Bolinger
GenRocket Founder
Ranch Hand

Joined: Jul 11, 2001
Posts: 15299
    
    6

Paul Ristevian wrote:

You see I like to understand things first before attempting any kind of trial and error approach. That is why I ask questions on forum sites.


Paul



Understanding is great. But we expect a bit of effort on your part. You try something, it doesn't work, you ask us for help, and we point you in the right direction.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    
    8

Paul Ristevian wrote:You see I like to understand things first before attempting any kind of trial and error approach. That is why I ask questions on forum sites.

But there's a limit beyond which that isn't practical. Sometimes it's more useful to just try things and see what happens. In fact, quite often that's the best approach.
Suhu Shah
Greenhorn

Joined: May 24, 2010
Posts: 2
Hi folks,

So I m in a similar situation as this and I need some guidance in moving forward. We need a dynamic parser that will be working with a series of CVS files.. each having different fields and values.. however the format would be something like below..

"NAME","ADDRESS","CITY","STATE","POSTAL_CODE","CNTRY_CODE"
"NY-Correctional Facility","Exchange St","ATTICA","NY","14011","US
"
"Consolidated School Corp","205,East Street","Attica","IN","47918","US
"

Looking at OpenCSV it seems it can handle most of the data issues we have seen in CSV files, like comma inside values, carriage returns etc. However one of our requirements is that after parsing the CSV files I need the tool to return it as a list with the format attribute:value where attribute would be the corresponding field from the header line.
ie something like this..

NAME:NY-Correctional Facility, ADDRESS:Exchange St, CITY:ATTICA, STATE:NY, POSTAL_CODE:14011,CNTRY_CODE:US

Is there any way I can get OpenCSV to do this?

I m kinda new to all this, so if the solution is simple, do excuse me

Thanks in advance



 
It is sorta covered in the JavaRanch Style Guide.
 
subject: Groovy : Process CSV to SQL
 
Similar Threads
Saving data to Excel files
SQL Server bulk insert
comparing csv file values with database attributes.
Groovy in the real world
JDBC for loading data into SQL Server