Compare two CSV files in java

 
Lia Tas
Greenhorn
Posts: 18
Hi,
I'm trying to compare the content of two CSV files, test1.csv and test2.csv. The content of both should be the same. If it is not, I want to write the differences to a .txt file. If everything is equal, everything is correct.

I have just created two test CSV files containing a few rows and columns.

The first column is the primary key of the respective table. I want to compare the rows by that identifier.

test1.csv:
1,Max,New York
2,David,Jersey

test2.csv:
1,Max,California
2,David,Jersey

The output in the .txt file should here be the row "1,Max,New York".

I don't have any code yet, and I'd be grateful for any advice or hints. Thank you in advance.
 
Tim Moores
Saloon Keeper
Posts: 6890
163
  • Likes 1
If this is supposed to run on a Unix-ish OS (like Linux or OS X), I'd run a "diff" of the two files (using the ProcessBuilder class), and then work with its output.

On Windows, something like "fc" could be used; see https://stackoverflow.com/questions/6877238/what-is-the-windows-equivalent-of-the-diff-command
 
Lia Tas
Greenhorn
Posts: 18
I want to code that in Eclipse, nothing with Linux. I have two CSV files on my desktop and want to implement the comparison myself. Should I use Apache POI? I want to use it with Java version 1.5.
 
Al Hobbs
Rancher
Posts: 531
6
IntelliJ IDE Spring Fedora
CSV files are just values separated by commas and line separators. You can go through the text line by line and compare the lines. You will probably want more logic than that, but it is a reasonable place to start.
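
A minimal sketch of the line-by-line idea, assuming Java 8 or later and assuming both files list their rows in the same order. The class name `LineByLineCompare` and the method `differingLines` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineByLineCompare {

    // Returns every line of `first` that has no identical line at the same
    // position in `second`. Lines are compared as raw text.
    static List<String> differingLines(List<String> first, List<String> second) {
        List<String> differences = new ArrayList<>();
        for (int i = 0; i < first.size(); i++) {
            boolean sameLine = i < second.size() && first.get(i).equals(second.get(i));
            if (!sameLine) {
                differences.add(first.get(i));
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // The sample rows from the first post in this thread.
        List<String> test1 = Arrays.asList("1,Max,New York", "2,David,Jersey");
        List<String> test2 = Arrays.asList("1,Max,California", "2,David,Jersey");
        System.out.println(differingLines(test1, test2)); // prints [1,Max,New York]
    }
}
```

With real files, `Files.readAllLines(Paths.get("test1.csv"))` from `java.nio.file` returns such a list of lines. A purely positional comparison like this breaks as soon as one file has an extra or reordered row.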
 
Lia Tas
Greenhorn
Posts: 18
I am not really sure how to do that.
 
Tim Moores
Saloon Keeper
Posts: 6890
163
Using Eclipse and using some sort of diff utility do not conflict with each other. But if you're set on writing the diff tool yourself, I suggest using one of the existing CSV libraries: writing a CSV parser that covers the edge cases is more work than it looks at first. See the "Excel" section of https://coderanch.com/wiki/660373/Accessing-File-Formats.

POI handles Microsoft Office file formats, which CSV is not.
 
Al Hobbs
Rancher
Posts: 531
6
IntelliJ IDE Spring Fedora
Is it possible to use 'fc' (on Windows) or 'diff' from a Java program? If it is possible, is it recommended?
 
Lia Tas
Greenhorn
Posts: 18
I am not really sure what you mean.

I wrote code that stores the content of each CSV file in an ArrayList, but is that a good solution? My colleague mentioned HashMaps. Is that a better way to store the content of a CSV file? And how can I print the lines (based on the primary key of the table) that differ between the two files?

Thank you.
 
Tim Moores
Saloon Keeper
Posts: 6890
163
It's not immediately obvious to me how you'd store the contents of a CSV file as a HashMap (although it is of course possible to come up with a way that uses one). So I can't opine on whether it might be better in some way; maybe ask your colleague what he has in mind.

You seem not to want to look into an existing library - why is that?
 
Tim Moores
Saloon Keeper
Posts: 6890
163

Al Hobbs wrote: Is it possible to use 'fc' (on Windows) or 'diff' from a Java program? If it is possible, is it recommended?


Yes, that's possible. Runtime.exec and the more modern ProcessBuilder class make it possible. I don't see why one wouldn't use those tools if they're available, although they do make the code less portable - which may or may not be a consideration.
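
A rough sketch of what calling the external tool through ProcessBuilder might look like. The class name `ExternalDiff` and its methods are hypothetical, and the choice to invoke `fc` through `cmd /c` on Windows is an assumption; none of this has been tuned for a particular platform.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExternalDiff {

    // Chooses the command line by OS name: "fc" via cmd on Windows, "diff" elsewhere.
    static List<String> diffCommand(String osName, String firstFile, String secondFile) {
        if (osName.toLowerCase().contains("windows")) {
            return Arrays.asList("cmd", "/c", "fc", firstFile, secondFile);
        }
        return Arrays.asList("diff", firstFile, secondFile);
    }

    // Runs the command and returns everything it printed, one entry per line.
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge error output into standard output
        Process process = builder.start();
        List<String> output = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.add(line);
            }
        }
        process.waitFor(); // diff exits with status 1 when the files differ
        return output;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.out.println("usage: java ExternalDiff <first.csv> <second.csv>");
            return;
        }
        List<String> command = diffCommand(System.getProperty("os.name"), args[0], args[1]);
        for (String line : run(command)) {
            System.out.println(line);
        }
    }
}
```

The output still has to be interpreted by the Java program, and its format differs between `diff` and `fc`, which is part of the portability cost mentioned above.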
 
Dave Tolls
Rancher
Posts: 4801
50
Looking at the original post, do you actually care about the individual fields, or just that the row with id '1' in the first CSV has a different value from the row with id '1' in the second CSV?

If it is the latter, then I would argue that the CSV part is possibly not relevant.
You just need the contents of each row (a String) and the id (the first part of that String, which is just a matter of taking a substring).
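
One way this could look, as a sketch only: each row stays a plain String, the id is cut off with `substring`, and the rows of the second file are kept in a HashMap keyed by id. The names `KeyedRowCompare`, `idOf` and `changedRows` are invented here, and the code assumes Java 8 or later and unique ids within a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyedRowCompare {

    // The id is everything before the first comma; a row without a comma is its own id.
    static String idOf(String row) {
        int comma = row.indexOf(',');
        return comma < 0 ? row : row.substring(0, comma);
    }

    // Returns the rows of `first` whose id is missing from `second`, or whose
    // text differs from the row with the same id in `second`.
    static List<String> changedRows(List<String> first, List<String> second) {
        Map<String, String> secondById = new HashMap<>();
        for (String row : second) {
            secondById.put(idOf(row), row);
        }
        List<String> changed = new ArrayList<>();
        for (String row : first) {
            // get() returns null for an unknown id, and equals(null) is false
            if (!row.equals(secondById.get(idOf(row)))) {
                changed.add(row);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Sample rows from the first post; the second file is deliberately reordered.
        List<String> test1 = Arrays.asList("1,Max,New York", "2,David,Jersey");
        List<String> test2 = Arrays.asList("2,David,Jersey", "1,Max,California");
        System.out.println(changedRows(test1, test2)); // prints [1,Max,New York]
    }
}
```

The resulting list could then be written to the text file with something like `Files.write(Paths.get("differences.txt"), changed)`, where the file name is just a placeholder.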
 
Greenhorn
Posts: 6
Read the entire first file, and put it into a List. Then read the second file one row at a time, and compare each row to all the rows of the first file to see if it's a duplicate. If it's not a duplicate, then it's new information. If you're having trouble with reading, look at http://opencsv.sourceforge.net/, it's a pretty good library for reading CSV files in Java.
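
A small sketch of that approach, with the file reading left out so that it stays self-contained. `ListMembershipCompare` and `newRows` are names chosen for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListMembershipCompare {

    // Returns the rows of `secondFile` that do not appear anywhere in `firstFile`.
    // List.contains scans the whole list, so every row is compared to all rows.
    static List<String> newRows(List<String> firstFile, List<String> secondFile) {
        List<String> result = new ArrayList<>();
        for (String row : secondFile) {
            if (!firstFile.contains(row)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows from the first post in this thread.
        List<String> test1 = Arrays.asList("1,Max,New York", "2,David,Jersey");
        List<String> test2 = Arrays.asList("1,Max,California", "2,David,Jersey");
        System.out.println(newRows(test1, test2)); // prints [1,Max,California]
        System.out.println(newRows(test2, test1)); // prints [1,Max,New York]
    }
}
```

Which file plays which role decides which version of a changed row is reported, so the arguments may need to be swapped to get the row from test1.csv. For large files, putting the first file into a `HashSet` instead of a List would avoid the repeated scans.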
 
Dave Tolls
Rancher
Posts: 4801
50
From the first post:
"The first column is the primary key of the respective table. I want to compare the rows by that identifier."

That implies that it's the first field that determines what to compare against (i.e. compare the rows with matching ids).
 
Saloon Keeper
Posts: 23414
159
Android Eclipse IDE Tomcat Server Redhat Java Linux

Lia Tas wrote: I want to code that in Eclipse.

I hope not. I think you want to code that in Java. Eclipse is just a program to help develop Java applications - and other things.

There are two ways to compare CSVs. One is via the Unix-style "diff" program, which compares the files line by line. This only compares raw text, though. Windows has a COMP program, but it compares files byte by byte rather than reporting line-by-line differences; the "fc" command mentioned earlier in the thread is closer to diff.

The other way is to parse the files into their constituent components and compare component-by-component. That catches value differences while ignoring differences in how elements were quoted, spaces between elements, and the like.
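
To illustrate the component-by-component idea, here is a deliberately small hand-written parser; it is a sketch for this thread, not a replacement for a real CSV library. It assumes comma separators and double-quote quoting, trims whitespace around every field (quoted ones included), and does not handle fields that span several lines. The names `FieldCompare`, `parseLine` and `sameValues` are invented.

```java
import java.util.ArrayList;
import java.util.List;

class FieldCompare {

    // Splits one CSV line into trimmed field values. Understands double-quoted
    // fields and doubled quotes ("") inside them.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote stands for one literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);   // commas inside quotes are ordinary text
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    // Two lines hold the same values if their parsed fields are equal.
    static boolean sameValues(String lineA, String lineB) {
        return parseLine(lineA).equals(parseLine(lineB));
    }

    public static void main(String[] args) {
        // Same values, different quoting and spacing.
        System.out.println(sameValues("1,Max,New York", "\"1\", Max , \"New York\"")); // prints true
        // Sample rows from the first post: the third field differs.
        System.out.println(sameValues("1,Max,New York", "1,Max,California"));          // prints false
    }
}
```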

CSVs do not have "key" fields. If you want to compare by keys, you either have to pre-sort the files to be compared or you'll have to read one of the files into memory so that you can access its lines randomly.
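
The pre-sorting variant could be sketched as follows: sort both sets of rows by key, then walk them in step. For brevity the sketch sorts in memory, treats the text before the first comma as the key, orders keys as Strings, and assumes keys are unique within a file; `SortedMergeCompare` and its methods are names made up here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortedMergeCompare {

    // The key is everything before the first comma.
    static String idOf(String row) {
        int comma = row.indexOf(',');
        return comma < 0 ? row : row.substring(0, comma);
    }

    // Returns rows of `first` that are missing from, or differ in, `second`.
    static List<String> changedRows(List<String> first, List<String> second) {
        List<String> a = new ArrayList<>(first);
        List<String> b = new ArrayList<>(second);
        Comparator<String> byId = Comparator.comparing(SortedMergeCompare::idOf);
        a.sort(byId);
        b.sort(byId);

        List<String> changed = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size()) {
            if (j >= b.size()) {            // second list exhausted: row only in first
                changed.add(a.get(i++));
                continue;
            }
            int order = idOf(a.get(i)).compareTo(idOf(b.get(j)));
            if (order < 0) {                // key only in first
                changed.add(a.get(i++));
            } else if (order > 0) {         // key only in second: skip it
                j++;
            } else {                        // same key: compare the full rows
                if (!a.get(i).equals(b.get(j))) {
                    changed.add(a.get(i));
                }
                i++;
                j++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Sample rows from the first post, given in unsorted order.
        List<String> test1 = Arrays.asList("2,David,Jersey", "1,Max,New York");
        List<String> test2 = Arrays.asList("1,Max,California", "2,David,Jersey");
        System.out.println(changedRows(test1, test2)); // prints [1,Max,New York]
    }
}
```

With files too large for memory, the sorting would have to happen beforehand and the rows would be read one at a time from each file, but the stepping logic stays the same.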
 
Greenhorn
Posts: 1
My simple solution, in case you want to compare two CSV responses stored in String variables (for example, when you get them through a REST call). In my case I wanted to exit the check after a threshold of 10 differing lines.
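
The code for this post is not shown in the thread. What follows is only a guess at the kind of check described, not the poster's original: it compares two CSV texts line by line at matching positions and stops once the threshold is reached. `ThresholdCompare` and `countDifferences` are hypothetical names, and Java 8 or later is assumed for the `\R` line-break pattern.

```java
class ThresholdCompare {

    // Counts lines that differ between the two texts at the same position,
    // stopping as soon as `threshold` differences have been found.
    static int countDifferences(String firstCsv, String secondCsv, int threshold) {
        String[] a = firstCsv.split("\\R");  // \R matches any line break
        String[] b = secondCsv.split("\\R");
        int longest = Math.max(a.length, b.length);
        int count = 0;
        for (int i = 0; i < longest && count < threshold; i++) {
            String left = i < a.length ? a[i] : null;
            String right = i < b.length ? b[i] : null;
            // a line missing on either side counts as a difference
            if (left == null || !left.equals(right)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample rows from the first post, held in String variables.
        String first = "1,Max,New York\n2,David,Jersey";
        String second = "1,Max,California\n2,David,Jersey";
        System.out.println(countDifferences(first, second, 10)); // prints 1
    }
}
```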
 
Marshal
Posts: 72441
315
Welcome to the Ranch

If you can identify lines with differences, it should be easy enough to count them; use a loop.
 