sorting csv file for fixing column order

Padmanabh Sahasrabudhe
Ranch Hand

Joined: Mar 04, 2008
Posts: 53
I get a csv file from an export utility, and the column order is different every time. For example, the first time it may export the following csv file:

A,B,C
1,a,e
3,q,w
2,e,r

The second time it may export the same data as follows:

B,A,C
a,1,e
e,2,r
q,3,w

I am not bothered about the change in row order, since I have a program which can compare the two csv files correctly even if their rows are out of order, but I don't know how to overcome the change in column order. Is there a way to process this csv file and get another file with a fixed column order?

Thanks.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39784
My, that looks like a strange problem. Can you not configure the export so that the column order in the csv file is fixed?
How many columns are there? Can you create factory methods which take the different values in different orders? Remember the number of methods required is n! where n is the number of columns which might be reordered.
Are the column names always the same? Can you create some sort of map from column name to column value?
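As a rough illustration of the map idea, here is a minimal sketch. It assumes the first line is the header, the column names are unique, and no field contains a quoted or embedded comma. The class and method names (RowMapSketch, toRowMap) are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RowMapSketch {
    // Pairs each header name with the value at the same position in a data line.
    // Assumption: plain comma-separated values, no quoted or embedded commas.
    static Map<String, String> toRowMap(String headerLine, String dataLine) {
        String[] names = headerLine.split(",", -1);
        String[] values = dataLine.split(",", -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException("column count mismatch: " + dataLine);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i].trim(), values[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> first = toRowMap("A,B,C", "1,a,e");
        Map<String, String> second = toRowMap("B,A,C", "a,1,e");
        // Map.equals compares the name/value pairs and ignores insertion order.
        System.out.println(first.equals(second)); // true
    }
}
```

Because each value is looked up by column name rather than by position, the same row compares as equal whichever column order the export happened to use.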
Junilu Lacar
Bartender

Joined: Feb 26, 2001
Posts: 4991

Am I correct to assume that A, B, and C are the column headers and 1, 2, and 3 are your row headers?

If so, then it's simply a matter of mapping against row and column headers instead of just row headers (since you mentioned that the row ordering doesn't bother you). Show us some code so we have a better idea of how you're doing the comparison and where it is messing up.


Padmanabh Sahasrabudhe
Ranch Hand

Joined: Mar 04, 2008
Posts: 53
Junilu,
A, B and C are column headers. But 1, 2, 3 need not be row headers, since the order in which the data is exported is uncertain. Assuming I get a consistent column order (say A,B,C every time), I use the following code. But I am not sure how to deal with it when the columns come out of order (B,A,C).



Thanks,
Padmanabh
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39784
Why are you using sets rather than maps?
Padmanabh Sahasrabudhe
Ranch Hand

Joined: Mar 04, 2008
Posts: 53
Ritchie,

I am not sure what you mean. Could you please demonstrate with a little code? Also, how will using maps help me handle the scenario where both columns and rows are out of order?

Thanks,
Padmanabh
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39784
A map would allow you to retain the relationship between the column and its contents. Are you simply trying to see whether the contents of the columns form disparate sets or not? If so, then sets are all right.
Padmanabh Sahasrabudhe
Ranch Hand

Joined: Mar 04, 2008
Posts: 53
I only wish to see if the rows which file 1 has are all present in file 2 or not. I need not retain them. The contents of the two files should be the same row-wise.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39784
Do you worry about duplicates or ordering? Sets will only work if ordering and duplicates are not significant. Can you use the equals() method to check for equality of contents?
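To make the difference concrete, here is a small sketch, assuming each row is held as one String and that List.of (Java 9 or later) is available. The names RowCompareSketch, sameRowsIgnoringDuplicates and sameRowsCountingDuplicates are invented for the illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RowCompareSketch {
    // Ignores both row order and duplicate rows.
    static boolean sameRowsIgnoringDuplicates(List<String> left, List<String> right) {
        Set<String> a = new HashSet<>(left);
        Set<String> b = new HashSet<>(right);
        return a.equals(b);
    }

    // Ignores row order but compares how many times each row occurs.
    static boolean sameRowsCountingDuplicates(List<String> left, List<String> right) {
        return countRows(left).equals(countRows(right));
    }

    private static Map<String, Integer> countRows(List<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> file1 = List.of("1,a,e", "3,q,w", "2,e,r");
        // Same rows in another order, with "1,a,e" repeated.
        List<String> file2 = List.of("2,e,r", "1,a,e", "3,q,w", "1,a,e");
        System.out.println(sameRowsIgnoringDuplicates(file1, file2)); // true
        System.out.println(sameRowsCountingDuplicates(file1, file2)); // false
    }
}
```

If a repeated row should count as a difference, the counting version is probably the safer choice.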
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8186

Padmanabh Sahasrabudhe wrote: I only wish to see if the rows which file 1 has are all present in file 2 or not. I need not retain them. The contents of the two files should be the same row-wise.

It's not yet clear from your explanation, but I suspect you have two separate problems here: row order and column order. The first is (probably) a simple sorting exercise; the other is a mapping one, which supports Campbell's post.

For the latter, you will need some way of specifying the new order for your columns.

The simplest way to do that in Java is to supply column indexes in the order that you want them output, so if you intend to supply column identifiers instead (which, I assume, is what 'A', 'B', 'C'...etc. are), then you will need some way of translating your "new column order" input into a set (or array) of indexes, and then using that to rearrange the output for each line.
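A minimal sketch of that translation step follows, assuming unique column names and plain comma-separated fields. The helper names (indexesFor, reorder) are hypothetical, and the wanted order "A", "B", "C" is simply picked for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ColumnReorderSketch {
    // For each wanted column name, finds its position in the file's header.
    static int[] indexesFor(String[] header, String[] wantedOrder) {
        List<String> names = Arrays.asList(header);
        int[] indexes = new int[wantedOrder.length];
        for (int i = 0; i < wantedOrder.length; i++) {
            int pos = names.indexOf(wantedOrder[i]);
            if (pos < 0) {
                throw new IllegalArgumentException("missing column: " + wantedOrder[i]);
            }
            indexes[i] = pos;
        }
        return indexes;
    }

    // Rebuilds one line with its fields in the order given by indexes.
    // Assumption: every line has at least as many fields as the header.
    static String reorder(String line, int[] indexes) {
        String[] fields = line.split(",", -1);
        List<String> out = new ArrayList<>();
        for (int index : indexes) {
            out.add(fields[index]);
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        String[] lines = {"B,A,C", "a,1,e", "e,2,r", "q,3,w"};
        int[] indexes = indexesFor(lines[0].split(",", -1), new String[] {"A", "B", "C"});
        for (String line : lines) {
            System.out.println(reorder(line, indexes));
        }
    }
}
```

The header line goes through the same reorder call as the data lines, so the output file starts with A,B,C.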

It should be added that reading CSV files can be quite involved: It's not simply a case of splitting data based on commas (unless you're absolutely sure that's the case), so you might want to look at third party libraries for reading your files.
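To show what can go wrong, here is a small sketch with a made-up line containing a quoted comma. The quote-aware splitter is only an illustration of the problem (it handles double-quoted fields and doubled quotes, but not line breaks inside fields) and is not a substitute for a proper CSV library. The class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedSplitSketch {
    // Naive approach: every comma is treated as a field separator.
    static int naiveFieldCount(String line) {
        return line.split(",", -1).length;
    }

    // Splits on commas that are outside double quotes; "" inside quotes is a literal quote.
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"Smith, John\",42"; // hypothetical data: two fields
        System.out.println(naiveFieldCount(line));    // 3
        System.out.println(splitQuoted(line).size()); // 2
    }
}
```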

Alternatively, if this CSV is generated from an Excel spreadsheet, you might want to look at Apache POI, because you may well be able to process it directly, rather than via CSVs.

HIH

Winston

Padmanabh Sahasrabudhe
Ranch Hand

Joined: Mar 04, 2008
Posts: 53
All,

I think I need to reframe my problem. I was chasing the wrong issue. My issue is that I have these two files:

A,B,C
1,a,e
3,q,w
2,e,r

and

B,A,C
a,1,e
e,2,r
q,3,w

Technically, both of these files contain the same data, which I want to verify through my code. Winston, I am not generating the data from Excel, but thanks for your pointers.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8186

Padmanabh Sahasrabudhe wrote: Technically, both of these files contain the same data, which I want to verify through my code. Winston, I am not generating the data from Excel, but thanks for your pointers.

Right, well first you need to arrange the data from both files in the same column sequence; and to do that you must have:
  • A line that identifies columns uniquely.
  • A way of knowing which line that is (in your case, it would appear to be the first).

In addition, you may also need to know:
  • The order they should be in (i.e., a way of ordering columns by their ID). If you haven't been told that, then you'll need to choose one for yourself.
  • What to do with duplicated and/or missing columns, if such situations are allowed. (NOTE: Only one file will be able to have them).

After that, it's simply an issue of mapping columns in a consistent order and (I suspect) sorting rows based on their "mapped" content - unless you want some form of diff algorithm, which is rather more advanced.
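Put together, the steps above might be sketched as follows. This assumes the first line is the header, the column names are unique and present in both files, no field contains a quoted comma, and alphabetical order of column names is chosen as the fixed order. CsvCompareSketch, normalize and sameData are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CsvCompareSketch {
    // Rewrites every line so its columns follow the alphabetical order of the
    // header names, then sorts the data rows.
    static List<String> normalize(List<String> lines) {
        String[] header = lines.get(0).split(",", -1);
        String[] sortedNames = header.clone();
        Arrays.sort(sortedNames);

        // indexes[i] is the position in the original file of the i-th sorted column.
        List<String> headerNames = Arrays.asList(header);
        int[] indexes = new int[sortedNames.length];
        for (int i = 0; i < sortedNames.length; i++) {
            indexes[i] = headerNames.indexOf(sortedNames[i]);
        }

        List<String> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",", -1);
            String[] reordered = new String[indexes.length];
            for (int i = 0; i < indexes.length; i++) {
                reordered[i] = fields[indexes[i]];
            }
            rows.add(String.join(",", reordered));
        }
        Collections.sort(rows); // makes row order irrelevant

        List<String> result = new ArrayList<>();
        result.add(String.join(",", sortedNames));
        result.addAll(rows);
        return result;
    }

    // Two files hold the same data if their normalized forms are identical.
    static boolean sameData(List<String> first, List<String> second) {
        return normalize(first).equals(normalize(second));
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("A,B,C", "1,a,e", "3,q,w", "2,e,r");
        List<String> second = Arrays.asList("B,A,C", "a,1,e", "e,2,r", "q,3,w");
        System.out.println(sameData(first, second)); // true
    }
}
```

Sorting the rows as whole strings is enough for an equality check; it says nothing about which rows differ, which is where a diff algorithm would come in.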

HIH

Winston
Padmanabh Sahasrabudhe
Ranch Hand

Joined: Mar 04, 2008
Posts: 53
Sorted out the column order issue. I am pasting the code here for others if interested. I used opencsv here.

     