CSV file in java

Prasanna Kumaar
Ranch Hand

Joined: Feb 08, 2011
Posts: 30
I am reading a CSV file using a Java program.
Some of the .csv files I read with my program seem to contain junk characters. How can I read the contents of the file without reading the junk characters?
Bogdan Baraila
Ranch Hand

Joined: May 23, 2011
Posts: 43
I think the only sure way is to check whether each character is valid.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32704
    
They are probably not junk characters. You can probably use a regular expression or the methods of the Character class to find such characters.
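A minimal sketch of the Character-class approach, assuming that "junk" means control, format, or unassigned code points (the thread has not defined it at this point, so that is only a guess). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports code points that are unlikely to be wanted in CSV text.
class SuspectCharFinder {

    /** Returns one entry per control, format, or unassigned code point found in the line. */
    static List<String> findSuspects(String line) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            int type = Character.getType(cp);
            // Tabs are control characters but are common in delimited files, so allow them.
            boolean suspect = (Character.isISOControl(cp) && cp != '\t')
                    || type == Character.FORMAT
                    || type == Character.UNASSIGNED;
            if (suspect) {
                found.add(String.format("index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // The first character here is U+FEFF, which is invisible in most editors.
        System.out.println(findSuspects("\uFEFF1231231,some string"));
    }
}
```

Printing the code point values rather than the characters themselves makes it easier to look up what they actually are.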
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 9950
    

What makes a character a 'junk' one?


Never ascribe to malice that which can be adequately explained by stupidity.
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

fred rosenberger wrote:What makes a character a 'junk' one?


I would almost guarantee that the junk characters are the result of the file being read/decoded using the wrong character encoding.
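To illustrate how a wrong encoding produces extra characters, here is a small sketch that decodes the same bytes twice. The sample text and the choice of ISO-8859-1 as the "wrong" charset are assumptions for the demonstration, not something taken from the original poster's files.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: same bytes, two different decodings.
class WrongCharsetDemo {

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "cafe" with an accented e: the last letter takes two bytes in UTF-8.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct charset: 4 characters.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).length());      // prints 4
        // Wrong charset: every byte becomes its own character, so 5 characters.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).length()); // prints 5
    }
}
```

The extra character in the second result is the kind of thing that shows up as "junk" when the text is displayed.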


Retired horse trader.
Bogdan Baraila
Ranch Hand

Joined: May 23, 2011
Posts: 43
James Sabre wrote:
fred rosenberger wrote:What makes a character a 'junk' one?


I would almost guarantee that the junk characters are the result of the file being read/decoded using the wrong character encoding.


It depends. A client once asked me not to include certain characters (to replace them with ?).

If it's indeed an encoding problem, you could try something like this:
FileInputStream propsIS = new FileInputStream(file);
// Decode explicitly as UTF-8 instead of relying on the platform default charset.
InputStreamReader isr = new InputStreamReader(propsIS, "UTF-8");
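For completeness, a fuller sketch of the same idea that reads a file line by line with an explicit charset and closes the stream when done. The class name, method name, and the temporary sample file are assumptions made up for the example.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads all lines of a file, decoding the bytes as UTF-8.
class Utf8CsvReader {

    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream) automatically.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 sample file so the example can run on its own.
        Path tmp = Files.createTempFile("sample", ".csv");
        Files.write(tmp, "1231231,caf\u00e9\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(tmp.toString()).size() + " line(s) read");
        Files.delete(tmp);
    }
}
```

Note that this only fixes the decoding. If the file starts with a byte order mark, the first line will still begin with one extra character (U+FEFF).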
Prasanna Kumaar
Ranch Hand

Joined: Feb 08, 2011
Posts: 30
fred rosenberger wrote:What makes a character a 'junk' one?


While reading some of the .csv files, the following double-quoted characters were present at the beginning of the first line: ""

All other lines were normal.
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Prasanna Kumaar wrote:
fred rosenberger wrote:What makes a character a 'junk' one?


While reading some of the .csv files, the following double-quoted characters were present at the beginning of the first line: ""

All other lines were normal.


The characters are the BOM for UTF-8. Java is very bad at dealing with BOMs. The BOM can be removed, and you should then open the file using the UTF-8 character encoding. My class BOMStripperInputStream, plagiarised on several sites, can help to remove the BOM.
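As a rough illustration of what stripping a BOM at the byte level can look like, here is a sketch built on java.io.PushbackInputStream. It is not the BOMStripperInputStream class mentioned above (that code is not shown in this thread). The names are hypothetical, and it only handles the three-byte UTF-8 BOM (EF BB BF).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not the BOMStripperInputStream class from this thread.
class Utf8BomSkipper {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Wraps the stream and consumes a leading UTF-8 BOM if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // read() may return fewer bytes than requested, so loop until full or end of stream.
        int count = 0;
        while (count < head.length) {
            int n = pushback.read(head, count, head.length - count);
            if (n < 0) {
                break;
            }
            count += n;
        }

        boolean hasBom = count == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && count > 0) {
            // Not a BOM: push the bytes back so the caller sees the stream unchanged.
            pushback.unread(head, 0, count);
        }
        return pushback;
    }

    /** Decodes the whole stream as UTF-8 after skipping any BOM. */
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '1', '2', '3', ',', 'a'};
        System.out.println(readUtf8(new ByteArrayInputStream(withBom)));
    }
}
```

With a real file you would pass a FileInputStream to skipUtf8Bom and wrap the result in an InputStreamReader, as in the earlier snippet.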
Prasanna Kumaar
Ranch Hand

Joined: Feb 08, 2011
Posts: 30
James Sabre wrote:
Prasanna Kumaar wrote:
fred rosenberger wrote:What makes a character a 'junk' one?


While reading some of the .csv files, the following double-quoted characters were present at the beginning of the first line: ""

All other lines were normal.


The characters are the BOM for UTF-8. Java is very bad at dealing with BOMs. The BOM can be removed, and you should then open the file using the UTF-8 character encoding. My class BOMStripperInputStream, plagiarised on several sites, can help to remove the BOM.


Actually, my CSV file format is
1231231,some string
i.e. the first 7 characters are digits, then the separator, then some text.
The junk characters seem to occur before the number part. Can you help me out?
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Prasanna Kumaar wrote:
Actually, my CSV file format is
1231231,some string
i.e. the first 7 characters are digits, then the separator, then some text.
The junk characters seem to occur before the number part. Can you help me out?


Please re-read my response. I have explained what the 3 junk characters are and how to deal with them. To reiterate: to understand the problem, follow the link to the BOM Wikipedia page. You can either manually remove the junk characters or employ my much-plagiarised class BOMStripperInputStream, obtained by following the other link I gave.
Prasanna Kumaar
Ranch Hand

Joined: Feb 08, 2011
Posts: 30
James Sabre wrote:
Prasanna Kumaar wrote:
Actually, my CSV file format is
1231231,some string
i.e. the first 7 characters are digits, then the separator, then some text.
The junk characters seem to occur before the number part. Can you help me out?


Please re-read my response. I have explained what the 3 junk characters are and how to deal with them. To reiterate: to understand the problem, follow the link to the BOM Wikipedia page. You can either manually remove the junk characters or employ my much-plagiarised class BOMStripperInputStream, obtained by following the other link I gave.


I opened the file using UTF-8, but now the "" characters are replaced by "?".
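One plausible reading of the "?" (an assumption, since the code and console settings are not shown): once the file is decoded as UTF-8, the three BOM bytes become the single character U+FEFF, and a console that cannot display that character prints "?" in its place. The sketch below drops a leading U+FEFF from the first line after decoding and then splits a record in the "1231231,some string" format. The class and method names are made up for illustration.

```java
// Hypothetical helper: removes a decoded BOM character from the start of a line.
class FirstLineBomStripper {

    /** Returns the line without a leading U+FEFF, or the line unchanged if there is none. */
    static String stripLeadingBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    /** Splits "number,text" into two fields; commas inside the text part are kept. */
    static String[] splitRecord(String line) {
        return line.split(",", 2);
    }

    public static void main(String[] args) {
        // Simulates the first line of a UTF-8 file with a BOM, after decoding.
        String firstLine = "\uFEFF1231231,some string";

        String[] fields = splitRecord(stripLeadingBom(firstLine));
        long id = Long.parseLong(fields[0]); // would throw NumberFormatException with the BOM still attached
        System.out.println(id + " -> " + fields[1]);
    }
}
```

Only the first line of a file needs this treatment, because a BOM can only appear at the very start.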
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Prasanna Kumaar wrote:
I opened the file using UTF-8, but now the "" characters are replaced by "?".


This is irrelevant. I'm at a loss as to what more I can say without just repeating what I have told you already. I have given you a reference to allow you to understand the problem and a reference to a class that will, if used correctly, almost certainly solve the problem. I can't see your existing code so I can't say exactly how you need to use the BOMStripperInputStream class but it is almost trivial to use.

Please please please put in a bit of effort to understand the problem and understand the solution.

Bye
 