I am reading a CSV file using a Java program.
It seems that some of the .csv files I read with my program contain junk characters. How can I read the contents of the file without reading the junk characters?
Bogdan Baraila
Ranch Hand
Joined: May 23, 2011
Posts: 43
posted
0
I think the only sure way is to check whether each character is valid.
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32704
4
posted
0
They are probably not junk characters. You can probably use a regular expression or the methods of the Character class to find such characters.
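As a rough illustration of the Character-class approach, the sketch below reports code points in a line that are control, format, unassigned, or private-use characters. The class and method names are hypothetical, and the choice of which categories count as suspicious is an assumption rather than a rule.

```java
import java.util.ArrayList;
import java.util.List;

final class SuspiciousChars {
    /** Lists code points that are unlikely to belong in ordinary CSV text (assumed categories). */
    public static List<String> findSuspicious(String line) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            int type = Character.getType(cp);
            boolean suspicious =
                    (Character.isISOControl(cp) && cp != '\t') // control characters other than tab
                    || type == Character.FORMAT                // invisible format characters such as U+FEFF
                    || type == Character.UNASSIGNED
                    || type == Character.PRIVATE_USE;
            if (suspicious) {
                found.add(String.format("U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return found;
    }

    public static void main(String[] args) {
        // A line that starts with U+FEFF, followed by normal CSV content.
        System.out.println(findSuspicious("\uFEFF1231231,some string"));
    }
}
```

Printing the code points this way shows exactly which characters are present, which is usually the first step in deciding whether they are really junk.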
fred rosenberger wrote:What makes a character a 'junk' one?
While reading some of the .CSV files, the following double-quoted characters were present at the beginning of the first line: "ï»¿"
...all other lines were normal
The characters are the BOM for UTF-8. Java is very bad at dealing with BOMs. The BOM can be removed, and you should then open the file using the UTF-8 character encoding. My class BOMStripperInputStream, plagiarised on several sites, can help to remove the BOM.
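The BOMStripperInputStream class itself is not reproduced in this thread. As a minimal sketch of the same idea, assuming only the UTF-8 BOM (bytes EF BB BF) needs handling, a PushbackInputStream can peek at the first three bytes and put them back if they are not a BOM. The class and method names below are hypothetical and are not the API of BOMStripperInputStream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class Utf8BomSkipper {
    /** Wraps the stream and consumes a leading EF BB BF sequence if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer than requested.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream is shorter than three bytes
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a CSV file that starts with a UTF-8 BOM.
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '1', '2', '3', ',', 'a'};
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
    }
}
```

For a real file, the ByteArrayInputStream would be replaced by a FileInputStream on the CSV file. The stream is wrapped before any Reader is created, so the decoder never sees the BOM bytes.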
Prasanna Kumaar
Ranch Hand
Joined: Feb 08, 2011
Posts: 30
posted
0
James Sabre wrote:
Prasanna Kumaar wrote:
fred rosenberger wrote:What makes a character a 'junk' one?
While reading some of the .CSV files, the following double-quoted characters were present at the beginning of the first line: "ï»¿"
...all other lines were normal
The characters are the BOM for UTF-8. Java is very bad at dealing with BOMs. The BOM can be removed, and you should then open the file using the UTF-8 character encoding. My class BOMStripperInputStream, plagiarised on several sites, can help to remove the BOM.
Actually, my CSV file format is
1231231,some string
i.e. the first 7 characters are digits, then a separator, then some text.
The junk characters seem to occur before the number part. Can you help me out?
Prasanna Kumaar wrote:
Actually, my CSV file format is
1231231,some string
i.e. the first 7 characters are digits, then a separator, then some text.
The junk characters seem to occur before the number part. Can you help me out?
Please re-read my response. I have explained what the 3 junk characters are and how to deal with them. To reiterate: to understand the problem, follow the link to the BOM Wikipedia page. You can either manually remove the junk characters or employ my much-plagiarised class BOMStripperInputStream, obtained by following the other link I gave.
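One way to remove the characters manually, sketched here under the assumption that the file is decoded as UTF-8, relies on the fact that the three BOM bytes then decode to the single character U+FEFF at the start of the first line. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class FirstLineBomFix {
    /** Drops a leading U+FEFF, which is what a UTF-8 BOM becomes after decoding. */
    public static String stripLeadingBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    /** Reads all lines, cleaning only the first one, since a BOM can only appear at the start. */
    public static List<String> readLines(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        boolean first = true;
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(first ? stripLeadingBom(line) : line);
            first = false;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In real use the reader would come from something like
        // Files.newBufferedReader(path, StandardCharsets.UTF_8); a StringReader stands in here.
        BufferedReader reader = new BufferedReader(
                new StringReader("\uFEFF1231231,some string\n1231232,other string"));
        System.out.println(readLines(reader));
    }
}
```

This works at the character level after decoding, so it only helps when the file really is opened as UTF-8. With a single-byte encoding the BOM shows up as three separate characters instead.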
Prasanna Kumaar
Ranch Hand
Joined: Feb 08, 2011
Posts: 30
posted
0
I opened the file using UTF-8, but now the "ï»¿" characters are replaced by "?".
Prasanna Kumaar wrote:
I opened the file using UTF-8, but now the "ï»¿" characters are replaced by "?".
This is irrelevant. I'm at a loss as to what more I can say without just repeating what I have told you already. I have given you a reference to allow you to understand the problem and a reference to a class that will, if used correctly, almost certainly solve the problem. I can't see your existing code so I can't say exactly how you need to use the BOMStripperInputStream class but it is almost trivial to use.
Please please please put in a bit of effort to understand the problem and understand the solution.
Bye