
SAXException: Invalid byte 2 of 2-byte UTF-8 sequence

 
P Lavti
Ranch Hand
Posts: 65
Hi,

I am trying to use a SAX parser to parse an XML file that contains Swedish characters such as å, ä and ö.
I have used the UTF-8 encoding in the XML document.
<?xml version="1.0" encoding="UTF-8" ?>

But the parser gives me the exception:

SAXException: Invalid byte 2 of 2-byte UTF-8 sequence

If I use the encoding iso-8859-1, it works fine.
<?xml version="1.0" encoding="iso-8859-1" ?>

Can anybody help me understand why it doesn't work with UTF-8 encoding?
Is there any way I can parse my XML using a SAX parser with UTF-8 encoding?

Thanks!
[ January 25, 2008: Message edited by: P Lavti ]
 
Paul Clapham
Sheriff
Posts: 20713
Originally posted by P Lavti:
I have used the UTF-8 encoding in the XML document.
<?xml version="1.0" encoding="UTF-8" ?>
Just doing that does not cause the file to be encoded in UTF-8. You actually have to save it out of your text editor, or whatever is creating the file, in UTF-8. You didn't do that.
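To make Paul's point concrete, here is a minimal sketch, assuming the SAX parser that ships with the JDK. The class, its helper methods and the sample text "Åsa" are hypothetical. The same tiny document is turned into bytes with different charsets: when the declaration and the real bytes agree, parsing succeeds; when the declaration says UTF-8 but the bytes are ISO-8859-1, "Å" becomes the single byte 0xC5, which a UTF-8 decoder treats as the start of a 2-byte sequence, and the following "s" is not a valid second byte. That is the kind of failure reported in this thread's title (the exact message wording may vary between parser versions).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatchDemo {

    // Builds a tiny document whose XML declaration names declaredEncoding,
    // but whose bytes are actually produced with actualCharset.
    static byte[] document(String declaredEncoding, Charset actualCharset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<name>\u00C5sa</name>"; // "Åsa", escaped so the source file encoding doesn't matter
        return xml.getBytes(actualCharset);
    }

    // Parses the bytes with the default SAX parser and returns the element text.
    static String parse(byte[] bytes) throws SAXException, IOException {
        SAXParser parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable SAX parser", e);
        }
        StringBuilder text = new StringBuilder();
        parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    // Returns false if the parser rejects the bytes, e.g. because of a malformed UTF-8 sequence.
    static boolean parsesCleanly(byte[] bytes) {
        try {
            parse(bytes);
            return true;
        } catch (SAXException | IOException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Declaration and actual bytes agree: both documents parse.
        System.out.println(parsesCleanly(document("UTF-8", StandardCharsets.UTF_8)));
        System.out.println(parsesCleanly(document("ISO-8859-1", StandardCharsets.ISO_8859_1)));
        // Declaration says UTF-8, but the bytes are ISO-8859-1: the parser rejects the document.
        System.out.println(parsesCleanly(document("UTF-8", StandardCharsets.ISO_8859_1)));
    }
}
```

This also explains why switching the declaration to iso-8859-1 "works": the declaration then happens to describe the bytes the file really contains.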
[ January 25, 2008: Message edited by: Paul Clapham ]
 
P Lavti
Ranch Hand
Posts: 65
The file is created at run time. How can I create a file in Java that is actually encoded in UTF-8?

One more thing: if I change the encoding to iso-8859-1, it works fine.
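One common way to do this, sketched here under stated assumptions, is to write through a Writer whose charset is explicitly UTF-8 (for example `Files.newBufferedWriter(path, StandardCharsets.UTF_8)`), rather than `new FileWriter(...)`, which before Java 18 used the platform default charset (often windows-1252 on Western-locale Windows systems). The class and method names below are hypothetical, and the escaping is deliberately minimal: it assumes plain text content and a valid element name.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8XmlWriter {

    // Writes a one-element XML file. The Writer encodes characters as UTF-8,
    // so the bytes on disk match the encoding named in the declaration.
    static void writeUtf8Xml(Path file, String element, String text) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            out.write("<" + element + ">" + escape(text) + "</" + element + ">\n");
        }
    }

    // Minimal escaping for character content (&, <, >).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("swedish", ".xml");
        // "åäö ÅÄÖ", written as escapes so the Java source encoding does not matter.
        writeUtf8Xml(file, "name", "\u00E5\u00E4\u00F6 \u00C5\u00C4\u00D6");
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

The important part is that the charset passed to the Writer and the `encoding` in the declaration are the same; which API produces the XML matters less.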
 
Paul Clapham
Sheriff
Posts: 20713
 
P Lavti
Ranch Hand
Posts: 65
Oh, I get it now!

For my testing I was creating the file by hand in Windows instead of creating it at run time.

After creating it at run time, it worked!

Thanks a ton!
 
John Jai
Rancher
Posts: 1776
Hi Paul,

I am receiving the same error: org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.

But the error occurs at one specific line, #17850, which has the content: <Name>TITLE II  COMMUNITY BENEFIT FUND</Name>

Could something in this text be invalid for UTF-8, or was the document simply not encoded in UTF-8?

If the document was not created in UTF-8, why does the error occur only at one particular line, after nearly 10000 lines of the document have parsed without problems?

Thanks,
John
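A likely answer to the "why only one line" question: UTF-8 is ASCII-compatible, so a file saved in, say, windows-1252 or ISO-8859-1 that is mostly plain ASCII decodes perfectly well as UTF-8 until the first non-ASCII byte. As far as I know, "Invalid byte 1 of 1-byte UTF-8 sequence" from the JDK's Xerces-based parser means a byte that cannot start a UTF-8 sequence, such as one in the range 0x80 to 0xBF. A non-breaking space in ISO-8859-1/windows-1252 is 0xA0, which would fit the white-space-looking symbol John finds in his next post, but that is a guess. The sketch below (hypothetical class and method names; the file path comes from the command line) locates the first invalid byte with the standard `CharsetDecoder` API and reports its line.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns the offset of the first byte that is not valid UTF-8, or -1 if all bytes decode.
    static int firstMalformedOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the malformed sequence starts here
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: only positions matter, so discard the decoded chars
        }
    }

    // 1-based line number of a byte offset; '\n' (0x0A) never occurs inside a multi-byte UTF-8 sequence.
    static int lineOf(byte[] bytes, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (bytes[i] == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstMalformedOffset(bytes);
        if (offset < 0) {
            System.out.println("File is valid UTF-8");
        } else {
            System.out.printf("Invalid UTF-8 byte 0x%02X at offset %d (line %d)%n",
                    bytes[offset] & 0xFF, offset, lineOf(bytes, offset));
        }
    }
}
```

Printing the offending byte in hex is usually more informative than looking at it in an editor, because editors tend to render such bytes as blanks or replacement characters.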
 
John Jai
Rancher
Posts: 1776
Hi,

Sorry for the silly question. When I viewed the file in a diff viewer, it showed a special symbol right after the numerals (II and III).

I am attaching the screenshot. But I don't know why the special symbol looks like white space rather than showing up as something weird.

Thank you!
John
[Attachment: diff2.JPG (screenshot of the diff viewer)]
 
pardeephotmail kumar
Greenhorn
Posts: 2
Hi John Jai,

You are not able to see the special character because your editor, terminal, gedit and similar tools don't support that character.

Another way I found to handle the same exception, if your content gives "SAXException: Invalid byte 2 of 2-byte UTF-8 sequence", is:

new String(bytes, "ISO-8859-1").getBytes("UTF-16");
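The one-liner above reinterprets the bytes as ISO-8859-1, so it only yields correct text if the file really is ISO-8859-1; genuine UTF-8 multi-byte characters would each turn into two or three wrong characters. If you know the real encoding, a possibly simpler alternative is to give the parser a character stream decoded with that charset: the SAX `InputSource` documentation states that when a character stream is available, the parser reads it directly and disregards the encoding declaration. The sketch below assumes the input really is ISO-8859-1; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class KnownCharsetParse {

    // Decodes the bytes with the charset the caller knows is correct and hands the
    // parser a Reader, so the encoding named in the XML declaration is not used.
    static String parseWithCharset(InputStream in, Charset actualCharset)
            throws ParserConfigurationException, SAXException, IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, actualCharset)) {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Mislabelled input: the declaration says UTF-8, but the bytes are ISO-8859-1 ("Åsa").
        byte[] bytes = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C5sa</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseWithCharset(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1));
    }
}
```

This is a workaround for input you cannot fix at the source; where you control the producer, writing the file in the encoding its declaration names remains the cleaner fix.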
 