SAXException: Invalid byte 2 of 2-byte UTF-8 sequence

P Lavti
Ranch Hand

Joined: Mar 27, 2007
Posts: 65
Hi,

I am trying to parse an XML file containing Swedish characters (such as "å", "ä" and "ö") using a SAX parser.
I have used the UTF-8 encoding in the XML document.
<?xml version="1.0" encoding="UTF-8" ?>

But the parser gives me the exception:
SAXException: Invalid byte 2 of 2-byte UTF-8 sequence

If I use the encoding iso-8859-1, it works fine.
<?xml version="1.0" encoding="iso-8859-1" ?>

Can anybody help me understand why it doesn't work with the UTF-8 encoding?
Is there any way I can parse my XML with a SAX parser using the UTF-8 encoding?

Thanks!
[ January 25, 2008: Message edited by: P Lavti ]

-P Lavti
SCJP 5.0 (88%)
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18987

Originally posted by P Lavti:
I have used the UTF-8 encoding in the XML document.
<?xml version="1.0" encoding="UTF-8" ?>
Just doing that does not cause the file to be encoded in UTF-8. You actually have to save it out of your text editor, or whatever is creating the file, in UTF-8. You didn't do that.
[ January 25, 2008: Message edited by: Paul Clapham ]
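
To illustrate the point above, here is a minimal sketch (the class and method names are made up for this example, and the "å ä ö" sample text is an assumption). It encodes the same XML text twice, once as ISO-8859-1 and once as UTF-8, and checks whether each byte sequence is well-formed UTF-8. The declaration in the prolog is identical in both cases; only the bytes differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DeclaredVsActualEncoding {

    /** Returns true if the bytes form well-formed UTF-8, false otherwise. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same text, same declaration; \u00e5 \u00e4 \u00f6 are the letters a-ring, a-umlaut, o-umlaut.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><name>\u00e5\u00e4\u00f6</name>";

        // What an editor saving as ISO-8859-1 would write: one byte per letter.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        // What a real UTF-8 save would write: two bytes per letter.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1)); // false
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));         // true
    }
}
```

This would also explain why switching the declaration to iso-8859-1 makes the parse succeed: the declaration then matches the bytes that are actually in the file.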
P Lavti
Ranch Hand

Joined: Mar 27, 2007
Posts: 65
The file is created at run time. How can I create a file in Java that is encoded in UTF-8?

One more thing: if I change the encoding to iso-8859-1, it works fine.
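
For reference, a minimal sketch of one way to write a file as UTF-8 from Java. The class name, method name, temp file and sample content are all hypothetical. The key point is that the charset is passed explicitly to the writer instead of relying on the platform default. `Files.newBufferedWriter` needs Java 7 or later; on older versions, wrapping a `FileOutputStream` in `new OutputStreamWriter(stream, "UTF-8")` should do the same job.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUtf8Xml {

    /** Writes the given XML text to the file, encoding the characters as UTF-8. */
    static void writeUtf8(Path file, String xml) throws IOException {
        // The charset is named explicitly, so the platform default is never used.
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("swedish", ".xml");
        writeUtf8(file,
                "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n<name>\u00e5\u00e4\u00f6</name>\n");
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```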
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18987

P Lavti
Ranch Hand

Joined: Mar 27, 2007
Posts: 65
Ohh.. I got it now!!!

For my testing I was creating the file in Windows instead of creating it at run time.

After creating it at run time, it worked!!!

Thanks a ton!!!
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
Hi Paul,

I am receiving the same error -> org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.

But the error pops up at a specific line, # 17850, which has the content -> <Name>TITLE II  COMMUNITY BENEFIT FUND</Name>

Can there be anything wrong in the given text for UTF-8 conversion, or was the document not correctly encoded in UTF-8?

If the document was not created with UTF-8, why does it fail only at one particular line, after parsing nearly 10000 lines of the doc?

Thanks,
John
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
Hi,

Sorry for my stupid question. When I viewed the file through a Difference Viewer, it showed me a special symbol that was actually present after the numerals (II & III).

I am attaching the screenshot. But I don't know why the special symbol looks like white space rather than showing up as something weird.
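
A file that is not really UTF-8 can still parse cleanly for thousands of lines, because plain ASCII bytes mean the same thing in UTF-8 and in ISO-8859-1; the parser only complains when it reaches the first byte above 0x7F that does not fit the UTF-8 rules. Below is a minimal sketch for locating that byte. The class and method names are hypothetical, and the sample data assumes, purely for illustration, that the stray symbol is a no-break space (byte 0xA0 in ISO-8859-1); the thread does not say which character it actually was.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class FindInvalidUtf8 {

    /** Returns the offset of the first byte that is not well-formed UTF-8, or -1 if all bytes are fine. */
    static int firstInvalidUtf8Offset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // reports errors by default
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // position is left at the start of the bad sequence
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars and continue
        }
    }

    /** Returns the 1-based line number containing the given byte offset. */
    static int lineOf(byte[] bytes, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (bytes[i] == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        // Assumption for the demo: a no-break space (\u00a0) saved as ISO-8859-1.
        byte[] data = "<Doc>\n<Name>TITLE II\u00a0 COMMUNITY BENEFIT FUND</Name>\n</Doc>\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        int offset = firstInvalidUtf8Offset(data);
        if (offset >= 0) {
            System.out.printf("Invalid byte 0x%02X at offset %d, line %d%n",
                    data[offset] & 0xFF, offset, lineOf(data, offset));
        }
    }
}
```

To check a real file, the byte array could be read with `Files.readAllBytes` instead of being built in `main`.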

Thank you!
John


[Thumbnail for diff2.JPG]

pardeephotmail kumar
Greenhorn

Joined: Apr 16, 2013
Posts: 2
hi John Jai,

You are not able to see the special character because the tool you are using (editor, terminal, gedit etc.) doesn't support that character.

Another way I found to handle the same exception is

new String(bytes,"ISO-8859-1").getBytes("UTF-16");

This applies if your contents are giving the SAXException: Invalid byte 2 of 2-byte UTF-8 sequence.
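
A different approach from the one-liner above, sketched here under the assumption that you already know the real encoding of the bytes: decode them yourself with an `InputStreamReader` and hand the parser a character stream. According to the SAX `InputSource` documentation, a parser given a character stream reads it directly and disregards the encoding declaration in the document. The class and method names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseWithKnownEncoding {

    /** Parses the bytes using the given (actual) encoding and returns all character data found. */
    static String parseText(byte[] bytes, Charset actualEncoding) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // Decoding happens here, in the Reader, not inside the XML parser.
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), actualEncoding)) {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Declared as UTF-8 but actually encoded as ISO-8859-1 (the mismatch discussed in this thread).
        byte[] mislabeled = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><name>\u00e5\u00e4\u00f6</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(mislabeled, StandardCharsets.ISO_8859_1));
    }
}
```

This only works around the mismatch on the reading side; fixing whatever writes the file so that its bytes match its declaration is still the cleaner solution.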
 