SAXException: Invalid byte 2 of 2-byte UTF-8 sequence

 
P Lavti
Ranch Hand
Posts: 65
Hi,

I am trying to parse an XML file that contains Swedish characters (such as "å", "ä" and "ö") using a SAX parser.
I have used the UTF-8 encoding in the XML document.
<?xml version="1.0" encoding="UTF-8" ?>

But the parser gives me the exception:


If I use the encoding iso-8859-1, it works fine.
<?xml version="1.0" encoding="iso-8859-1" ?>

Can anybody help me understand why it doesn't work with the UTF-8 encoding?
Is there any way I can parse my XML using a SAX parser with the UTF-8 encoding?

Thanks!
[ January 25, 2008: Message edited by: P Lavti ]
 
Paul Clapham
Marshal
Posts: 28177

Originally posted by P Lavti:
I have used the UTF-8 encoding in the XML document.
<?xml version="1.0" encoding="UTF-8" ?>

Just doing that does not cause the file to be encoded in UTF-8. You actually have to save it out of your text editor, or whatever is creating the file, in UTF-8. You didn't do that.
[ January 25, 2008: Message edited by: Paul Clapham ]
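
To illustrate the point above: the encoding declaration only tells the parser how to decode the bytes, while the bytes themselves are decided by whatever writes the file. Here is a minimal Java sketch of writing the file with an explicit UTF-8 encoder. It assumes Java 7 or later (for `StandardCharsets` and `Files`); the class name `Utf8XmlWriteSketch`, the method `writeUtf8` and the sample `<name>` element are made up for illustration and are not from the original poster's code.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8XmlWriteSketch {

    // Writes the XML text through an explicit UTF-8 encoder, so the bytes on
    // disk match the encoding named in the XML declaration.
    static void writeUtf8(Path file, String xml) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample document containing the letters å, ä, ö.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<name>\u00e5\u00e4\u00f6</name>\n";
        Path file = Files.createTempFile("sample", ".xml");
        writeUtf8(file, xml);

        // The same character becomes different bytes depending on the encoder.
        byte[] latin1 = "\u00e5".getBytes(StandardCharsets.ISO_8859_1); // 0xE5
        byte[] utf8 = "\u00e5".getBytes(StandardCharsets.UTF_8);        // 0xC3 0xA5
        System.out.println(latin1.length + " byte in ISO-8859-1, "
                + utf8.length + " bytes in UTF-8");
    }
}
```

In ISO-8859-1 each of these letters is a single byte of 0x80 or above, which a UTF-8 decoder either treats as part of a multi-byte sequence or rejects outright. That is the likely reason a file saved as ISO-8859-1 but declared as UTF-8 fails to parse, while the same file declared as iso-8859-1 parses fine.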
 
P Lavti
Ranch Hand
Posts: 65
The file is created at run time. How can I create a file in Java that is encoded in UTF-8?

One more thing: if I change the encoding to iso-8859-1, it works fine.
 
Paul Clapham
Marshal
Posts: 28177
 
P Lavti
Ranch Hand
Posts: 65
Ohh.. I got it now!!!

For my testing I was creating the file in Windows, instead of creating it at run time.

After creating it at run time, it worked!!!

Thanks a ton!!!
 
John Jai
Rancher
Posts: 1776
Hi Paul,

I am receiving the same kind of error: org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.

But the error occurs at one specific line, 17850, which contains: <Name>TITLE II  COMMUNITY BENEFIT FUND</Name>

Could there be something in this text that is not valid UTF-8, or was the document not correctly encoded in UTF-8?

If the document was not created in UTF-8, why does the error occur only at one particular line, after nearly 10000 lines of the document have been parsed?

Thanks,
John
 
John Jai
Rancher
Posts: 1776
Hi,

Sorry for my stupid question. When I viewed the file in a difference viewer, it showed a special symbol that was actually present after the numerals (II & III).

I am attaching the screenshot. But I don't know why the special symbol looks like white space rather than showing up as something weird.

Thank you!
John
diff2.JPG
[Thumbnail for diff2.JPG]
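
On why the error only shows up thousands of lines in: plain ASCII bytes are identical in UTF-8 and ISO-8859-1, so a UTF-8 decoder only fails when it reaches the first byte of 0x80 or above that does not form a valid sequence. Here is a rough Java sketch for locating that byte in a file's contents. The class name `FirstBadUtf8Byte` and the method `firstInvalidOffset` are hypothetical, and the sample assumes the invisible symbol is a non-breaking space (0xA0); the thread does not say which symbol it actually was.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class FirstBadUtf8Byte {

    // Returns the offset of the first byte that is not valid UTF-8,
    // or -1 if the whole array decodes cleanly.
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // With REPORT, the input position stays at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars and keep going
        }
    }

    public static void main(String[] args) {
        // Assumed example: the line from the post with a 0xA0 byte after "II".
        byte[] line = "<Name>TITLE II\u00a0 COMMUNITY BENEFIT FUND</Name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("first invalid byte at offset " + firstInvalidOffset(line));
    }
}
```

Reading the file with `Files.readAllBytes` and counting newline bytes up to the returned offset would give the line number to compare with the one in the exception.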
 
pardeephotmail kumar
Greenhorn
Posts: 2
Hi John Jai,

You are not able to see the special character because the editor, terminal, gedit and similar tools don't support that character.

Another way I found to handle the same exception is:

new String(bytes,"ISO-8859-1").getBytes("UTF-16");

This applies if your content is giving the SAXException: Invalid byte 2 of 2-byte UTF-8 sequence.
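
A related approach, sketched here under the assumption that the bytes really are ISO-8859-1 even though the declaration says UTF-8, is to decode them yourself and hand the parser a `Reader`. The `InputSource` documentation states that when a character stream is supplied, the parser reads it directly and disregards the encoding declaration. The class name `Latin1SaxSketch` and the method `parseText` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class Latin1SaxSketch {

    // Parses XML bytes that are assumed to be ISO-8859-1 and returns all
    // character data found in the document.
    static String parseText(byte[] latin1Xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // Decode the bytes ourselves, so the parser never applies UTF-8 to them.
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(latin1Xml), StandardCharsets.ISO_8859_1);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(reader), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: declared as UTF-8 but encoded as ISO-8859-1.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><n>\u00c5</n>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(xml));
    }
}
```

This only works around the mismatch on the reading side. If you control how the file is produced, writing it in the declared encoding is the cleaner fix.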
 