multiple language support in one XML

 
Ranch Hand
Posts: 137
Hi,

Is it possible to support multiple languages in one XML document, e.g., Japanese and Chinese text in the same file? If so, how would I specify the encoding in that XML: simply UTF-8?

I ask because I ran into issues dealing with Chinese text in my Java program. I use the latest JAXB to marshal and unmarshal XML, and I store the text in a Unicode (UTF-8) PostgreSQL database (latest version).

When I type in Chinese text, JAXB has no problem marshalling it into an XML string with the encoding set to UTF-8, and my code successfully saves the XML text into the database. But when I read it back, the JAXB Unmarshaller fails on the XML string I just read from the DB with the error: invalid byte 2 of 3 byte UTF-8 character.
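The error message says the parser received bytes that are not well-formed UTF-8. The post does not show the code between marshalling and unmarshalling, so the following is only a minimal sketch of how to check a byte array before handing it to the parser. The class and method names (`Utf8Check`, `isValidUtf8`) are hypothetical, and the "broken" bytes are an invented example, not data from the program above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JAXB: strictly decodes bytes as UTF-8.
final class Utf8Check {
    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587), encoded correctly.
        byte[] good = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(good));

        // 0xE4 starts a 3-byte sequence, but 0x3F ('?') is not a
        // continuation byte, so byte 2 of the sequence is invalid.
        byte[] broken = {(byte) 0xE4, (byte) 0x3F, (byte) 0xAD};
        System.out.println(isValidUtf8(broken));
    }
}
```

Running a check like this on the bytes read from the database, before unmarshalling, would show whether the damage happens on the way into the DB or on the way out.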

My first question: if both my XML and my DB specify UTF-8 as the encoding, why do I still have problems parsing the XML text?

Someone mentioned that I have to tell the parser the character set I used, which is "GB2312": just because JAXB supports UTF-8 does not mean it knows how to convert Chinese text into UTF-8. Once I changed the encoding to GB2312, the program worked, and reading the XML text back caused no problems.

However, my question continues: if I need to support both Japanese and Chinese text in the same XML, how do I specify the encoding, since I would now have two different encodings? Do I have to convert my text into UTF-8 myself and set the XML encoding to UTF-8?
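As a small experiment on this question, the sketch below puts Chinese and Japanese text into one document declared as UTF-8 and parses it with the DOM parser that ships with the JDK (not JAXB). The element names (`greetings`, `zh`, `ja`) and the class name `MixedLanguageXml` are made up for illustration. It assumes the text is already held in a Java `String`, so the only conversion involved is `getBytes(StandardCharsets.UTF_8)`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical example class; element names are invented for the demo.
final class MixedLanguageXml {
    /** Builds one UTF-8 encoded document holding Chinese and Japanese text. */
    static byte[] sampleXml() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<greetings>"
                + "<zh>\u4f60\u597d</zh>"                   // Chinese greeting
                + "<ja>\u3053\u3093\u306b\u3061\u306f</ja>" // Japanese greeting
                + "</greetings>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    /** Parses the bytes and returns the text of the first element named tag. */
    static String textOf(byte[] xmlBytes, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = sampleXml();
        // Both strings should come back unchanged from the same document.
        System.out.println("\u4f60\u597d".equals(textOf(xml, "zh")));
        System.out.println("\u3053\u3093\u306b\u3061\u306f".equals(textOf(xml, "ja")));
    }
}
```

The parser is given raw bytes here rather than a `String` or `Reader`, so the `encoding="UTF-8"` declaration is what tells it how to decode them.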

Another question: what is the relationship between UTF-8 and all the character sets (GB2312, Big5, etc.)?
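One way to explore this relationship is to encode the same character under both names and compare the bytes. The sketch below does that for U+4E2D. The class name `EncodingCompare` and the `hex` helper are hypothetical. `GB2312` is not one of the charsets every Java runtime is required to provide, so the code checks `Charset.isSupported` first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: same character, two encodings, different bytes.
final class EncodingCompare {
    /** Formats bytes as space-separated uppercase hex, e.g. "E4 B8 AD". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ch = "\u4e2d"; // one Chinese character, code point U+4E2D

        // UTF-8 uses three bytes for this character.
        System.out.println("UTF-8:  " + hex(ch.getBytes(StandardCharsets.UTF_8)));

        // GB2312 is an optional charset, so guard the lookup.
        if (Charset.isSupported("GB2312")) {
            Charset gb = Charset.forName("GB2312");
            System.out.println("GB2312: " + hex(ch.getBytes(gb)));
        }
    }
}
```

On a runtime that supports GB2312 this prints `E4 B8 AD` for UTF-8 and `D6 D0` for GB2312. The character is the same but the byte sequences differ, so bytes written with one encoding must be read back with that same encoding.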

As I understand it, an XML file must be in one language, so it must use one character set, and in turn the encoding attribute in the XML declaration must name that character set, NOT "UTF-8" (since the parser does not know how to convert characters into UTF-8 without knowing the character set in use). If so, when would we ever use "UTF-8" as the encoding in our XML?

Thanks.
Yan
 
Yan Zhou
Ranch Hand
Posts: 137
Another issue I do not understand: if UTF-8 should not be used when I am inputting Chinese text (and GB2312 should be used instead), why does JAXB not report an error when marshalling the text, but only when unmarshalling it?

Thanks.
Yan
 