
Confusion in Java encoding

Yan Zhou
Ranch Hand

Joined: Sep 02, 2003
Posts: 137

I have JDK 1.5 running on Windows XP (English version) and a PostgreSQL 7 database whose encoding is UTF-8. A Java program takes user input and stores it in the database.

If I type in Chinese (via the Chinese input tool that comes with Windows), the characters are stored in the DB. Since Java Strings are encoded as UTF-16, there must be some conversion before the text can be written to a UTF-8 database. Who is doing this: the OS, the JVM, or the JDBC driver?
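To make the question concrete, here is a minimal sketch of the UTF-16 to UTF-8 step that has to happen somewhere between the String and the database. It does not show which layer performs it for this driver; that is not confirmed anywhere in this thread. The class and method names are made up for illustration, and it assumes Java 7 or later for `StandardCharsets` (on JDK 1.5 the equivalent call is `getBytes("UTF-8")`).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
final class Utf16ToUtf8Demo {

    // Encodes a Java String (UTF-16 in memory) into UTF-8 bytes.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a Java String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file
        // encoding cannot affect the result.
        String text = "\u4e2d\u6587";
        byte[] utf8 = toUtf8(text);

        // Each of these characters is one UTF-16 code unit but three UTF-8 bytes.
        System.out.println(text.length() + " UTF-16 code units, " + utf8.length + " UTF-8 bytes");
        System.out.println("round trip ok: " + fromUtf8(utf8).equals(text));
    }
}
```

The point is only that the conversion is explicit and lossless when both sides agree on UTF-8. Trouble starts when one side encodes or decodes with a different charset, for example the platform default.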

I added a little more functionality to the program: it now uses JAXB (an XML parser and file generator) to convert the Chinese text into an XML element. I have used the default encoding, UTF-8, for the XML text. However, when I read the text back from the database, the XML parser fails because the text is not valid UTF-8.

My workaround is to specify the encoding "GB2312" (for Chinese). Why do I have to do that? Didn't the program handle the text fine before, without my specifying GB2312?
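One way to narrow this down is to check whether the bytes handed to the parser really are well-formed UTF-8. The sketch below uses a strict `CharsetDecoder` for that. The class name is hypothetical, and the sample byte array is an assumption: it is the GB2312 encoding of the same two characters (D6 D0 CE C4), hard-coded so the example does not depend on the GB2312 charset being installed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original program.
final class Utf8Validator {

    // Returns true only if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // Assumed sample: the same text encoded as GB2312.
        byte[] gb2312Bytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

        System.out.println("UTF-8 bytes valid:  " + isValidUtf8(utf8Bytes));
        System.out.println("GB2312 bytes valid: " + isValidUtf8(gb2312Bytes));
    }
}
```

If the text coming back from the database fails a check like this but parses once the XML is declared as GB2312, that would suggest the bytes were produced in a legacy Chinese charset somewhere along the way rather than in UTF-8. That is a guess to test, not a diagnosis.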

Yan Zhou
Ranch Hand

Joined: Sep 02, 2003
Posts: 137
In particular, I do not understand:

1) How would the database know this is Chinese text, since I have not specified any Chinese encoding? I can see the Chinese text correctly when doing a "select * from" query.

2) Why can the database handle UTF-8 fine, but the XML parser/generator cannot?

As far as I can see, the only difference is that in one case my program calls
1) the JDBC driver to store the data as UTF-8,

and in the other case it calls
2) the XML generator to produce the data as UTF-8, and then stores the generated text in the database.
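For comparison, here is a small sketch of the second path in which the encoding declared in the XML header and the encoding actually used to produce the bytes are forced to match. It is not the JAXB code from my program: it uses a plain `OutputStreamWriter` and the JDK's DOM parser, the class and element names are made up, and it assumes the text contains no characters that need XML escaping.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not the JAXB code from the original program.
final class XmlEncodingRoundTrip {

    // Writes a tiny XML document as bytes. The Writer is given UTF-8 explicitly,
    // so the bytes agree with the encoding named in the XML declaration.
    // Assumes 'text' needs no XML escaping.
    static byte[] writeXml(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            writer.write("<note>" + text + "</note>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Parses the bytes; the parser picks the charset from the XML declaration.
    static String readXml(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        System.out.println("round trip ok: " + readXml(writeXml(text)).equals(text));
    }
}
```

If the same document were written through a Writer that silently used the platform default charset, the header would still say UTF-8 while the bytes would be something else, which is one possible way to end up with the parser error described in this thread.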
