Confusion in Java encoding

Yan Zhou
Ranch Hand

Joined: Sep 02, 2003
Posts: 137

I am running JDK 1.5 on Windows XP (English version), with a PostgreSQL 7 database whose encoding is UTF-8. A Java program takes user input and stores it in the database.

If I type in Chinese (using the Chinese input tool that comes with Windows), the characters are stored in the DB. Since Java Strings are encoded as UTF-16, there must be some conversion before the text can be written to a UTF-8 database. Who is doing this: the OS, the JVM, or the JDBC driver?
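To make the conversion in question concrete, here is a minimal sketch of turning a Java String into UTF-8 bytes by hand. It only illustrates that such a step has to happen somewhere; it does not show which component performs it in the setup described above. The class name Utf16ToUtf8Sketch and the method toUtf8 are made up for this example.

```java
import java.io.UnsupportedEncodingException;

class Utf16ToUtf8Sketch {

    // Encodes a Java String (held in memory as UTF-16 code units) into UTF-8 bytes.
    // Hypothetical helper for illustration only.
    static byte[] toUtf8(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the
        // encoding of the source file does not matter.
        String text = "\u4e2d\u6587";
        byte[] utf8 = toUtf8(text);
        System.out.println("UTF-16 code units: " + text.length()); // prints 2
        System.out.println("UTF-8 bytes: " + utf8.length);         // prints 6
    }
}
```

Each of these two characters occupies one UTF-16 code unit in the String and three bytes once encoded as UTF-8, so the byte sequence that reaches the database is not the same as the in-memory representation.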

I added a little more functionality to the program: it now uses JAXB (an XML parser and file generator) to convert the Chinese text into an XML element. I used the default encoding, UTF-8, for the XML text. However, when I read the text back from the database, the XML parser fails because the text is not valid UTF-8.

My workaround is to specify the encoding "GB2312" (for Chinese). Why do I have to do that? Didn't the program handle the text fine before, without my specifying GB2312?
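One way to narrow this down is to check whether a given byte sequence is well-formed UTF-8 before handing it to the parser. The sketch below uses a strict CharsetDecoder for that. It assumes, purely for illustration, that some bytes were produced with GB2312; whether that is what actually happens in this program is not established. The class StrictUtf8Check and its methods are hypothetical names, and the example relies on the JVM having the GB2312 charset installed.

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class StrictUtf8Check {

    // Returns true only if the bytes decode as UTF-8 without any malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Encodes text with the named charset (hypothetical helper for the demo).
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Charset not available: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters as Unicode escapes

        // Same text, two different byte encodings.
        System.out.println("UTF-8 bytes valid as UTF-8:  "
                + isValidUtf8(encode(text, "UTF-8")));   // prints true
        System.out.println("GB2312 bytes valid as UTF-8: "
                + isValidUtf8(encode(text, "GB2312")));  // prints false
    }
}
```

If a check like this returns false for the bytes that go into the parser, the text was encoded with something other than UTF-8 at some earlier step, which would be consistent with the parser only accepting it once GB2312 is declared.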

Yan Zhou
Ranch Hand

Joined: Sep 02, 2003
Posts: 137
In particular, I do not understand:

1) How does the database know this is Chinese text, since I have not specified any Chinese encoding? I can see the Chinese text correctly when running a "select * from" query.

2) Why can the database handle UTF-8 fine, but the XML parser/generator cannot?

As far as I can see, the only difference is that my program either
1) calls the JDBC driver to store the data as UTF-8, or
2) calls the XML generator to produce UTF-8 text and then stores the generated text in the database.
