I have JDK 1.5 running on Windows XP (English version) and a PostgreSQL 7 database whose encoding is UTF-8. A Java program takes user input and stores it in the database.
If I type in Chinese (via the Chinese input tool that comes with Windows), the characters get stored in the DB. Since Java Strings are encoded as UTF-16, there must be some conversion before the text can be written to a UTF-8 database. Who does this conversion: the OS, the JVM, or the JDBC driver?
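To check my understanding of the UTF-16 vs UTF-8 distinction, I wrote a small sketch (the character and byte counts are just for illustration):

```java
import java.io.UnsupportedEncodingException;

public class EncodingDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // In memory, a Java String is a sequence of UTF-16 code units
        String s = "\u4e2d";                 // the character 中, one char
        // Writing it out as bytes forces a conversion to some charset
        byte[] utf8 = s.getBytes("UTF-8");   // 3 bytes: E4 B8 AD
        byte[] gb = s.getBytes("GB2312");    // 2 bytes: D6 D0
        System.out.println(utf8.length + " " + gb.length); // prints "3 2"
    }
}
```

So the same String can become quite different byte sequences depending on which charset does the encoding, which is why I want to know which layer picks the charset.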
I added a little more functionality to the program: it now uses JAXB (an XML parser and file generator) to convert the Chinese text into an XML element. I used the default encoding, UTF-8, for the XML text. However, when I read the text from the database, the XML parser fails because the text is not valid UTF-8.
My workaround is to specify the encoding "GB2312" (for Chinese). Why do I have to do that? Didn't the program handle the text fine before, without my specifying GB2312?
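To make the failure concrete, here is my guess at what might be going on (the scenario is an assumption, not something I have verified): if the bytes in the database were actually written as GB2312, then decoding them as UTF-8 cannot work, while decoding them as GB2312 recovers the text:

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    public static void main(String[] args) {
        // Assumption: somewhere along the way the text was encoded as
        // GB2312 (e.g. a platform default on a Chinese Windows setup)
        byte[] stored = "\u4e2d\u6587".getBytes(Charset.forName("GB2312")); // 中文
        // Those bytes are not valid UTF-8; invalid sequences decode to
        // the replacement character U+FFFD, which an XML parser rejects
        String asUtf8 = new String(stored, Charset.forName("UTF-8"));
        System.out.println(asUtf8.contains("\uFFFD")); // prints "true"
        // Decoding with the matching charset round-trips correctly
        String asGb2312 = new String(stored, Charset.forName("GB2312"));
        System.out.println(asGb2312.equals("\u4e2d\u6587")); // prints "true"
    }
}
```

If that is what is happening, specifying GB2312 would "work" only because it matches the bytes that were actually stored, not because the database is really holding UTF-8.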
In particular, I do not understand:
1) How does the database know this is Chinese text when I have not specified any Chinese encoding? I can see the Chinese text correctly when running a "select * from" query.
2) Why can the database handle the UTF-8 fine while the XML parser/generator cannot?
As far as I can see, the only difference is that my program calls 1) the JDBC driver to store the data as UTF-8, versus
2) the XML generator to produce UTF-8 text, which is then stored in the database.