
Confusion in Java encoding

 
Yan Zhou
Ranch Hand
Posts: 137
Hi,

I have JDK 1.5 running on Windows XP (English version) and a PostgreSQL 7 database whose encoding is UTF-8. A Java program takes user input and stores it in the database.

If I type Chinese (using the Chinese input method that comes with Windows), the characters are stored in the database. Since Java Strings are held internally as UTF-16, some conversion must happen before the text can be written to a UTF-8 database. Who does this: the OS, the JVM, or the JDBC driver?
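Roughly speaking, the conversion happens wherever characters turn into bytes. The String holds the typed text as UTF-16 `char`s, and on the database path it is normally the JDBC driver that encodes them into the connection's encoding (UTF-8 here) before sending them; as far as I know the PostgreSQL driver does this without any help from your code. A minimal sketch of that step done explicitly (the class name `EncodingDemo` is hypothetical; `String.getBytes(Charset)` needs Java 6 or later, on JDK 1.5 use `getBytes("UTF-8")` instead):

```java
import java.nio.charset.Charset;

public class EncodingDemo {
    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset UTF_16BE = Charset.forName("UTF-16BE");

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so this file
        // compiles the same way whatever the compiler's source encoding is.
        String text = "\u4E2D\u6587";

        // In memory the String is two UTF-16 code units.
        System.out.println("chars: " + text.length());

        // Bytes only exist once a charset is chosen; this is the kind of
        // step a JDBC driver performs before sending text to a UTF-8 database.
        byte[] utf8 = text.getBytes(UTF_8);
        byte[] utf16 = text.getBytes(UTF_16BE);
        System.out.println("UTF-8 bytes: " + utf8.length);     // 3 bytes per character here
        System.out.println("UTF-16BE bytes: " + utf16.length); // 2 bytes per character here

        // Decoding with the same charset gives back the original String.
        System.out.println("round trip: " + new String(utf8, UTF_8).equals(text));
    }
}
```

The point is that "UTF-16 inside Java" and "UTF-8 in the database" never conflict as long as exactly one component does the encoding and uses the right charset.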

I then added more functionality to the program: it now uses JAXB (Java's XML binding API, which generates and parses XML) to convert the Chinese text into an XML element. I used the default encoding, UTF-8, for the XML text. However, when I read the text back from the database, the XML parser fails because the text is not valid UTF-8.

My workaround is to specify the encoding "GB2312" (for Chinese). Why do I have to do that? Didn't the program handle the text fine before, without my specifying GB2312?
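The symptom can be reproduced without JAXB, assuming the XML text is at some point turned into bytes with a charset other than the one named in its prolog. The sketch below uses the JDK's built-in DOM parser (JAXP) instead of JAXB, and its class and method names are hypothetical. The same two characters encoded as GB2312 are not valid UTF-8, so a parser that trusts `encoding="UTF-8"` rejects them; when the declaration and the actual bytes are both GB2312, parsing succeeds, which would explain why the workaround appears to work.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

public class XmlEncodingMismatch {
    static final String TEXT = "\u4E2D\u6587"; // two Chinese characters

    /** Builds a tiny XML document whose prolog declares the given encoding name. */
    static String xml(String declaredEncoding) {
        return "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?><name>"
                + TEXT + "</name>";
    }

    /** Parses raw bytes and returns the root element's text, or null if parsing fails. */
    static String parseOrNull(byte[] bytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) { // SAXException, or IOException for malformed byte sequences
            return null;
        }
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        Charset gb2312 = Charset.forName("GB2312"); // needs the JDK's extended charsets

        // 1. Declaration says UTF-8 and the bytes are UTF-8: parses.
        System.out.println(parseOrNull(xml("UTF-8").getBytes(utf8)) != null);
        // 2. Declaration says UTF-8 but the bytes are GB2312: invalid UTF-8, fails.
        System.out.println(parseOrNull(xml("UTF-8").getBytes(gb2312)) != null);
        // 3. Declaration and bytes are both GB2312: parses again (the "workaround").
        System.out.println(parseOrNull(xml("GB2312").getBytes(gb2312)) != null);
    }
}
```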

Thanks.
Yan
 
Yan Zhou
Ranch Hand
Posts: 137
In particular, I do not understand:

1) How does the database know this is Chinese text, since I have not specified any Chinese encoding? I can see the Chinese text correctly when running a "select * from" query.

2) Why can the database handle UTF-8 fine, but the XML parser/generator cannot?
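On question 1: a UTF-8 database does not need to know that the text is Chinese. UTF-8 can encode every Unicode code point, and Chinese ideographs simply have code points of their own (for example U+4E2D). The database stores those bytes; whether a query result shows up as Chinese depends on the client decoding them as UTF-8 and having a suitable font. A small sketch that prints each character's code point and Unicode block (the class name `CodePoints` is hypothetical):

```java
public class CodePoints {
    /** Returns one line per code point: "U+XXXX BLOCK_NAME". */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // UnicodeBlock.toString() yields the block's name, e.g. BASIC_LATIN.
            sb.append(String.format("U+%04X %s%n", cp, Character.UnicodeBlock.of(cp)));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A Latin letter followed by a Chinese ideograph.
        System.out.print(describe("A\u4E2D"));
    }
}
```

Nothing Chinese-specific is configured anywhere: the ideograph is just another code point, in the CJK_UNIFIED_IDEOGRAPHS block.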

As far as I can see, the only difference is that my program calls either
1) the JDBC driver to store the data in the UTF-8 database,

or

2) the XML generator to encode the data as UTF-8, and then stores the generated text in the database.
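One plausible reading of that difference, offered as an assumption since the thread does not show the code: on the JDBC path there is exactly one String-to-bytes step, done by the driver. On the XML path the marshalled XML may pass through an extra bytes/String conversion that silently uses the platform default charset, for example `String.getBytes()` or `new String(byte[])` without a charset argument. On a Windows machine whose locale for non-Unicode programs is Simplified Chinese, that default is GBK, a superset of GB2312, which would explain why declaring GB2312 makes things line up. If the XML is kept in a text column, one way to avoid the question entirely is to parse it from a `Reader`: the parser then receives characters rather than bytes and, per the SAX `InputSource` contract, disregards the encoding declaration. A JAXP sketch with hypothetical names; with JAXB 2.x the analogous calls would be `Marshaller.marshal(Object, Writer)` with a `StringWriter` and `Unmarshaller.unmarshal(Reader)` with a `StringReader`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseFromString {
    /**
     * Parses XML that is already a Java String, e.g. the value returned by
     * ResultSet.getString for a text column, and returns the root element's text.
     */
    static String rootText(String xml) throws Exception {
        // A character stream skips byte decoding entirely, so the encoding="..."
        // value in the prolog is not used to decode anything.
        InputSource source = new InputSource(new StringReader(xml));
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = "\u4E2D\u6587";
        // Both declarations parse, because no bytes are involved.
        String a = rootText("<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>" + text + "</name>");
        String b = rootText("<?xml version=\"1.0\" encoding=\"GB2312\"?><name>" + text + "</name>");
        System.out.println(a.equals(text) && b.equals(text));
    }
}
```

If the XML must travel as bytes instead (a binary column, a file, a socket), the alternative is to make sure every encode and decode step names the same charset explicitly, UTF-8 being the usual choice.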

Thanks.
Yan
 