character encoding (unicode to utf-8) conversion problem

Yahya Elyasse
Ranch Hand

Joined: Jul 07, 2005
Posts: 510

I have run into a problem that I can't seem to find a solution to.

My users are copying and pasting text from MS Word. My DB is Oracle, with its character set configured as "UTF-8".

I am using Oracle's thin driver, which automatically converts the data to the DB's default character set.

When Java tries to encode Unicode to UTF-8 and runs into an unknown character (typically one in the high-ASCII range), it substitutes '?' or some other weird character for it.

How do I prevent this?
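The '?' substitution described above can be reproduced outside the database. The following is only a minimal sketch, not the code from the original post: the class name `QuestionMarkDemo` and the helper `roundTrip` are made up for illustration, and ISO-8859-1 is assumed as an example of a character set that cannot represent Word's curly quotes. `String.getBytes(Charset)` silently replaces unmappable characters with the charset's replacement byte, which for ISO-8859-1 is '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
class QuestionMarkDemo {
    // Encodes the text to bytes with the given charset and decodes it back,
    // imitating a round trip through a layer that uses that charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars are silently replaced
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Curly quotes (U+201C, U+201D), as typically pasted from MS Word.
        String pasted = "\u201CHello\u201D";

        // UTF-8 can represent these characters, so the round trip is lossless.
        System.out.println(pasted.equals(roundTrip(pasted, StandardCharsets.UTF_8)));

        // ISO-8859-1 cannot, so each curly quote comes back as '?'.
        System.out.println(roundTrip(pasted, StandardCharsets.ISO_8859_1)); // prints ?Hello?
    }
}
```

If the UTF-8 round trip is lossless in a test like this, the '?' is probably being introduced by some other step that uses a narrower character set, though that is a guess that would need checking against the actual setup.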

I tried different encodings using a simple driver like:

But that didn't work. Then I tried a more elaborate conversion:


I also tried a variation of the second code snippet that inserts into the DB, just to see the results, and it was a no go.

I don't want '?' replacing the unknown characters. I would rather strip them or replace them with ' ', but I haven't been able to get that to work (using the second bit of code).
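One way the "strip or replace with ' '" idea could be sketched is with `CharsetEncoder.canEncode`, checking each code point before anything is sent to the driver. This is an illustration under assumptions, not the second snippet mentioned above: the class `UnmappableCharFilter` and the method `filter` are hypothetical names, and the target charset is a parameter because the charset that actually causes the loss is not known from the post.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
class UnmappableCharFilter {
    // Returns the text with every code point that the target charset cannot
    // encode swapped for the given replacement ("" strips, " " blanks it out).
    static String filter(String text, Charset target, String replacement) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        // Iterate by code point so surrogate pairs are tested as one unit.
        text.codePoints().forEach(cp -> {
            String piece = new String(Character.toChars(cp));
            out.append(encoder.canEncode(piece) ? piece : replacement);
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String pasted = "\u201CHello\u201D"; // curly quotes as pasted from MS Word
        // ISO-8859-1 is only an assumed example of a narrower charset.
        System.out.println("[" + filter(pasted, StandardCharsets.ISO_8859_1, "") + "]");
        System.out.println("[" + filter(pasted, StandardCharsets.ISO_8859_1, " ") + "]");
    }
}
```

With UTF-8 as the target, nothing is filtered here apart from unpaired surrogates, since UTF-8 can encode every valid Unicode code point. A filter like this only helps if the target passed in matches the charset where the characters are actually being lost.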

Any ideas on what I am doing wrong?

Thanks,
 