
character encoding (unicode to utf-8) conversion problem

Yahya Elyasse
Ranch Hand

Joined: Jul 07, 2005
Posts: 510

I have run into a problem that I can't seem to find a solution to.

My users are copying and pasting from MS-Word. My DB is Oracle with its character set set to "UTF-8".

Oracle's thin driver automatically converts strings to the DB's default character set.

When Java tries to encode Unicode to UTF-8 and runs into an unknown character (typically a character in the high-ASCII range), it substitutes it with '?' or some other weird character.
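To illustrate the symptom, here is a minimal sketch. The class name `QuestionMarkDemo` and the sample string are made up for illustration, and US-ASCII only stands in for "a charset that cannot represent the pasted character"; it is not a claim about what the Oracle driver does internally. `String.getBytes(Charset)` silently substitutes the charset's replacement byte (which is '?' for US-ASCII) for anything it cannot map, while a UTF-8 round trip keeps the curly quotes intact.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode the text in the given charset, then decode the bytes again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Curly quotes (U+201C and U+201D), as typically pasted from MS-Word.
        String pasted = "\u201Chello\u201D";

        // US-ASCII cannot represent the curly quotes, so each becomes '?'.
        System.out.println(roundTrip(pasted, StandardCharsets.US_ASCII)); // prints ?hello?

        // UTF-8 can represent them, so the round trip is lossless.
        System.out.println(roundTrip(pasted, StandardCharsets.UTF_8).equals(pasted)); // prints true
    }
}
```

If the UTF-8 round trip is lossless like this, the '?' is probably introduced by some other conversion step along the way rather than by the UTF-8 encoding itself.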

How do I prevent this?

I tried different encodings using a simple driver like:

But that didn't work. Then I tried a more elaborate conversion:

I also tried a variation of the second code snippet that inserts into the DB, just to see the results, and it was a no go.

I don't want '?' replacing the unknown chars. I would rather strip them or replace them with ' ' but I haven't been able to get that to work (using the second bit of code)
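For reference, the "strip or replace with a space" behaviour can be sketched with `java.nio.charset.CharsetEncoder`. This is only a sketch under assumptions: the class and method names are hypothetical, the target charset is passed in by the caller (US-ASCII in the demo, purely so that something is unmappable), and it says nothing about how the result then travels through the JDBC driver.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class UnmappableCharDemo {
    // Encode with the configured encoder, then decode so the result can be inspected.
    private static String roundTrip(CharsetEncoder encoder, String text) {
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return encoder.charset().decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // Not expected here: IGNORE and REPLACE do not report coding errors.
            throw new IllegalStateException(e);
        }
    }

    // Drop every character the target charset cannot represent.
    static String strip(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        return roundTrip(encoder, text);
    }

    // Replace every unrepresentable character with a single space.
    static String replaceWithSpace(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] {' '});
        return roundTrip(encoder, text);
    }

    public static void main(String[] args) {
        String pasted = "\u201Chello\u201D"; // curly quotes around "hello"
        System.out.println("[" + strip(pasted, StandardCharsets.US_ASCII) + "]");            // prints [hello]
        System.out.println("[" + replaceWithSpace(pasted, StandardCharsets.US_ASCII) + "]"); // prints [ hello ]
    }
}
```

Using an explicit encoder instead of `String.getBytes` is what makes the substitution configurable; `getBytes` always falls back to the charset's default replacement.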

Any ideas on what I am doing wrong?
