
character encoding (unicode to utf-8) conversion problem

Yahya Elyasse
Ranch Hand

Joined: Jul 07, 2005
Posts: 510

I have run into a problem that I can't seem to find a solution to.

My users are copying and pasting text from MS Word. My DB is Oracle, with its encoding set to "UTF-8".

Oracle's thin driver automatically converts the text to the DB's default character set.

When Java tries to encode Unicode to UTF-8 and runs into an unknown character (typically one in the high-ASCII range), it substitutes '?' or some other weird character.

How do I prevent this?

I tried different encodings using a simple driver like:

But that didn't work. Then I tried a more elaborate conversion:

I also tried a variation of the second code snippet that inserts into the DB, just to see the results, and it was a no go.

I don't want '?' replacing the unknown chars. I would rather strip them or replace them with ' ', but I haven't been able to get that to work (using the second bit of code).
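
To make the desired behaviour concrete, here is a minimal sketch of stripping or space-replacing unmappable characters with `java.nio.charset.CharsetEncoder`. It is not the code from the snippets mentioned above. The class and method names are hypothetical, and ISO-8859-1 is only an assumed stand-in for a target character set that cannot represent Word's curly quotes; UTF-8 itself can encode every Unicode code point, so a real UTF-8 target would not trigger the replacement at all.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class UnmappableChars {

    // Encodes text, writing a single space byte for every character
    // the target charset cannot represent (instead of the default '?').
    static byte[] encodeReplacingWithSpace(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { (byte) ' ' });
        return encode(encoder, text);
    }

    // Encodes text, silently dropping every character the target
    // charset cannot represent.
    static byte[] encodeStripping(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        return encode(encoder, text);
    }

    private static byte[] encode(CharsetEncoder encoder, String text) {
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE/IGNORE actions; rethrow unchecked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1; // assumed target charset
        String pasted = "\u201CHi\u201D";             // Hi wrapped in Word-style curly quotes

        // String.getBytes(Charset) always substitutes the charset's default replacement.
        System.out.println("[" + new String(pasted.getBytes(latin1), latin1) + "]");                  // [?Hi?]
        System.out.println("[" + new String(encodeReplacingWithSpace(pasted, latin1), latin1) + "]"); // [ Hi ]
        System.out.println("[" + new String(encodeStripping(pasted, latin1), latin1) + "]");          // [Hi]
    }
}
```

This only controls what happens when the encoding is done in Java. If the conversion happens inside the JDBC driver or the database, the text would have to be cleaned this way before it is bound to the statement.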

Any ideas on what I am doing wrong?
