• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

character encoding (unicode to utf-8) conversion problem

 
Yahya Elyasse
Ranch Hand
Posts: 510
Eclipse IDE Google Web Toolkit Java
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I have run into a problem that I can't seem to find a solution to.

my users are copying and pasting from MS-Word. My DB is Oracle with its encoding set to "UTF-8".

Using Oracle's thin driver it automatically converts to the DB's default character set.

When Java tries to encode Unicode to UTF-8 and it runs into an unknown character (typically a character that is in the High Ascii range) it substitutes it with '?' or some other wierd character.

How do I prevent this.

I tried different encodings using a simple driver like:

But that didn't work. Then I tried a more elaborate conversion:


I tried a variation of the second code snippet that inserts into the DB - just to see the results and it was a no go.

I don't want '?' replacing the unknown chars. I would rather strip them or replace them with ' ' but I haven't been able to get that to work (using the second bit of code)

Any ideas on what I am doing wrong?

Thanks,
 
I agree. Here's the link: http://aspose.com/file-tools
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic