conversion problem (charset ?)

pascal monfils

Joined: Aug 09, 2004
Posts: 9
I need to read a Word document from a hard drive and insert it into a BLOB column in an Oracle database.
Once the doc is in the DB, users can open it with Winword. To achieve this goal with the framework I must use, we retrieve the BLOB, write it to disk, and then open it with Winword via Runtime.exec().

Upload and download of the document are OK (same size).
This works fine with simple text documents, but it doesn't work with complex Word documents.

After comparing the original and the produced file (by opening both documents in Notepad), we noticed that some characters are different. For example, the euro symbol in the original file is replaced by a question mark (?) in the one produced from the BLOB.
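
Comparing binary files in Notepad can hide or distort differences, so a byte-level comparison is usually more reliable. The following is a minimal sketch, not part of the original post: the class name ByteDiff and the method differingOffsets are hypothetical, and it only compares the common length of the two files (the poster reports the sizes are equal).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the offsets at which two byte arrays differ.
class ByteDiff {

    // Compares the arrays over their common length and collects differing offsets.
    static List<Integer> differingOffsets(byte[] original, byte[] produced) {
        List<Integer> offsets = new ArrayList<>();
        int n = Math.min(original.length, produced.length);
        for (int i = 0; i < n; i++) {
            if (original[i] != produced[i]) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ByteDiff <original> <produced>");
            return;
        }
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] produced = Files.readAllBytes(Paths.get(args[1]));
        // Print each differing offset with the signed byte value before and after.
        for (int offset : differingOffsets(original, produced)) {
            System.out.printf("offset %d: %d -> %d%n",
                    offset, original[offset], produced[offset]);
        }
    }
}
```

Run against the original file and the file written back from the BLOB, this prints the exact byte values that changed, which is easier to reason about than the characters Notepad displays.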

Based on this, I suspect a problem with the charset used.
Locale.setDefault(Locale.UK) is called at the beginning of the application.

Do you think using a charset decoder/encoder could help?
Any suggestions welcome.

Thanks in advance for your help.

Stefan Wagner
Ranch Hand

Joined: Jun 02, 2003
Posts: 1923

I haven't worked with BLOBs so far, but from the name (binary large object) I would expect a database to store the bytes as they are, without translating anything.

How do you save and restore it?
Jeanne Boyarsky
author & internet detective

Joined: May 26, 2003
Posts: 33125

We had this problem with CLOBs. The wrong encoding was set on the db server.

I'm surprised you are getting it with BLOBs as that is just data.

pascal monfils

Joined: Aug 09, 2004
Posts: 9
In fact, we work in a strongly typed environment.
All data is passed from the client to the server through "basetypes" (typed objects).
The only way to get the data contained in one is to get a representation of it, which is a string.

What I do is:
get the string from the basetype
get the bytes from that string
insert a record in the DB using the empty_blob() function
"select ... for update" the created record
get the Blob object from the ResultSet
create a ByteArrayInputStream with the byte[] from the string
get an OutputStream on the Blob
push the bytes into the Blob (input stream to output stream)
close the streams
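
The first two steps above pass the file content through a String, and that is only lossless if every byte survives the bytes-to-characters-to-bytes conversion. One possible workaround, sketched below, is to carry the binary data as Base64 text, which uses only ASCII characters. This is a substituted technique, not what the poster does: it assumes the basetype can hold an arbitrary ASCII string and that both client and server code can be changed. The class name BlobText is hypothetical, and java.util.Base64 requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: moves binary data through a String without charset loss.
class BlobText {

    // Client side: turn the raw file bytes into ASCII-only text.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Server side: recover the exact bytes before writing them to the Blob.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Every possible byte value, including those outside the ASCII range.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        byte[] back = decode(encode(all));
        System.out.println("round trip identical: " + Arrays.equals(all, back));
    }
}
```

The decoded byte[] can then be fed to the ByteArrayInputStream in the steps above. The trade-off is size: Base64 text is roughly a third larger than the data it carries.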

Those basetypes are available on the server and on the client.
On the client, if I load the data into a basetype (open file) and then write the content of the basetype to disk under a different name, both files are strictly identical.

Regarding the database, the DB is also accessed by an application written in PB, and importing documents into the BLOB column works fine there.

I suspect the JDBC layer uses some kind of locale or charset defined somewhere and performs a conversion on the string part, not on the bytes ...
pascal monfils

Joined: Aug 09, 2004
Posts: 9
Some new info ...

I identified the bytes which are different ...
Only 75 bad bytes for a file size of 52220 bytes!
Only 5 different byte values appear among these 75 bytes: -112, -115, -127, -113, -99.

In each case, those bytes are replaced by the value 63!

I really don't understand what is happening!

Help still greatly needed
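
As unsigned values, the five reported bytes are 0x90, 0x8D, 0x81, 0x8F and 0x9D, which are exactly the five positions that windows-1252 leaves unassigned, and 63 is the ASCII code of '?'. The sketch below reproduces that pattern under one assumption the thread does not confirm: that the string/bytes conversion ran with windows-1252 as the charset (a common platform default on Western European Windows). The class name Cp1252RoundTrip is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: bytes -> String -> bytes through a given charset.
class Cp1252RoundTrip {

    // Decodes the bytes to a String and encodes them back with the same charset.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // The five byte values reported in the post.
        byte[] reported = {-112, -115, -127, -113, -99};

        // Assumption: the conversion used windows-1252 (not stated in the thread).
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println("windows-1252: "
                + Arrays.toString(roundTrip(reported, cp1252)));

        // ISO-8859-1 maps each of the 256 byte values to its own character.
        System.out.println("ISO-8859-1:   "
                + Arrays.toString(roundTrip(reported, StandardCharsets.ISO_8859_1)));
    }
}
```

If this matches what happens in the application, the damage occurs in the String step rather than in the BLOB or the JDBC driver. Note that Locale.setDefault changes locale-sensitive formatting, not the charset used by new String(byte[]) or String.getBytes() when no charset argument is given.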