I need to read a Word document from a hard drive and insert it into a BLOB column in an Oracle database. Once the document is in the DB, users can open it with winword. To achieve this, given the framework I must use, we retrieve the blob, write it to disk and then open it with winword via Runtime.exec().
Upload and download of the document look OK (same size). This works fine with simple text documents, but it doesn't work with complex Word documents.
After comparing the original and the produced file (by opening both in Notepad), we noticed that some characters are different. For example, the euro symbol in the original file is replaced by a question mark (?) in the one produced from the blob.
Based on this, I suspect a problem with the charset used. Locale.setDefault(Locale.UK) is set at the beginning of the application.
Do you think using a charset decoder/encoder can help? Any suggestion is welcome.
In fact, we work in a strongly typed environment. All data is passed from the client to the server through "basetypes" (typed objects). The only way to get the data contained in one of them is to get some kind of representation of it, which is a string.
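If a string really is the only representation available, one possible workaround is to use a charset that maps every byte value to exactly one character and back. ISO-8859-1 has that property. The sketch below is only an illustration: the class and method names are made up, and it assumes the framework lets both sides choose the charset used for the string conversion, which may not be the case.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Hypothetical helper: each byte 0x00..0xFF becomes the char U+0000..U+00FF.
    public static String bytesToString(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: the inverse mapping, one char back to one byte.
    public static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] back = stringToBytes(bytesToString(all));
        System.out.println("lossless: " + Arrays.equals(all, back));
    }
}
```

This only helps if the same charset is used for both conversions. A no-argument getBytes() uses the platform default charset instead.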
What I do is:
1. get the string from the basetype
2. get the bytes from that string
3. insert a record in the DB using the empty_blob() function
4. "select ... for update" the created record
5. get the blob object from the resultset
6. create a byte input stream with the bytes from the string
7. get an output stream on the blob
8. push the bytes into the blob (instream/outstream)
9. close the streams
10. commit
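The "push the bytes" step above can be sketched as a plain stream copy that never goes through a String. This is a minimal sketch, not the actual application code: the class name is made up, and a ByteArrayOutputStream stands in for the blob's output stream so the example runs without a database. With a real JDBC Blob, the output stream could come from java.sql.Blob.setBinaryStream(1), assuming the driver supports it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class BlobByteCopy {
    // Copies raw bytes from in to out and returns the number of bytes copied.
    // No charset is involved, so no byte value can be altered on the way.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Sample bytes, including values that do not survive some charset conversions.
        byte[] doc = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0x90, (byte) 0x8D};
        // Stand-in for the blob's output stream (assumption: no database here).
        ByteArrayOutputStream blobStandIn = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(doc), blobStandIn);
        System.out.println(copied + " bytes, identical: "
                + Arrays.equals(doc, blobStandIn.toByteArray()));
    }
}
```

If the bytes fed into this copy were themselves produced by String.getBytes(), any damage has already happened before the copy starts, so the stream part alone cannot fix it.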
Those basetypes are available on the server and on the client. On the client, if I load the data into a basetype (open file) and then write the content of the basetype to disk under a different name, both files are strictly identical.
Regarding the database, it is also accessed by an application written in PB, and the import of documents into the blob column works fine there.
I suspect the JDBC layer of using some kind of locale or charset defined somewhere, and of doing a conversion on the string part, not on the bytes ...
Some new info ...
I identified the bytes which are different ... Only 75 are bad out of a file size of 52220 bytes! Only 5 distinct byte values occur among these 75 bytes (-112, -115, -127, -113, -99).
In each case, those bytes are replaced by the 63 value !
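Those five values (0x90, 0x8D, 0x81, 0x8F and 0x9D as unsigned hex) are the five byte positions that have no character assigned in windows-1252, and 63 is the ASCII code of '?'. That pattern is consistent with the bytes going through a bytes-to-String-to-bytes conversion with a Cp1252 charset somewhere, although it does not show where. The small check below reproduces the conversion in isolation. The class name is made up, and it assumes the JVM provides the windows-1252 charset.

```java
import java.nio.charset.Charset;

public class Cp1252Check {
    // Decodes the bytes into a String and encodes them back with the same charset.
    public static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // Assumption: this JVM ships the windows-1252 charset.
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] suspects = {-112, -115, -127, -113, -99};
        byte[] result = roundTrip(suspects, cp1252);
        // Print each original byte next to what it became after the round trip.
        for (int i = 0; i < suspects.length; i++) {
            System.out.println(suspects[i] + " -> " + result[i]);
        }
    }
}
```

If the output shows 63 for each of the five values, the string conversion is the place to look, rather than the blob streams or the JDBC driver.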