Inserting Japanese text into Oracle database

Kirtikumar Puthran
Ranch Hand

Joined: Mar 04, 2003
Posts: 37
Hi,
I apologise if this is not the right forum for this question.
I am trying to insert Japanese data, submitted from a JSP page that uses the "shift-jis" charset, into a UTF-8 Oracle database.
I read the user input from the form's text box field as follows:
String textboxString = request.getParameter("japaneseText");
String original = new String(textboxString.getBytes("8859_1"));
byte[] JISBytes = original.getBytes("Shift_JIS");
String insertToDB = new String(JISBytes, "UTF8");
The string is then inserted into the UTF-8 database.
But when the string is retrieved from the database using Java or Perl code, the result shows up as junk "Mojibake" characters.
Note: When the data is inserted using a Perl script, it is stored properly and is also retrieved properly as Japanese characters.
I would appreciate it if someone could give me some pointers on this issue. (I even tried this using a servlet.)


Regards,
Kirti
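
For reference, here is a minimal sketch of why the conversion chain in the question can produce mojibake, and of a round trip that should not. It assumes (nothing in the thread confirms this) that the servlet container decoded the Shift_JIS request bytes as ISO-8859-1. The class and method names are made up for illustration. A Java String carries no charset of its own, so re-encoding it as Shift_JIS and reading those bytes as UTF-8 does not convert anything; it only misreads the bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ShiftJisRoundTrip {
    // Shift_JIS ships with standard JDK builds (extended charsets).
    static final Charset SJIS = Charset.forName("Shift_JIS");

    /** Simulates the assumed container behaviour: Shift_JIS bytes read as ISO-8859-1. */
    static String misdecodeAsLatin1(String typed) {
        return new String(typed.getBytes(SJIS), StandardCharsets.ISO_8859_1);
    }

    /** Recovers the text: get the raw bytes back, then decode them as Shift_JIS. */
    static String recover(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), SJIS);
    }

    /** Mirrors the last two lines of the question: Shift_JIS bytes decoded as UTF-8. */
    static String decodeShiftJisBytesAsUtf8(String text) {
        return new String(text.getBytes(SJIS), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u65E5\u672C\u8A9E"; // the word "nihongo" in kanji
        String received = misdecodeAsLatin1(typed);

        // The recovered string is ordinary Java text; a JDBC driver can
        // then encode it for the database without further manual steps.
        System.out.println(recover(received).equals(typed));

        // Shift_JIS bytes are not valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(decodeShiftJisBytesAsUtf8(typed).indexOf('\uFFFD') >= 0);
    }
}
```

Note also that `new String(bytes)` without a charset argument, as in the second line of the question's snippet, uses the platform default charset, so its result can differ from machine to machine.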
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17257

Good tough question, and yes it does not belong in this forum.
I am going to move this to the Oracle forum.
Mark


Varun Khanna
Ranch Hand

Joined: May 30, 2002
Posts: 1400
It should work the way you are doing.
Can you tell me, once you have inserted the data into Oracle, how you are ensuring that the data is correct?
And what do you mean by **junk "Mojibake" characters**? Do they look like some memory address?
I faced this problem too earlier this year ... see my reply in this post to see how I went about solving it. But of course, my database was Sybase.
I don't see any issue with Oracle, as it supports UTF-8. It seems something is going wrong in the middleware.
[ January 20, 2004: Message edited by: Varun Khanna ]

- Varun
Kirtikumar Puthran
Ranch Hand

Joined: Mar 04, 2003
Posts: 37
First of all, sorry for the delay in replying. I was trying to find a way out of this problem.

It should work the way you are doing.

---- Yes, you are right, Varun, it should have worked this way. Luckily for me, it worked with a little tweak. Even though the charset of my database was UTF8, I think the oci8 driver was the culprit here, so I changed the way I connect to the database to include the charset (this was suggested to me by http://www.i18nfaq.com/java.html ). My previous code to connect to the DB was:
String dsnFeedBack = "jdbc:oracle:oci8:@hostString";
conn = DriverManager.getConnection(dsnFeedBack,"user","pwd");
I changed the above to:
String dsnFeedBack = "jdbc:oracle:oci8:@hostString";
Properties props = new Properties();
props.setProperty("user", "user");
props.setProperty("password", "pwd");
props.setProperty("charset", "utf8");
conn = DriverManager.getConnection(dsnFeedBack, props);

This way it worked fine!

Can you tell me, once you have inserted the data into Oracle, how you are ensuring that the data is correct?

---- With my previous code, the data in the database looked like "???" and inverted question marks.
With the approach above, the values stored in the database are "Entity Reference Values" or "Numeric Character References (NCR)" for every character (something like 盙, stored as &#xxxxx; ). When I retrieve the values from the database this way, they display correctly on the web page (JSP).
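
To make the NCR form concrete, here is a small sketch that converts text to decimal numeric character references and back. The class and method names are hypothetical, and this is not what the driver or the browser does internally; it only illustrates the notation. Because an NCR is plain ASCII markup that an HTML parser turns back into a character, a value stored this way displays correctly in a browser, while a non-HTML consumer of the same column sees the literal `&#...;` text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcrCodec {
    // Matches decimal (&#26085;) and hexadecimal (&#x65E5;) references.
    private static final Pattern NCR = Pattern.compile("&#([xX][0-9A-Fa-f]+|[0-9]+);");

    /** Replaces every non-ASCII code point with a decimal NCR. */
    static String toNcr(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    /** Turns NCRs back into characters; other text is copied unchanged. */
    static String fromNcr(String text) {
        Matcher m = NCR.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String body = m.group(1);
            boolean hex = body.charAt(0) == 'x' || body.charAt(0) == 'X';
            // Sketch only: out-of-range values would throw here.
            int cp = hex ? Integer.parseInt(body.substring(1), 16) : Integer.parseInt(body);
            out.appendCodePoint(cp);
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = toNcr("\u65E5\u672C"); // two kanji, U+65E5 and U+672C
        System.out.println(stored);            // &#26085;&#26412;
        System.out.println(fromNcr(stored).equals("\u65E5\u672C"));
    }
}
```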


And what do you mean by **junk "Mojibake" characters**? Do they look like some memory address?

---- "Moji Bake" means "junk/garbage characters". In Japanese, "Moji" means "characters" and "Bake" means "to get funny or to get crazy". So previously, when I retrieved the values from the database and displayed them in the browser, they looked like Japanese characters but were not: some were converted correctly and some were junk, so the whole word or sentence made no sense at all. I hope this explains what "Mojibake" is.

I faced this problem too earlier this year ... see my reply in this post to see how I went about solving it. But of course, my database was Sybase.

--- You are lucky that you could solve it, because the approach above seemed to work for me only with a self-submitting JSP, i.e. the input is taken from the user and the form is submitted to the same page, where the input (Japanese values) is stored in the database, retrieved from the database and shown back to the user.
When I tried to include this page within another JSP, the same problem occurred, even though I had properly set the character set of the JSP (in the META tag). I am still trying to figure out a way that works in any scenario.
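
One possible explanation, offered as a guess and not a diagnosis: the META tag only tells the browser how to read the page, and the server still has to be told which charset to use when it decodes the submitted form data. The sketch below uses only the standard library to imitate that step; `FormDecodingDemo` and its methods are hypothetical names. In a servlet container the corresponding call is `request.setCharacterEncoding("Shift_JIS")`, which has to run before the first `getParameter` call, and in an included JSP the parameters may already have been parsed by the including page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormDecodingDemo {
    /** Imitates a browser submitting a form field from a Shift_JIS page. */
    static String browserSubmit(String typed) {
        try {
            return URLEncoder.encode(typed, "Shift_JIS");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Shift_JIS not available", e);
        }
    }

    /** Imitates the server decoding the field with the given charset. */
    static String serverDecode(String wire, String charset) {
        try {
            return URLDecoder.decode(wire, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String typed = "\u65E5\u672C\u8A9E";
        String wire = browserSubmit(typed); // percent-encoded Shift_JIS bytes

        // Decoding with the charset the page was submitted in restores the text.
        System.out.println(serverDecode(wire, "Shift_JIS").equals(typed));

        // Decoding with a different charset yields mojibake.
        System.out.println(serverDecode(wire, "ISO-8859-1").equals(typed));
    }
}
```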


I don't see any issue with Oracle, as it supports UTF-8. It seems something is going wrong in the middleware.

Let me know if you have any suggestions on this matter.
Rgds,
Kirti
[ February 09, 2004: Message edited by: Kirtikumar Puthran ]