Inserting Japanese text into Oracle database

 
Kirtikumar Puthran
Ranch Hand
Posts: 37
Hi,
I apologise if this is not the right forum for this question.
I am trying to insert Japanese data, submitted from a JSP page with the "shift-jis" charset, into a UTF-8 Oracle database.
I am using the following code to get the user input from the form's text box field:
String textboxString = request.getParameter("japaneseText");
String original = new String(textboxString.getBytes("8859_1"));
byte[] JISBytes = original.getBytes("Shift_JIS");
String insertToDB = new String(JISBytes, "UTF8");
The string is then inserted into a UTF-8 database.
But when the string is retrieved from the database using Java or Perl code, the result shows up as junk "Mojibake" characters.
Note: When the data is inserted using a Perl script, it is inserted properly and the results are also retrieved properly as Japanese characters.
I would appreciate it if someone could give me some pointers on this issue. (I even tried this using a servlet.)
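
For reference, a minimal sketch of the conversion chain in the question, outside any JSP. It assumes the container decoded the raw Shift_JIS form bytes as ISO-8859-1 (a common default, but an assumption here); the class and method names are hypothetical. Under that assumption, taking the ISO-8859-1 bytes back and decoding them as Shift_JIS recovers the text, while the chain from the question passes through the platform default charset and then reads Shift_JIS bytes as UTF-8, which may be where the garbling comes from.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here touches a real request or database.
class ShiftJisParamDemo {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Assumption: the browser sends Shift_JIS bytes and the container
    // decodes them as ISO-8859-1, one char per byte.
    static String asContainerSeesIt(String typed) {
        return new String(typed.getBytes(SHIFT_JIS), StandardCharsets.ISO_8859_1);
    }

    // Take the original bytes back via ISO-8859-1, then decode as Shift_JIS.
    static String recover(String param) {
        return new String(param.getBytes(StandardCharsets.ISO_8859_1), SHIFT_JIS);
    }

    // The chain from the question, step by step.
    static String questionChain(String param) {
        // new String(byte[]) without a charset uses the platform default.
        String original = new String(param.getBytes(StandardCharsets.ISO_8859_1));
        byte[] jisBytes = original.getBytes(SHIFT_JIS);
        // Shift_JIS bytes are not valid UTF-8, so this decode garbles them.
        return new String(jisBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u65E5\u672C\u8A9E"; // three Japanese characters
        String param = asContainerSeesIt(typed);
        System.out.println("recover matches input:       " + recover(param).equals(typed));
        System.out.println("questionChain matches input: " + questionChain(param).equals(typed));
    }
}
```

Once the value is a correct Java String, no further byte-level conversion to UTF-8 should be needed in application code, since a Java String has no charset of its own; encoding for the database is normally the driver's job.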
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
Good, tough question, and yes, it does not belong in this forum.
I am going to move this to the Oracle forum.
Mark
 
Varun Khanna
Ranch Hand
Posts: 1400
It should work the way you are doing it.
Once you have inserted the data into Oracle, how are you ensuring that the data is correct?
And what do you mean by **junk "Mojibake" characters**? Do they appear like some memory address?
I faced this problem too earlier this year; see my reply in this post to see how I went about solving it. But of course, my database was Sybase.
I don't see any issue with Oracle, as it supports UTF-8. Something seems to be going wrong in the middleware.
[ January 20, 2004: Message edited by: Varun Khanna ]
 
Kirtikumar Puthran
Ranch Hand
Posts: 37
First of all, sorry for the delay in replying; I was trying to find a way out of this problem.

It should work the way you are doing it.

---- Yes, you are right, Varun, it should have worked this way. Luckily for me, it did work with a little tweak. Although the charset of my database was UTF8, I think the oci8 driver was the culprit here, so I changed the way I connect to the database to include the charset (this was suggested to me by http://www.i18nfaq.com/java.html ). My previous code to connect to the DB was:
String dsnFeedBack = "jdbc:oracle:oci8:@hostString";
conn = DriverManager.getConnection(dsnFeedBack, "user", "pwd");
I changed the above to:
String dsnFeedBack = "jdbc:oracle:oci8:@hostString";
Properties props = new Properties();
props.setProperty("user", "user");
props.setProperty("password", "pwd");
props.setProperty("charset", "utf8");
conn = DriverManager.getConnection(dsnFeedBack, props);

This way it worked fine!

Once you have inserted the data into Oracle, how are you ensuring that the data is correct?

---- With my previous code, the data in the database looked like "???" and inverted question marks.
With the approach above, the values stored in the database are "Entity Reference Values" or "Numeric Character References (NCR)" for every character (something like 盙, read as &_#_xxxxx;_ ). When I retrieve the values from the database this way, they display correctly on the web page (JSP).
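
For readers unfamiliar with the NCR form mentioned above, here is a minimal sketch of converting between text and decimal numeric character references. The class and method names are hypothetical, and this only illustrates the notation; it is not a claim about what the browser or the driver does internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating decimal NCRs such as &#26085;
class NcrDemo {
    // At most 7 digits, so Integer.parseInt cannot overflow.
    private static final Pattern NCR = Pattern.compile("&#(\\d{1,7});");

    // Replace every non-ASCII code point with its decimal NCR.
    static String toNcr(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    // Replace every decimal NCR with the character it names.
    // Note: a literal "&#65;" typed by the user is also decoded.
    static String fromNcr(String s) {
        Matcher m = NCR.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            int cp = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(m.group()); // leave invalid references untouched
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String ncr = toNcr("A\u65E5\u672C");
        System.out.println(ncr);
        System.out.println(fromNcr(ncr).equals("A\u65E5\u672C"));
    }
}
```

Storing NCRs keeps the column pure ASCII, which is why the text survives the round trip, but it also means the database holds markup rather than the actual Japanese characters, so non-browser clients would need to decode it themselves.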


And what do you mean by **junk "Mojibake" characters**? Do they appear like some memory address?

---- "Mojibake" means "junk/garbage characters". In Japanese, "moji" means "characters" and "bake" means "to get funny or to get crazy". Previously, when I retrieved the values from the database and displayed them in the browser, they looked like Japanese characters, but they were not: some were converted correctly and some were junk, so the whole word or sentence made no sense at all. I hope this makes clear what "Mojibake" is.

I faced this problem too earlier this year; see my reply in this post to see how I went about solving it. But of course, my database was Sybase.

---- You are lucky that you could solve it, because the approach above seemed to work for me only with a self-submitting JSP, i.e. the input is taken from the user and the form is submitted to the same page, where the input (Japanese values) is stored in the database, retrieved from the database and shown back to the user.
When I tried to include this page within another JSP, the same problem occurred, even though I had properly set the character set of the JSP (in the META tag). I am still trying to figure out a way that would work in any scenario.


I don't see any issue with Oracle, as it supports UTF-8. Something seems to be going wrong in the middleware.

Let me know if you have any suggestions on this matter.
Rgds,
Kirti
[ February 09, 2004: Message edited by: Kirtikumar Puthran ]
 