Unicode and MSSQL

 
Sacha Beaulieu
Greenhorn
Posts: 5
Hi all!

I'm trying to store Chinese and Arabic characters in an MSSQL 2005 DB and it's not working.

Here is the flow of my test application:

Post Chinese text in a textarea -> read this text in a servlet -> insert the text into the DB -> select the text back from the DB -> display both the text from the textarea directly (i.e. request.getParameter("theText")) and the text from the select (via a ResultSet, etc.)

The text from the textarea still displays correctly, but the text from the DB does not: I get a bunch of ???.

- I'm using the MSSQL JDBC driver.
- I'm using an "ntext" field.

Any ideas?
 
Paul Clapham
Sheriff
Posts: 20202
Have you read Character Conversions from Browser to Database?
 
Sacha Beaulieu
Greenhorn
Posts: 5
Yep.

In my JSP (the one that posts the Chinese characters) I've got:



In my servlet (where the text is output) I've got:


before retrieving the value of the field from the JSP.

And I also have:


But I'm able to display the Chinese text that I just posted (with request.getParameter()); it's only when I try to print the text read back from the DB that it doesn't work. So I guess the problem is probably with the INSERT or with the SELECT in the DB.

Everything is UTF-8...


Here is a bit of code from the doPost method of my servlet:


I get the following results:
- the string "textFromDb1" displays as question marks in the browser
- the string "orgText" displays correctly in the browser

My connection string is:
jdbc:sqlserver://127.0.01:1433;databaseName=DEV_TEST;characterEncoding=UTF-8

My driver is:
com.microsoft.sqlserver.jdbc.SQLServerDriver

Is there any setting in SQL Server to tell it to use UTF-8? Maybe it's not really UTF-8... I don't know.
[ March 02, 2007: Message edited by: Sacha Beaulieu ]
 
Paul Clapham
Sheriff
Posts: 20202
A good set of tests. So it's the database end of things that is the problem and not the browser end.

I would use a PreparedStatement rather than that Statement to insert the data, but that's just a general principle; I doubt it will fix your problem. Then again, when I googled NTEXT (to find out what it was) I came across this post on another forum that suggested it would.
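
For reference, a minimal sketch of what the PreparedStatement approach could look like. The table name test_messages, the column name and the class and method names are made up for illustration, since the thread never names the table. setNString assumes a JDBC 4.0 capable driver; on an older driver, setString would be the call to try. This has not been run against the poster's database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class NTextInsert {
    // Hypothetical table and column names, chosen only for this example.
    static final String INSERT_SQL = "INSERT INTO test_messages ([text]) VALUES (?)";

    // Binds the value as a parameter, so no SQL string literal is built by hand.
    static int insertText(Connection conn, String value) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            // setNString (JDBC 4.0) binds the value as a national-character string.
            ps.setNString(1, value);
            return ps.executeUpdate();
        }
    }
}
```

Because the text travels as a bound parameter, there is no literal in the SQL for the server to interpret in its default code page.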

And Microsoft's documentation about NTEXT says that its data is stored in UCS-2, so the server's charset should be irrelevant.
 
Sacha Beaulieu
Greenhorn
Posts: 5
If I could just tell whether the problem is in the INSERT or in the SELECT, it would help me. But the SQL 2005 client doesn't even show the Unicode characters correctly.

Example:

I just tried:
SELECT '<chinese characters here>' as TEST
and the query analyzer still returns question marks.
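
A note on where question marks like these usually come from: when text is converted to a character set that has no mapping for a character, the converter substitutes '?'. The sketch below reproduces that mechanism in plain Java, using ISO-8859-1 as a stand-in for a single-byte code page. It only illustrates the effect and does not touch SQL Server; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encodes text into a single-byte charset and decodes it again.
    // Characters the charset cannot represent come back as '?'.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(roundTripLatin1(chinese)); // prints ??
        System.out.println(roundTripLatin1("abc"));   // prints abc
    }
}
```

Once the substitution has happened the original characters are gone, so no setting on the reading side can bring them back.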

 
Sacha Beaulieu
Greenhorn
Posts: 5
The problem seems to be with the driver... I tried using a parameter like "charset=utf8" in the connection string, but it doesn't seem to make any difference. Has anybody successfully saved Unicode with MSSQL?
 
Sacha Beaulieu
Greenhorn
Posts: 5
I finally succeeded. The problem was with the INSERT. Here is the post that helped me:
http://forum.java.sun.com/thread.jspa?threadID=502015&messageID=2384513

In my INSERT statement I did the following:
INSERT INTO <table> (text) values (N'<text here>')
Notice the "N" prefix. It seems that the default charset can only be changed at install time, but the N prefix marks the literal as Unicode, which "forces" the text to be stored as Unicode.
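
To make the shape of such a literal concrete, here is a small sketch of building one in Java. toNLiteral is a hypothetical helper, not part of any driver, and the table name test_messages is made up. It doubles embedded single quotes, which is the usual T-SQL escape. For real user input a PreparedStatement is still the safer route; this only shows what the N-prefixed statement looks like.

```java
class NLiteral {
    // Hypothetical helper: wraps a value as a T-SQL Unicode literal, N'...'.
    // Embedded single quotes are doubled so the literal stays well formed.
    static String toNLiteral(String value) {
        return "N'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        // Table and column names are assumptions for the example.
        String sql = "INSERT INTO test_messages ([text]) VALUES ("
                + toNLiteral("it's \u4e2d\u6587") + ")";
        System.out.println(sql);
    }
}
```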

None of the parameters I tried in the connection string had any influence, such as:
characterEncoding
Encoding
sendStringAsUnicode=true

Thank you for your help, Paul!
[ March 05, 2007: Message edited by: Sacha Beaulieu ]
 