| Author |
Handling languages other than English in Java
|
Rama Vadakattu
Greenhorn
Joined: Oct 12, 2007
Posts: 6
|
|
Hi, I am reading several feeds that have titles/text in languages other than English, such as Chinese, Japanese, and Arabic. Example title in another language: 认清世界 读懂中国: 老百姓将杨佳案矛头指向沪政法书记吴志明 When I read such a string in a Java program and display it, I get question marks instead of the language-specific characters, like this: ???(?) Can anyone guide me on how to resolve this language-specific issue? --rama
|
 |
Christophe Verré
Sheriff
Joined: Nov 24, 2005
Posts: 14669
|
|
and display it
Where are you displaying it? If you're displaying it in the console, your environment might not support the fonts necessary to display such languages.
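To illustrate the console point above, here is a minimal sketch. It assumes the console stream encodes text with a charset such as ISO-8859-1 that cannot represent the characters, which is one common way to end up with question marks (missing fonts are a separate issue that code cannot fix). The class name ConsoleEncodingDemo and the helper printWith are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncodingDemo {

    // Hypothetical helper: prints text through a PrintStream that encodes
    // with the named charset and returns the bytes the console would receive.
    static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String title = "\u8ba4\u6e05\u4e16\u754c"; // the first four characters of the example title

        // ISO-8859-1 cannot represent these characters, so each one is
        // replaced by a single '?' byte on the way out.
        System.out.println(printWith(title, "ISO-8859-1").length + " bytes as ISO-8859-1");

        // UTF-8 can represent them; each of these characters takes three bytes.
        System.out.println(printWith(title, "UTF-8").length + " bytes as UTF-8");

        // Wrapping System.out with an explicit encoding; whether the text then
        // renders correctly still depends on the terminal and its fonts.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(title);
    }
}
```

If the terminal itself is not set to UTF-8, the last line may still look wrong even though the bytes are correct.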
|
|
 |
Rama Vadakattu
Greenhorn
Joined: Oct 12, 2007
Posts: 6
|
|
|
Instead of displaying them, I am storing those characters in a MySQL database, but the language-specific characters still appear as question marks.
|
 |
Rama Vadakattu
Greenhorn
Joined: Oct 12, 2007
Posts: 6
|
|
|
Any clue on how to resolve this?
|
 |
Matteo Di Furia
Ranch Hand
Joined: Jun 20, 2008
Posts: 102
|
|
|
Might be a problem of character sets configured on the database server.
|
 |
Paul Sturrock
Bartender
Joined: Apr 14, 2004
Posts: 10336
|
|
...or it might be an issue with the capabilities of whatever client you use to view the data. Have a read of this very good article.
|
|
 |
Paul Clapham
Bartender
Joined: Oct 14, 2005
Posts: 16480
|
|
|
And this one.
|
 |
Rama Vadakattu
Greenhorn
Joined: Oct 12, 2007
Posts: 6
|
|
Thanks all, I have resolved the problem. The link below clearly explains what the problem is and how to solve it.
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

Problem:
The characters in the feed are UTF-8 encoded, whereas by default the Tomcat server assumes that all characters are encoded in ISO-8859-1. As a result, Tomcat was trying to read the UTF-8 encoded characters in the feed as ISO-8859-1, which is why it could not print the international characters.

How to resolve?
~~~~~~~~~~~~~~~
We need to tell the Java servlet that those characters are UTF-8 and not the default ISO encoding.

How to say so?
~~~~~~~~~~~~~~
URL ffeedurl = new URL(feedurl);
HttpURLConnection.setFollowRedirects(true);
URLConnection connection = ffeedurl.openConnection();
HttpURLConnection httpConnection = (HttpURLConnection) connection;

Please note the second argument of the InputStreamReader constructor in the line below. It is "UTF-8", which tells the reader that the bytes retrieved from the URL are UTF-8 encoded, not encoded in the default ISO format.

InputStreamReader defaultReader = new InputStreamReader(httpConnection.getInputStream(), "UTF-8");

That's it. In addition, you need to take care of the following:

1) The MySQL connection URL should be
jdbc:mysql://localhost/databasename?useUnicode=true&characterEncoding=UTF-8
instead of
jdbc:mysql://localhost/databasename

2) In the MySQL database, each table and each text/varchar column should use the utf8_general_ci collation.

3) If you are using log4j and want to see the UTF-8 characters in the log messages, add the param below to each appender:
<param name="Encoding" value="UTF-8"/>
(I don't know why, but even after setting this I could not see the characters properly in the log file/console.)

4) Important links that discuss this problem and its solution:
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/ (to clearly understand what the problem is and how to resolve it)
http://blogs.warwick.ac.uk/kieranshaw/entry/utf-8_internationalisation_with
http://stackoverflow.com/questions/138948/how-to-get-utf-8-working-in-java-webapps

--rama
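The effect of the charset argument described above can be seen in a small self-contained sketch. It is not the code from the feed reader: it assumes an in-memory byte array standing in for the HTTP response body, and the class name FeedDecodingDemo and the helper readAll are hypothetical names chosen for this illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FeedDecodingDemo {

    // Hypothetical helper: reads the whole stream as text, decoding the
    // bytes with the charset passed to InputStreamReader.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String title = "\u8ba4\u6e05\u4e16\u754c"; // the first four characters of the example title

        // Stand-in for the feed: the title as UTF-8 bytes "on the wire".
        byte[] wire = title.getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset recovers the original text.
        String decoded = readAll(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip intact: " + decoded.equals(title));

        // Decoding the same bytes as ISO-8859-1 turns every byte into its own
        // character, so the text is garbled before it ever reaches the database.
        String garbled = readAll(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);
        System.out.println("Characters after ISO-8859-1 decoding: " + garbled.length());
    }
}
```

Passing a Charset constant instead of the string "UTF-8" avoids the checked UnsupportedEncodingException; either form selects the same decoder.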
|
 |
Gamini Sirisena
Ranch Hand
Joined: Aug 05, 2008
Posts: 347
|
|
Great! And thanks for posting the solution.
|
 |
 |