Handling languages other than English in Java

 
Rama Vadakattu
Greenhorn
Posts: 6
Hi,

I am reading several feeds that have titles and text in languages other than English, such as Chinese, Japanese, and Arabic.

Example title in another language: 认清世界 读懂中国: 老百姓将杨佳案矛头指向沪政法书记吴志明

When I read such a string in a Java program and display it, I get question marks instead of the language-specific characters,

as below: ???(?)

Can anyone guide me on how to resolve this language-specific issue?

--rama
 
Christophe Verré
Sheriff
Posts: 14691
and display it

Where are you displaying it? If you're displaying it in the console, your environment might not support the fonts necessary to display such languages.
 
Rama Vadakattu
Greenhorn
Posts: 6
Instead of displaying them, I am storing those characters in a MySQL database. The language-specific characters still appear as question marks.
 
Rama Vadakattu
Greenhorn
Posts: 6
Any clue on how to resolve this?
 
Matteo Di Furia
Ranch Hand
Posts: 102
It might be a problem with the character sets configured on the database server.
 
Paul Sturrock
Bartender
Posts: 10336
...or it might be an issue with the capabilities of whatever client you use to view the data.

Have a read of this very good article.
 
Paul Clapham
Sheriff
Posts: 20181
And this one.
 
Rama Vadakattu
Greenhorn
Posts: 6
Thanks all, I have resolved the problem.

The link below clearly explains what the problem is and how to solve it.
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

Problem:

The characters in the feed are UTF-8 encoded,
whereas by default the Tomcat server assumes that all characters are encoded in ISO-8859-1.

As a result, Tomcat tries to read the feed's UTF-8 bytes as ISO-8859-1, which is why it cannot print the international characters.
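
To make the mismatch described above concrete, here is a minimal sketch. The class and method names are hypothetical, the sample text is the two characters of "China" from the example title (written as Unicode escapes), and ISO-8859-1 is chosen explicitly rather than taken from any server default. It shows two distinct symptoms: decoding UTF-8 bytes with the wrong charset gives garbage characters, while forcing the text through a charset that cannot represent it gives question marks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample text are illustrative only.
class CharsetMismatchDemo {

    // Decode UTF-8 bytes with the wrong charset: each byte becomes one character.
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode to ISO-8859-1 and back: characters it cannot map are replaced with '?'.
    static String roundTripThroughLatin1(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sample = "\u4e2d\u56fd"; // two Chinese characters from the sample title
        // Two characters turn into six, one per UTF-8 byte.
        System.out.println(decodeUtf8AsLatin1(sample).length());
        // Both characters are unmappable in ISO-8859-1.
        System.out.println(roundTripThroughLatin1(sample));
    }
}
```

If either step happens anywhere between the feed and the database, the original characters cannot be recovered afterwards, so every hop has to agree on UTF-8.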

How to resolve?
~~~~~~~~~~~~~~~
We need to tell the Java servlet that those characters are UTF-8 and not the default ISO-8859-1 encoding.

How to tell it?
~~~~~~~~~~~~~~~
URL ffeedurl = new URL(feedurl);
HttpURLConnection.setFollowRedirects(true);
URLConnection connection = ffeedurl.openConnection();
HttpURLConnection httpConnection = (HttpURLConnection) connection;

Please observe the second argument of the InputStreamReader constructor in the line below: it is UTF-8,
which tells the servlet that the characters retrieved from the URL are UTF-8 encoded and not encoded in the default ISO format.

InputStreamReader defaultReader = new InputStreamReader(httpConnection.getInputStream(),"UTF-8");
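
For anyone who wants to try the fix without a live feed, here is a small self-contained sketch of the same idea. The class name `FeedDecoder` and the method `readAll` are hypothetical, and an in-memory byte array stands in for `httpConnection.getInputStream()`. The point it illustrates is that an `InputStreamReader` created without a charset argument falls back to the JVM's default charset, so naming the charset explicitly removes the dependency on the environment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the original post.
class FeedDecoder {

    // Read the whole stream as text, decoding with the charset the caller names.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a UTF-8 feed would deliver over HTTP.
        byte[] feedBytes = "\u4e2d\u56fd".getBytes(StandardCharsets.UTF_8);
        String title = readAll(new ByteArrayInputStream(feedBytes), StandardCharsets.UTF_8);
        System.out.println(title.equals("\u4e2d\u56fd"));
    }
}
```

With a real connection, the call would look like `FeedDecoder.readAll(httpConnection.getInputStream(), StandardCharsets.UTF_8)`, assuming the feed really is UTF-8.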

That's it. In addition to that, you need to take care of the things below.

1) The MySQL connection URL should be as below
jdbc:mysql://localhost/databasename?useUnicode=true&characterEncoding=UTF-8
instead of
jdbc:mysql://localhost/databasename

2) In the MySQL database, each table and each text/varchar column should use the utf8_general_ci collation

3) If you are using log4j and want to see the UTF-8 characters in the log messages, you should add the param below to each appender
<param name="Encoding" value="UTF-8"/> (I don't know why, but even after setting this I could not see the characters properly in the log file/console)

4) Important links that discuss this problem and its solution:
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/ (to clearly understand what the problem is and how to resolve it)
http://blogs.warwick.ac.uk/kieranshaw/entry/utf-8_internationalisation_with
http://stackoverflow.com/questions/138948/how-to-get-utf-8-working-in-java-webapps
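
The database points (1 and 2) can be sketched roughly as follows. This is only an illustration under assumptions: the class `Utf8MySql`, its methods, and the table name `feed_items` are hypothetical, the URL uses Connector/J's `useUnicode` and `characterEncoding` connection properties, and actually opening a connection needs the MySQL JDBC driver on the classpath plus a running server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper; host, database, and table names are placeholders.
class Utf8MySql {

    // Build a JDBC URL that asks the driver to talk UTF-8 to the server.
    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // SQL that converts an existing table and its text columns to utf8.
    // The table name is concatenated as-is, so only pass trusted names.
    static String convertTableSql(String table) {
        return "ALTER TABLE " + table
                + " CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";
    }

    // Open a connection with the UTF-8 URL (requires the MySQL driver at runtime).
    static Connection open(String host, String database, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, database), user, password);
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost", "databasename"));
        System.out.println(convertTableSql("feed_items"));
    }
}
```

One caveat worth knowing: MySQL's `utf8` character set stores at most three bytes per character, so text containing characters outside the Basic Multilingual Plane would need `utf8mb4` instead.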

--rama
 
Gamini Sirisena
Ranch Hand
Posts: 378
Great! And thanks for posting the solution.
 