Handling languages other than English in Java

Rama Vadakattu
Greenhorn

Joined: Oct 12, 2007
Posts: 6
Hi,

I am reading several feeds which have titles/text in languages other than English, such as Chinese, Japanese, Arabic, etc.

Example title in another language: 认清世界 读懂中国: 老百姓将杨佳案矛头指向沪政法书记吴志明

When I read such a string in a Java program and display it, I get all question marks instead of the language-specific characters,

as below: ???(?)

Can anyone guide me on how to resolve this language-specific issue?

--rama
Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14688
    

"and display it"

Where are you displaying it? If you're displaying it in the console, your environment might not support the fonts necessary to display such languages.


Rama Vadakattu
Greenhorn

Joined: Oct 12, 2007
Posts: 6
Instead of displaying them, I am storing those characters in a MySQL database, but the language-specific characters still appear as question marks.
Rama Vadakattu
Greenhorn

Joined: Oct 12, 2007
Posts: 6
Any clue on how to resolve this?
Matteo Di Furia
Ranch Hand

Joined: Jun 20, 2008
Posts: 102
Might be a problem with the character sets configured on the database server.
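To illustrate how a character set mismatch can turn text into question marks, here is a minimal sketch. The class and method names are made up for this example, and ISO-8859-1 is only a stand-in for whatever character set the database or client is really using (the thread does not say). The point is that encoding characters into a character set that cannot represent them replaces each one with '?'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original poster's code.
class QuestionMarkDemo {

    // Encodes text to ISO-8859-1 bytes and decodes it back,
    // roughly what a Latin-1-only storage layer would do (assumption).
    static String roundTripLatin1(String text) {
        // getBytes replaces every character ISO-8859-1 cannot represent with '?'
        byte[] stored = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u4E2D\u56FD" is the two-character Chinese word for "China"
        System.out.println(roundTripLatin1("\u4E2D\u56FD")); // prints ??
        System.out.println(roundTripLatin1("Java"));         // prints Java
    }
}
```

Once the characters have been replaced this way, the original text cannot be recovered from the stored bytes, so the fix has to happen before the data is written.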
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

...or it might be an issue with the capabilities of whatever client you use to view the data.

Have a read of this very good article.


Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18716
    

And this one.
Rama Vadakattu
Greenhorn

Joined: Oct 12, 2007
Posts: 6
Thanks all, I have resolved the problem.

The link below clearly explains what the problem is and how to solve it.
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

Problem:

The characters in the feed are UTF-8 encoded, whereas by default the Tomcat server assumes that all characters are encoded in ISO-8859-1.

As a result, Tomcat tries to read the characters in the feed (which are UTF-8 encoded) as ISO-8859-1, and so it is not able to print the international characters.
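The mismatch described above can be reproduced without Tomcat or a real feed. This is only a sketch: the class name is hypothetical, and the byte array stands in for the bytes a UTF-8 feed would send. It decodes the same bytes once as ISO-8859-1 and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the byte array stands in for a UTF-8 encoded feed.
class DecodeMismatchDemo {

    // Turns raw bytes into text using the given character set.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u56FD"; // two Chinese characters
        byte[] feedBytes = original.getBytes(StandardCharsets.UTF_8); // 3 bytes per character here

        String wrong = decode(feedBytes, StandardCharsets.ISO_8859_1);
        String right = decode(feedBytes, StandardCharsets.UTF_8);

        System.out.println(wrong.length());         // prints 6 (one bogus char per byte)
        System.out.println(right.equals(original)); // prints true
    }
}
```

Decoded as ISO-8859-1, every byte becomes its own unrelated character, which is why the text is already damaged before it reaches the console, the log, or the database.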

How to resolve?
~~~~~~~~~~~~~~~
We need to tell the Java servlet that those characters are UTF-8 encoded and not in the default ISO encoding.

How to say?
~~~~~~~~~~
URL ffeedurl = new URL(feedurl);
HttpURLConnection.setFollowRedirects(true);
URLConnection connection = ffeedurl.openConnection();
HttpURLConnection httpConnection = (HttpURLConnection) connection;

Please note the second argument of the InputStreamReader constructor in the line below: it is "UTF-8",
which tells the servlet that the characters retrieved from the URL are UTF-8 encoded and not encoded in the default ISO format.

InputStreamReader defaultReader = new InputStreamReader(httpConnection.getInputStream(), "UTF-8");
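For anyone who wants to try the idea in isolation, here is a small self-contained sketch of reading a whole stream with an explicit UTF-8 reader. The class and method names are hypothetical, and a ByteArrayInputStream is used in place of httpConnection.getInputStream() so that it runs without a network connection. An InputStreamReader created without a charset argument falls back to the JVM's default charset, which is what the explicit argument avoids.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not the original poster's code.
class FeedReaderSketch {

    // Reads the entire stream as UTF-8 text and closes it afterwards.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes returned by httpConnection.getInputStream()
        String expected = "<title>\u4E2D\u56FD</title>";
        byte[] body = expected.getBytes(StandardCharsets.UTF_8);

        String text = readUtf8(new ByteArrayInputStream(body));
        System.out.println(text.equals(expected)); // prints true
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" is a small design choice: it avoids the checked UnsupportedEncodingException and catches typos in the charset name at compile time.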

That's it. In addition to that, you need to take care of the following things.

1) The MySQL connection URL should be as below
jdbc:mysql://localhost/databasename?useUnicode=true&characterEncoding=UTF-8
instead of
jdbc:mysql://localhost/databasename

2) In the MySQL database, each table and each text/varchar column should use the utf8_general_ci collation.

3) If you are using log4j and want to see the UTF-8 characters in the log messages, you should add the param below to each appender:
<param name="Encoding" value="UTF-8"/> (I don't know why, but even after setting this I was not able to see the characters properly in the log file/console.)

4) Important links which talk about this problem and its solution:
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/ (to clearly understand what the problem is and how to resolve it)
http://blogs.warwick.ac.uk/kieranshaw/entry/utf-8_internationalisation_with
http://stackoverflow.com/questions/138948/how-to-get-utf-8-working-in-java-webapps
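As a rough sketch of item 1, the same connection settings can also be passed as properties instead of in the URL. This assumes MySQL Connector/J, where (as far as I know) the relevant property names are useUnicode and characterEncoding. The class name, user name, and password here are placeholders, and the actual DriverManager call is left as a comment because it needs the driver on the classpath and a running server.

```java
import java.util.Properties;

// Hypothetical helper class; credentials are placeholders.
class MysqlUtf8Props {

    // Builds connection properties that ask the driver to exchange text as UTF-8.
    static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");          // Connector/J property (assumed driver)
        props.setProperty("characterEncoding", "UTF-8");  // Connector/J property (assumed driver)
        return props;
    }

    public static void main(String[] args) {
        Properties props = utf8Properties("appuser", "secret");
        System.out.println(props.getProperty("characterEncoding")); // prints UTF-8

        // With the driver and a server available, the connection would be opened like this:
        // Connection c = DriverManager.getConnection("jdbc:mysql://localhost/databasename", props);
    }
}
```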

--rama
Gamini Sirisena
Ranch Hand

Joined: Aug 05, 2008
Posts: 357
Great! and thanks for posting the solution
 