Handling languages other than English in Java

Rama Vadakattu

Joined: Oct 12, 2007
Posts: 6

I am reading several feeds that have titles and text in languages other than English, such as Chinese, Japanese and Arabic.

Example title in another language: 认清世界 读懂中国: 老百姓将杨佳案矛头指向沪政法书记吴志明

When I read such a string in a Java program and display it, I get question marks instead of the language-specific characters, as below:

???(?)

Can anyone advise on how to resolve this language-specific issue?

Christophe Verré

Joined: Nov 24, 2005
Posts: 14688

and display it

Where are you displaying it? If you're displaying it in the console, your environment might not support the fonts necessary to display such languages.

Rama Vadakattu

Joined: Oct 12, 2007
Posts: 6
Instead of displaying them, I am storing those characters in a MySQL database, and the language-specific characters still appear as question marks.
Rama Vadakattu

Joined: Oct 12, 2007
Posts: 6
Any clue on how to resolve this?
Matteo Di Furia
Ranch Hand

Joined: Jun 20, 2008
Posts: 102
Might be a problem with the character sets configured on the database server.
Paul Sturrock

Joined: Apr 14, 2004
Posts: 10336

...or it might be an issue with the capabilities of whatever client you use to view the data.

Have a read of this very good article.

Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

And this one.
Rama Vadakattu

Joined: Oct 12, 2007
Posts: 6
Thanks all, I have resolved the problem.

The links below explain clearly what the problem is and how to solve it.

Problem:

The characters in the feed are UTF-8 encoded,
whereas by default the Tomcat server assumes that all characters are encoded in ISO-8859-1.

As a result, Tomcat tries to read the characters in the feed (which are UTF-8 encoded) as ISO-8859-1, and so it is not able to print the international characters.
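
To illustrate the mismatch described above, here is a minimal sketch (not part of the original post; the class and method names are made up for illustration). It shows the two usual ways the text gets damaged: decoding UTF-8 bytes as ISO-8859-1 yields garbage characters, and encoding non-Latin text into ISO-8859-1 replaces every character that charset cannot represent with a question mark.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Wrong charset on the way in: UTF-8 bytes decoded as ISO-8859-1.
    // Every byte becomes one (meaningless) Latin-1 character.
    static String decodeAsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Wrong charset on the way out: ISO-8859-1 cannot represent Chinese,
    // so getBytes substitutes '?' for each unmappable character.
    static String roundTripThroughLatin1(String original) {
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    // Matching charsets on both sides: the text survives unchanged.
    static String decodeAsUtf8(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First four characters of the example title, written as escapes
        // so the source file's own encoding does not matter.
        String title = "\u8ba4\u6e05\u4e16\u754c";

        System.out.println(roundTripThroughLatin1(title));      // question marks
        System.out.println(decodeAsLatin1(title).length());     // one char per UTF-8 byte
        System.out.println(decodeAsUtf8(title).equals(title));  // intact
    }
}
```

The question marks in the first post match the second case: somewhere between the feed and the display or database, the text was pushed through a charset that has no Chinese characters.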

How to resolve?
We need to tell the Java servlet that those characters are UTF-8 encoded and not in the default ISO-8859-1 encoding.

How to do that?
URL ffeedurl = new URL(feedurl);
URLConnection connection = ffeedurl.openConnection();
HttpURLConnection httpConnection = (HttpURLConnection) connection;

Please note the line below: the second argument of the InputStreamReader constructor is "UTF-8",
which tells the servlet that the characters retrieved from the URL are UTF-8 encoded and not encoded in the default ISO-8859-1 format.

InputStreamReader defaultReader = new InputStreamReader(httpConnection.getInputStream(),"UTF-8");
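
Putting the lines above together, a self-contained sketch might look like the following. This is an illustration, not the poster's actual code: the class name Utf8FeedReader and the method readAsUtf8 are hypothetical, and a ByteArrayInputStream stands in for httpConnection.getInputStream() so it runs without a network. Note that when no charset is passed, InputStreamReader falls back to the JVM's default charset, which is why naming UTF-8 explicitly matters.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8FeedReader {

    // Reads the whole stream as text, decoding the bytes explicitly as UTF-8
    // instead of relying on the JVM's default charset.
    static String readAsUtf8(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the feed: in the real program the stream would come
        // from httpConnection.getInputStream().
        String feed = "<title>\u8ba4\u6e05\u4e16\u754c</title>";
        InputStream in = new ByteArrayInputStream(feed.getBytes(StandardCharsets.UTF_8));

        String decoded = readAsUtf8(in);
        System.out.println(decoded.equals(feed));
    }
}
```

If the feed declares a different encoding in its HTTP Content-Type header or XML prolog, that declared encoding should be used instead of hard-coding UTF-8.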

That's it. In addition to that, you need to take care of the things below.

1) The MySQL connection should be as below
instead of

2) In the MySQL database, each table and each text/varchar column should use the utf8_general_ci collation.

3) If you are using log4j and want to see the UTF-8 characters in the log messages, you should add the param below to each appender:
<param name="Encoding" value="UTF-8"/> (I don't know why, but even after setting this I could not see the characters properly in the log file/console.)

4) Important links that discuss this problem and its solution (to clearly understand what the problem is and how to resolve it):
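
For points 1 and 2 above, the exact connection strings are not shown in the post, so the following is only a sketch under assumptions. It uses the MySQL Connector/J URL properties useUnicode and characterEncoding, which are a common way to ask the driver for UTF-8; the host, port, database and table names are hypothetical, and the class MysqlUtf8Settings is made up for illustration.

```java
public class MysqlUtf8Settings {

    // Builds a JDBC URL that asks MySQL Connector/J to exchange text as UTF-8.
    // Host, port and database are placeholders supplied by the caller.
    static String utf8JdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Builds the SQL statement that converts an existing table, including its
    // text/varchar columns, to the utf8 character set with utf8_general_ci.
    // The table name is assumed to be trusted; do not pass user input here.
    static String convertTableSql(String table) {
        return "ALTER TABLE " + table
                + " CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        System.out.println(utf8JdbcUrl("localhost", 3306, "feeds"));
        System.out.println(convertTableSql("feed_items"));
    }
}
```

The URL would be passed to DriverManager.getConnection as usual, and the ALTER TABLE statement run once per table from any MySQL client.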

Gamini Sirisena
Ranch Hand

Joined: Aug 05, 2008
Posts: 378
Great! And thanks for posting the solution.