
Handling languages other than English in Java

Rama Vadakattu

Joined: Oct 12, 2007
Posts: 6

I am reading several feeds that have titles and text in languages other than English, such as Chinese, Japanese, and Arabic.

Example title in another language: 认清世界 读懂中国: 老百姓将杨佳案矛头指向沪政法书记吴志明

When I read such a string in a Java program and display it, I get question marks instead of the language-specific characters,

as below: ???(?)

Can anyone guide me on how to resolve this language-specific issue?

Christophe Verré

Joined: Nov 24, 2005
Posts: 14688

and display it

Where are you displaying it? If you're displaying it in the console, your environment might not support the fonts necessary to display such languages.

Rama Vadakattu

Joined: Oct 12, 2007
Posts: 6
Instead of displaying them, I am storing those characters in a MySQL database, and the language-specific characters still appear as question marks.
Rama Vadakattu

Joined: Oct 12, 2007
Posts: 6
Any clue on how to resolve this?
Matteo Di Furia
Ranch Hand

Joined: Jun 20, 2008
Posts: 102
It might be a problem with the character sets configured on the database server.
Paul Sturrock

Joined: Apr 14, 2004
Posts: 10336

...or it might be an issue with the capabilities of whatever client you use to view the data.

Have a read of this very good article.

Paul Clapham

Joined: Oct 14, 2005
Posts: 19760

And this one.
Rama Vadakattu

Joined: Oct 12, 2007
Posts: 6
Thanks all, I have resolved the problem.

The links below explain clearly what the problem is and how to solve it.

Problem:

The characters in the feed are UTF-8 encoded,
whereas by default the Tomcat server assumes that all characters are encoded in ISO-8859-1.

As a result, Tomcat tries to read the characters in the feed (which are UTF-8 encoded) as ISO-8859-1, and so it is not able to print the international characters.
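To make the mismatch concrete, here is a minimal sketch (not the code from my application; the class and method names are made up for illustration). It shows two separate effects: UTF-8 bytes decoded as ISO-8859-1 turn into the wrong characters, and text pushed through ISO-8859-1 on the way out turns into question marks, because that charset has no mapping for Chinese characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original application.
class CharsetMismatchDemo {

    // Decodes UTF-8 bytes with the wrong charset (ISO-8859-1).
    // Every byte becomes one character, so the text is garbled.
    public static String decodeWrong(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decodes UTF-8 bytes with the matching charset.
    public static String decodeRight(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encodes text as ISO-8859-1 and reads it back. Characters that
    // ISO-8859-1 cannot represent are replaced with '?'.
    public static String roundTripLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source
        // file encoding does not matter.
        String sample = "\u4e2d\u56fd";
        System.out.println(decodeWrong(sample).length());      // 6
        System.out.println(decodeRight(sample).equals(sample)); // true
        System.out.println(roundTripLatin1(sample));            // ??
    }
}
```

The two characters take three bytes each in UTF-8, which is why the wrongly decoded string has six characters.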

How to resolve?
We need to tell the Java servlet that those characters are UTF-8 encoded and not encoded in the default ISO format.

How to tell it?
URL ffeedurl = new URL(feedurl);
URLConnection connection = ffeedurl.openConnection();
HttpURLConnection httpConnection = (HttpURLConnection) connection;

Please note the line below: the second argument of the InputStreamReader constructor is "UTF-8",
which specifies that the characters retrieved from the URL are UTF-8 encoded and not encoded in the default ISO format.

InputStreamReader defaultReader = new InputStreamReader(httpConnection.getInputStream(),"UTF-8");
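Here is a self-contained sketch of the same idea that runs without a network connection. It is only an illustration: the class name is made up, and a ByteArrayInputStream stands in for httpConnection.getInputStream(). An InputStreamReader created without a charset argument falls back to the JVM's default charset, so passing the charset explicitly removes the dependency on the environment. Using the StandardCharsets.UTF_8 constant instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original application.
class Utf8FeedReader {

    // Reads the whole stream as UTF-8 text, independent of the
    // JVM's default charset. Closes the stream when done.
    public static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a feed server would send.
        String expected = "<title>\u4e2d\u56fd</title>";
        byte[] fakeFeed = expected.getBytes(StandardCharsets.UTF_8);

        String text = readAll(new ByteArrayInputStream(fakeFeed));
        System.out.println(text.equals(expected)); // true
    }
}
```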

That's it. In addition to that, you need to take care of the things below.

1) The MySQL connection should be as below
instead of

2) In the MySQL database, each table and each text/varchar column should use the utf8_general_ci collation.

3) If you are using log4j and want to see the UTF-8 characters in the log messages, you should add the param below to each appender:
<param name="Encoding" value="UTF-8"/> (I don't know why, but even after setting this I could not see the characters properly in the log file/console.)

4) Important links that talk about this problem and its solution (to clearly understand what the problem is and how to resolve it):
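Regarding item 1, the connection strings did not survive in this post. As a rough sketch only, and an assumption about the kind of setting meant there: MySQL Connector/J accepts the useUnicode and characterEncoding properties in the JDBC URL. The class name, host, and database name below are hypothetical.

```java
// Hypothetical helper class; host and database names are placeholders.
class MysqlUtf8Url {

    // Builds a JDBC URL that asks MySQL Connector/J to exchange
    // character data with the server as UTF-8.
    public static String build(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // prints jdbc:mysql://localhost/feeds?useUnicode=true&characterEncoding=UTF-8
        System.out.println(build("localhost", "feeds"));
    }
}
```

The resulting string would be passed to DriverManager.getConnection along with the user name and password.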

Gamini Sirisena
Ranch Hand

Joined: Aug 05, 2008
Posts: 378
Great! And thanks for posting the solution.