
UTF-8 problem again, but still a little different

Misi Nyilas
Greenhorn

Joined: Nov 18, 2008
Posts: 3
Hello all,

I'm new to Struts development and have just run into a strange problem with accented letters.
I read a list of entries from a file written in UTF-8 and fill a listbox on a JSP page with the parsed entries.
Unfortunately, the accented letters don't appear as they were written in the file (CSV format, parsed before filling).
The response is assembled from HtmlHead.jsp, xxx.jsp (the body, let's say) and HtmlFoot.jsp. HtmlHead.jsp contains
<?xml version="1.0" encoding="utf-8"?>
and in <head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

When I open a copy of the CSV file placed outside /WEB-INF directly in the browser, the accented characters show the same problem, but if I turn it into HTML by adding the <html>, <head> and <body> tags with the declarations above, the letters are OK.
Any idea what I missed and what I should do to get proper accented letters in the listbox?
I also have a second problem with accented letters. For internationalisation I keep the texts shown on the page in a properties file. The messages contain accented letters as well. In JSP pages I reference them like this, for example:
<s:text name="properties.something123"/>
and it works well, EXCEPT the stuff in <head>:
<head>
....
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta name="description" content="<s:text name="properties.description"/>"/>
<meta name="keywords" content="<s:text name="properties.keywords"/>"/>
<title><s:text name="properties.header"/></title>
....
</head>

Any idea what went wrong?

Thanks in advance, krnl
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

"No Matter", please check your private messages for an important administrative matter.
Misi Nyilas
Greenhorn

Joined: Nov 18, 2008
Posts: 3
Does nobody have any idea?
André-John Mas
Ranch Hand

Joined: Oct 18, 2008
Posts: 37
I can't give you a precise answer, since I don't know all the factors, but here are some things you should check:

- Try opening your file in an editor as UTF-8, to make sure it is really UTF-8.
- Make sure you are reading the file as UTF-8. If you are using InputStreamReader then you should be specifying the encoding. Don't make any assumptions about what the OS defaults to. If your file is not UTF-8, then specify the character set that it is using in InputStreamReader.
- Make sure that the content-type in the response header is "text/html; charset=utf-8".
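
As a rough illustration of the second point, here is a minimal sketch of reading lines with an explicit charset. The class and method names are made up for this example, and StandardCharsets needs Java 7 or later (on older JVMs you would pass the charset name "UTF-8" instead). For a real file you would pass something like Files.newInputStream(path) as the stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Struts or of the original poster's code.
public class Utf8LineReader {

    // Decodes the stream as UTF-8, independent of the OS default charset.
    public static List<String> readLines(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        byte[] csv = "\u00e1rv\u00edz;t\u0171r\u0151\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(csv)));
    }
}
```

If the file turns out not to be UTF-8, the same code works with the charset the file really uses in place of StandardCharsets.UTF_8.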

It should be noted that Java uses UTF-16 internally. Additionally, if you are using Tomcat you should add a charset filter to handle POSTs that receive UTF-8 data: http://wiki.apache.org/tomcat/Tomcat/UTF-8
Arpit Purohit
Greenhorn

Joined: Jan 09, 2007
Posts: 21
Hi,
I have faced and solved this problem; it is a common issue in internationalization.
The accented characters (French ones, for example) are not shown correctly on the web page. They look like empty boxes or question-mark characters.
The basic problem lies in reading these characters from the properties file: we need to provide the proper encoding. In an internationalization scenario, InputStreamReader must be created with the 'InputStreamReader(java.io.InputStream in, java.nio.charset.Charset cs)' constructor, so that the proper encoding/character set is used while reading from the files.
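
To make the point about the reader's charset concrete, here is a small sketch using plain java.util.Properties. It is only an illustration: the class and method names are invented, and how Struts itself loads its message bundles is not shown in this thread. What it does show is that Properties.load(InputStream) always decodes as ISO-8859-1, while Properties.load(Reader) uses whatever charset the reader was created with.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; the names are for illustration only.
public class PropertiesEncodingDemo {

    // Loads properties through a Reader, so the given charset is used for decoding.
    public static Properties loadAs(byte[] data, Charset charset) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Loads properties from a raw stream, which Properties decodes as ISO-8859-1.
    public static Properties loadFromStream(byte[] data) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A properties file saved as UTF-8 containing "café".
        byte[] utf8File = "properties.header=caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        String good = loadAs(utf8File, StandardCharsets.UTF_8).getProperty("properties.header");
        String bad = loadFromStream(utf8File).getProperty("properties.header");
        System.out.println(good.length() + " chars vs " + bad.length() + " chars");
    }
}
```

The second load turns the two UTF-8 bytes of the accented letter into two separate characters, which is the usual garbled look of a UTF-8 file read with the wrong charset.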
We usually have the perception that UTF, being the default, is the solution, but it is definitely not. For example, for Western European languages with diacritics, such as French, Dutch etc., 'ISO-8859-1' should be used and not UTF.
If the encoding/character set we use to read from the resource bundle does not contain the glyph for a particular character it comes across while reading, a distorted character is returned/displayed on screen.
For further details about which encoding to use for which language, have a look at the following link:
Character_encoding
[ December 01, 2008: Message edited by: Arpit Purohit ]

Regards,
Arpit Purohit
André-John Mas
Ranch Hand

Joined: Oct 18, 2008
Posts: 37
Arpit: UTF-8 can certainly be used for representing European characters, amongst many others. I would go as far as to say that standardising your web site on UTF-8 will solve many problems in the long run, though you do need to understand limitations posed by conversion to systems which are assuming a specific writing script.

To do things correctly you should ensure UTF-8 end to end, since if any part of the process assumes a more restricted character set such as ISO-8859-1 or GB2312, then you risk losing information on the conversion if the character is not supported.
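
A quick sketch of that information loss, with invented class and method names: the euro sign has no code point in ISO-8859-1, so encoding it with that charset silently substitutes a replacement byte, while UTF-8 round-trips the text unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are for illustration only.
public class LossyConversionDemo {

    // Encodes the text to bytes and decodes it again with the same charset.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Reports whether every character of the text exists in the charset.
    public static boolean isRepresentable(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u20ac"; // "café" followed by a euro sign
        System.out.println(isRepresentable(text, StandardCharsets.ISO_8859_1)); // false
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

Checking with canEncode before converting is one way to detect the problem instead of shipping a question mark to the browser.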

Java uses UTF-16 internally, so no matter what you are doing, some sort of conversion has already taken place when communicating outside of the VM.
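
To see that conversion in action, here is a tiny sketch (class and method names invented for the example). A String is exposed as a sequence of UTF-16 code units, and the number of bytes that leave the VM depends entirely on the charset chosen at the boundary.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are for illustration only.
public class EncodedLengthDemo {

    // Number of bytes the text occupies once encoded with the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // one accented letter, one UTF-16 code unit
        System.out.println(eAcute.length());                                      // 1
        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1));   // 1
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));        // 2
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_16BE));     // 2
    }
}
```

The same in-memory character becomes one byte or two depending on the target charset, which is why the reader and the writer have to agree on it.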

It should be noted that if your content type is text/html then ISO-8859-1 is what is meant to be the default handling. If you wish to use an alternative character set then you need to specify it. For example:

text/html; charset=UTF-8
 