I'm new to Struts development and have run into a strange problem with accented letters. I read a list of entries from a UTF-8 file (CSV format) and fill a listbox on a JSP page with the parsed entries. Unfortunately, the accented letters don't appear as they were written in the file. The response is assembled from HtmlHead.jsp, xxx.jsp (the body, let's say) and HtmlFoot.jsp. HtmlHead.jsp contains <?xml version="1.0" encoding="utf-8"?> and, in <head>: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
When I open a copy of the CSV file placed outside /WEB-INF directly in the browser, it shows the same problem with accented characters. But if I turn it into HTML by adding the <html>, <head> and <body> tags with the declarations above, the letters are OK. Any ideas what I missed and what I should do to get proper accented letters in the listbox?
I also have a second problem with accented letters. For internationalisation I keep the texts that appear on the page in a properties file, and these messages contain accented letters as well. In the JSP pages I reference them as, for example, <s:text name="properties.something123"/>, and that works well, EXCEPT for the content in <head>:
<head>
....
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta name="description" content="<s:text name="properties.description"/>"/>
<meta name="keywords" content="<s:text name="properties.keywords"/>"/>
<title><s:text name="properties.header"/></title>
....
</head>
I can't give you a precise answer since I don't know all the factors, but here are some things you should check:
- Try opening your file in an editor as UTF-8, to make sure it is really UTF-8.
- Make sure you are reading the file as UTF-8. If you are using InputStreamReader then you should be specifying the encoding. Don't make any assumptions about what the OS defaults to. If your file is not UTF-8, then specify the character set that it is using in InputStreamReader.
- Make sure that the content-type in the response header is "text/html; charset=utf-8".
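The second check can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual code: the class name CsvUtf8Reader, the temp file and the sample text are made up, and it assumes the file really is UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class CsvUtf8Reader {

    // Reads every line of the stream, decoding the bytes explicitly as UTF-8
    // instead of relying on the platform default charset.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: write a small UTF-8 CSV to a temp file, then read it back.
        Path csv = Files.createTempFile("entries", ".csv");
        Files.write(csv, "caf\u00e9;na\u00efve\n".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = Files.newInputStream(csv)) {
            for (String line : readLines(in)) {
                System.out.println(line);
            }
        } finally {
            Files.deleteIfExists(csv);
        }
    }
}
```

If the file turns out to be in another encoding, the same code applies with a different Charset passed to InputStreamReader.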
Hi, I have faced and solved this problem; it is a common issue with internationalization. The accented characters (such as those used in French) are not shown correctly on the web page. They look like empty boxes or question-mark characters. The basic problem lies in reading these characters from the properties file: we need to provide the proper encoding. When working on an internationalization scenario, InputStreamReader must be created with the 'InputStreamReader(java.io.InputStream in, java.nio.charset.Charset cs)' constructor so that the proper encoding/character set is used while reading from the files.
We tend to assume that UTF is the solution, and that it is the default, but it definitely is not. For example, for Western European languages with diacritics, like French, Dutch etc., 'ISO-8859-1' should be used and not UTF. If the encoding/character set we use to read from the resource bundle does not have a glyph for a particular character it comes across while reading, a distorted character is returned/displayed on screen. For further details about which encoding to use for which language, have a look at the following link: Character_encoding
[ December 01, 2008: Message edited by: Arpit Purohit ]
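As a rough sketch of the constructor mentioned above applied to a properties file: java.util.Properties.load(InputStream) always decodes as ISO-8859-1, while the load(Reader) overload lets the caller choose the charset. The class name EncodedProperties and the sample value are hypothetical, and how Struts itself loads its message bundles is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class EncodedProperties {

    // Loads key=value pairs, decoding the bytes with the given charset.
    // The Reader-based overload is used so the encoding is explicit rather
    // than the ISO-8859-1 that Properties.load(InputStream) assumes.
    static Properties load(InputStream in, Charset charset) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, charset)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 encoded properties file; the key comes from the question,
        // the value is made up.
        byte[] file = "properties.header=Caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        Properties props = load(new ByteArrayInputStream(file), StandardCharsets.UTF_8);
        System.out.println(props.getProperty("properties.header"));
    }
}
```

The charset passed in has to match the one the file was actually saved in; decoding UTF-8 bytes as ISO-8859-1 (or the reverse) is what produces the distorted characters.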
Arpit: UTF-8 can certainly be used for representing European characters, amongst many others. I would go as far as to say that standardising your web site on UTF-8 will solve many problems in the long run, though you do need to understand limitations posed by conversion to systems which are assuming a specific writing script.
To do things correctly you should ensure UTF-8 end to end, since if any part of the process assumes a more restricted character set such as ISO-8859-1 or GB2312, then you risk losing information on the conversion if the character is not supported.
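The information loss described here can be reproduced in a few lines. This is just an illustrative sketch; the class name LossyConversion and the sample text are assumptions, not anything from the original application.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LossyConversion {

    // Encodes the text with the given charset and decodes it again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // ('?' for ISO-8859-1) for any character the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Sample text (an assumption): U+0151 is outside ISO-8859-1, U+00E9 is inside it.
        String text = "\u0151 and \u00e9";
        // UTF-8 can represent both characters, so the round trip is lossless.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));      // prints true
        // ISO-8859-1 cannot represent U+0151, so it comes back as '?'.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.ISO_8859_1))); // prints false
    }
}
```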
Java uses UTF-16 internally, so no matter what you are doing, some sort of conversion has already taken place when communicating outside of the VM.
It should be noted that if your content type is text/html then ISO-8859-1 is what is meant to be the default handling. If you wish to use an alternative character set then you need to specify it. For example:
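One way to see the charset being declared in the response header, sketched with the JDK's built-in com.sun.net.httpserver rather than the Struts/servlet stack from the question. The class name Utf8PageServer, the page text and port 8080 are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class Utf8PageServer {

    // Hypothetical page body containing an accented character.
    static final String PAGE = "<html><body>caf\u00e9</body></html>";

    // Starts a server on the given port (0 = any free port) that serves PAGE
    // and declares the charset explicitly in the Content-Type response header.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            // The body bytes must be produced with the same charset the header announces.
            byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Serving on http://127.0.0.1:" + server.getAddress().getPort() + "/");
    }
}
```

In a servlet or JSP setup the same header is normally set through the response object or a page directive; the important part is that the declared charset and the charset used to encode the body agree.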