Some (not all!) UTF8 Characters render as "?" only from UNIX server -- works perfectly local (Win)..

Dan Cane
Greenhorn

Joined: Nov 23, 2009
Posts: 4
I'm beating my head against the wall here, so I thought I'd ping you to see if you had some insight..

I didn't see a natural area to post this in, so let me know if I should move it somewhere else. My app is struts based, so....

I've localized my application - it's designed to run as UTF-8. I have bundles for 5 languages -- and the application works perfectly on my local windows box, but when I run it off the UNIX server, *some* of the UTF8 characters don't render (they render as ? marks). I'm positive that the character encoding is set correctly, and the browser is correctly detecting the encoding as UTF8. Again, this works 100% on my local box, but not on the server.

I've been eliminating variables as best I can, and now I'm down to something at the OS level? It also doesn't add up that SOME UTF8 encoded text works (Spanish and French letters, which are encoded as UTF8) -- but Asian languages (Traditional Chinese and Japanese) do not...

Completely at a loss.

It's (probably) not a Tomcat setting, since I'm running the same Tomcat locally and it works... Grrrr... Could the property files be reading in differently on UNIX vs. Windows? Java .properties files are supposed to be ISO-8859-1 encoded - and when I look at it on the UNIX file-system it looks OK...

Any ideas?

If you have an insight I'd appreciate it. I'm happy to provide access to the page I'm working on, just didn't want it posted in a public place (yet).

Thanks in advance!

Dan

- edit -

A little more information, in case it helps:

1) My dev environment: Eclipse 3.5 on Windows Vista 64
2) All of the property files are using the \uxxxx format to encode the text. I know it's supposed to be an ISO-8859-1 file with these \uxxxx escapes.

A sample of the bundles:

myBundle_en.properties:
address.city=City
address.country=Country
address.state=State

myBundle_es.properties:
address.city=Ciudad
address.country=Pa\u00eds <--- THIS WORKS EVEN ON UNIX. Odd....
address.state=Estado

myBundle_ja.properties:
address.city=\u753a <--- Renders as ?
address.country=\u56fd <--- Renders as ?
address.state=\u72b6\u614b <--- Renders as ??


Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18657

When something renders as ?, that means that it went through an encoding which couldn't encode it.

Normally there's two steps between a properties file and what you are looking at: (1) read the data into memory from the properties file which as you know uses ISO-8859-1 plus some special handling, and (2) ... well, you didn't say how you are looking at that data.

You seem to have dealt with (1) correctly. So the problem would most likely be with (2) ... whatever it is.
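The two steps above can be sketched in plain Java. This is only an illustration, not the application's code: the names `QuestionMarkDemo`, `load` and `roundTrip` are made up, and the keys are borrowed from the sample bundles earlier in the thread. Step (1) decodes the escapes with `java.util.Properties`; step (2) pushes the resulting strings through ISO-8859-1 and UTF-8.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class QuestionMarkDemo {

    // Step 1: load bundle text the way java.util.Properties does,
    // which turns Unicode escapes into real characters in memory.
    static Properties load(String bundleText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(bundleText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Step 2: encode to bytes in the given charset, then decode with the
    // same charset. Characters the charset cannot represent are replaced
    // by the encoder (with '?' for ISO-8859-1).
    static String roundTrip(String value, Charset charset) {
        return new String(value.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // The doubled backslash puts a literal escape sequence in the text,
        // just as it would appear inside a .properties file.
        Properties es = load("address.country=Pa\\u00eds\n");
        Properties ja = load("address.state=\\u72b6\\u614b\n");

        String country = es.getProperty("address.country");
        String state = ja.getProperty("address.state");

        // Spanish survives ISO-8859-1 because its accented letter is in Latin-1.
        System.out.println(roundTrip(country, StandardCharsets.ISO_8859_1).equals(country)); // true
        // Japanese does not: each character is replaced.
        System.out.println(roundTrip(state, StandardCharsets.ISO_8859_1)); // ??
        // UTF-8 can represent both.
        System.out.println(roundTrip(state, StandardCharsets.UTF_8).equals(state)); // true
    }
}
```

This would explain why Spanish and French text can look fine while Chinese and Japanese do not: the first two fit in ISO-8859-1, so an accidental ISO-8859-1 encode on the way out does them no harm.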
Dan Cane
Greenhorn

Joined: Nov 23, 2009
Posts: 4
Paul Clapham wrote:When something renders as ?, that means that it went through an encoding which couldn't encode it.

Normally there's two steps between a properties file and what you are looking at: (1) read the data into memory from the properties file which as you know uses ISO-8859-1 plus some special handling, and (2) ... well, you didn't say how you are looking at that data.

You seem to have dealt with (1) correctly. So the problem would most likely be with (2) ... whatever it is.


Paul,

Ahh - you might need that detail :)

I'm looking at it in a browser. I've checked IE8 and FF, and the page encoding is being picked up correctly as UTF8. I'm super stumped, as the "lower" UTF characters work (e.g. a Spanish "n"), but anything "higher" doesn't.... That it works when served from Windows but not from UNIX is a clue, but to what I have no idea...

The URL to take a peek at this is pbd.modernizingderm.com/xmd/ (just plain ol' http, I just didn't want spiders crawling it yet)

As you can see, latin chars OK - UTF8 = check, but asian chars are a no-go... :(

Dan
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18657

And presumably there's some code which writes that data to the output stream of a request? That's where I would be looking. You might also find some enlightenment from reading Character Conversions from Browser to Database, even though there isn't a database involved and the data is going to the browser, not from it.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18657

Dan Cane wrote:The URL to take a peek at this is pbd.modernizingderm.com/xmd/

And when I look at the headers sent to Firefox, I see things like this:
Content-Type: text/html;charset=ISO-8859-1

This was when I clicked on the "French" link at the bottom. When I click on the "Japanese" link I see this:
Content-Type: text/html

So there's more to this content-type business than you thought. You might try to find out why this happens.
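As a rough aid for comparing headers like the two above, here is a small sketch that pulls the charset parameter out of a Content-Type value. The class and method names (`ContentTypeCharset`, `charsetOf`) are hypothetical, and the parsing is deliberately simple rather than a full implementation of the header grammar.

```java
import java.util.Locale;
import java.util.Optional;

class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value, if one is present.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; any parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Drop optional surrounding quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=ISO-8859-1")); // Optional[ISO-8859-1]
        System.out.println(charsetOf("text/html"));                    // Optional.empty
    }
}
```

Neither header names UTF-8. When a response carries no charset at all, the Servlet API documents ISO-8859-1 as the encoding it reports by default, which may be why the Japanese text comes out as question marks.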
Dan Cane
Greenhorn

Joined: Nov 23, 2009
Posts: 4
Paul Clapham wrote:
And when I look at the headers sent to Firefox, I see things like this:
Content-Type: text/html;charset=ISO-8859-1



Where are you able to see that? I'm looking at Firebug's view of the request for the Struts action, and here is what I have:

Response Headers
Date Tue, 24 Nov 2009 13:37:23 GMT
Content-Type text/html
Content-Language zh
Vary Accept-Encoding
Content-Encoding gzip
Content-Length 2183
Keep-Alive timeout=15, max=100
Connection Keep-Alive

Request Headers
Host pbd.modernizingderm.com
User-Agent Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip,deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive 300
Connection keep-alive
Referer http://pbd.modernizingderm.com/xmd/app/FirmHome.action?locale=en
Cookie JSESSIONID=028CC2436C8702324EA9E90DE70B8588
Cache-Control max-age=0

I see the "Accept-Charset" header has both ISO and utf-8 (what my local browser can support), but the Content-Type only has text/html... I think you've found the issue, but I'm trying to figure out how to reproduce that (so I can fix it. :) )

FWIW: All pages have the meta-tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
and my taglib for the "header" (used on all pages) contains <%@ page contentType="text/html; charset=UTF-8" language="java" %>

Thanks!
Dan Cane
Greenhorn

Joined: Nov 23, 2009
Posts: 4
Solved!

PEBKAC

(Problem Exists Between Keyboard and Chair)

Well, sorta.. The server I was publishing on had #%@ SET JAVA_OPTS -Djavax.servlet.request.encoding="ISO-8859-1" in the shell script, changing the default to ISO in places where I didn't "force" UTF8.... I looked everywhere in the environment, but didn't think to look at the Tomcat startup script...

Problem solved!!!

The site looks cool in Chinese

Thanks for all of your help!

Dan
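For anyone chasing a similar override, a minimal sketch of one way to spot it from inside the running JVM: list every system property whose name mentions "encoding". The class and method names (`EncodingProps`, `encodingProperties`) are hypothetical, and the property name in the comment is simply the one quoted from the startup script above; nothing here is specific to Tomcat.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class EncodingProps {

    // Collects every property whose name contains "encoding" (case-insensitive),
    // sorted by name. A -D flag such as the javax.servlet.request.encoding one
    // from the startup script would show up here.
    static Map<String, String> encodingProperties(Properties props) {
        Map<String, String> found = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.toLowerCase(Locale.ROOT).contains("encoding")) {
                found.put(name, props.getProperty(name));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Print what the current JVM was actually started with.
        encodingProperties(System.getProperties())
                .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Running this on both the Windows box and the UNIX server and comparing the output is one way to notice a difference that lives in a startup script rather than in the application.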
 