Internationalization (specifically with Chinese characters)

Min Huang
Greenhorn

Joined: Sep 21, 2004
Posts: 16
I have a weird problem I can't quite figure out. I am also not even sure if I'm posting in the correct forum, so please move my post if necessary.

Basically I am trying to get an HTML form POST to work with a Chinese character.

I have a form that is multipart/form-data encoded:



The form has a single input in which I am trying to send the Chinese character 我. For some reason, I am not getting the right character back when I hit a breakpoint and inspect my command object in Eclipse.

I've tried a few things:

a) I don't explicitly set a page directive. When I submit, I get this back as a string: &#25105;

This strikes me as incorrect because it should be a Unicode character along the lines of '\uxxxx'.

b) I set the content type via a page directive:



This gets me a little bit closer; when I break, I see three characters: ���

I understand that UTF-8 is a variable-length encoding (1 to 4 bytes per character), so I'm not surprised if 我 needs more than one byte. However, I would still expect to see one character instead of three.
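One way to get exactly three characters from one is to decode the UTF-8 bytes of 我 with a single-byte charset. The sketch below reproduces that effect outside a servlet container; the class and method names are made up for illustration, and ISO-8859-1 is assumed as the wrong charset because it is the servlet default when the request names none.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
class MojibakeDemo {

    // Encode the text as UTF-8 bytes, as a browser does for a UTF-8 form.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode those UTF-8 bytes with the wrong (single-byte) charset.
    static String misdecode(String text) {
        return new String(utf8Bytes(text), StandardCharsets.ISO_8859_1);
    }

    // Decode the same bytes with the matching charset.
    static String roundTrip(String text) {
        return new String(utf8Bytes(text), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String wo = "\u6211"; // the character from the question
        System.out.println(utf8Bytes(wo).length);      // 3
        System.out.println(misdecode(wo).length());    // 3
        System.out.println(roundTrip(wo).equals(wo));  // true
    }
}
```

Each of the three bytes becomes its own character under ISO-8859-1, which matches the "three characters instead of one" symptom.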

Does anybody know how to resolve this issue?


SCJP, SCJD, SCBCD, SCWCD
Scott Tiger
Greenhorn

Joined: Mar 23, 2007
Posts: 7
^_^ you should use this:
<%@ page contentType="text/html; charset=gb2312" %>

PS: You are Chinese? hehe.
Min Huang
Greenhorn

Joined: Sep 21, 2004
Posts: 16
Setting the charset to gb2312 doesn't work; I don't see the character I want. I've crafted a JSP that illustrates my problem:



Specifically, I have:
1) <%@page pageEncoding="UTF-8"%> set.
2) <%@page contentType="text/html;charset=UTF-8"%> set after the first directive.
3) <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/> in the head.
4) enctype="multipart/form-data" attribute in the form.
5) accept-charset="UTF-8" attribute in the form.

The results I see are:
For a GET: æˆ‘ is the result.
For a POST with enctype="application/x-www-form-urlencoded": æˆ‘ is the result.
For any other POST encoding: No entry in request parameter map.

Btw, yes I am Chinese.
Min Huang
Greenhorn

Joined: Sep 21, 2004
Posts: 16
I figured it out. You have to put <% request.setCharacterEncoding("UTF-8"); %> at the top of the JSP. The character encoding has to be set before any parameters are read, or else it won't work.
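To see why the charset has to be known before the parameters are parsed, here is a rough stand-in using java.net.URLDecoder rather than a real servlet container. It assumes the browser sent 我 as the UTF-8 percent-encoded string %E6%88%91, and that the container falls back to ISO-8859-1 when no encoding has been set; the class and method names are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it mimics parameter decoding, not the servlet API itself.
class FormDecodingDemo {

    // Percent-decode a raw form value using the given charset name.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%E6%88%91"; // assumed wire form of the submitted character

        // What happens if parsing runs with the fallback charset.
        String wrong = decode(raw, "ISO-8859-1");
        // What happens if UTF-8 was selected before parsing.
        String right = decode(raw, "UTF-8");

        System.out.println(wrong.length());          // 3
        System.out.println(right.length());          // 1
        System.out.println(right.equals("\u6211"));  // true
    }
}
```

Once a value has been decoded with the wrong charset, changing the encoding afterwards does not re-parse it, which is why the call has to come first.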

Having a scriptlet in your JSP is no good, so you can make a servlet filter to do just that. Make sure it's the first filter in the chain or it might not work. Order matters.

You can use CharacterEncodingFilter in Spring, or write your own:
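The code that followed here is missing from this copy of the thread. For the Spring option, a web.xml registration could look roughly like the fragment below. The filter name encodingFilter and the /* mapping are assumptions chosen for illustration; encoding and forceEncoding are init parameters of Spring's CharacterEncodingFilter.

```xml
<!-- Sketch only: declare this mapping before other filter mappings so it runs first. -->
<filter>
  <filter-name>encodingFilter</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <!-- Apply UTF-8 even if the request already names an encoding. -->
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>encodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
```

Filters mapped by URL pattern run in the order their filter-mapping elements appear in web.xml, so placing this mapping first addresses the ordering concern above.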



You can replace the page directives with this in your web.xml:
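The web.xml snippet is missing from this copy of the thread, so what was originally posted is unknown. One standard way to set the page encoding for all JSPs without per-page directives is a jsp-property-group, sketched here under the assumption that every *.jsp file should be treated as UTF-8:

```xml
<!-- Sketch only: the url-pattern is an assumption; narrow it if only some JSPs are UTF-8. -->
<jsp-config>
  <jsp-property-group>
    <url-pattern>*.jsp</url-pattern>
    <page-encoding>UTF-8</page-encoding>
  </jsp-property-group>
</jsp-config>
```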
Neha Deshmukh
Ranch Hand

Joined: Apr 04, 2007
Posts: 30
Hi,
Use the character encoding 'Unicode' instead of 'UTF-8' and your problem should be solved.
 