
Internationalization (specifically with Chinese characters)

 
Greenhorn
Posts: 16
I have a weird problem I can't quite figure out. I am also not even sure if I'm posting in the correct forum, so please move my post if necessary.

Basically, I am trying to get an HTML form POST to work with a Chinese character.

I have a form that is multipart/form-data encoded:



The form has a single input in which I am trying to send the Chinese character 我. For some reason, I am not getting the right character back when I hit a breakpoint and inspect my command object in Eclipse.

I've tried a few things:

a) I don't explicitly set a page directive. When I submit, I get this back as a string: 我

This strikes me as incorrect because it should be a single Unicode character along the lines of '\uxxxx'.

b) I set the content type via a page directive:



This gets me a little bit closer; when I break, I see three characters: ���

I understand that in this encoding a character can take a variable number of bytes (1-4), so I'm not surprised if 我 requires several. However, I would still expect to see one character instead of three.
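For readers following along, here is a minimal sketch of why three characters can show up instead of one. It assumes the bytes on the wire are UTF-8 and that the receiving side decodes them with a single-byte charset such as ISO-8859-1; the class and method names are made up for illustration and are not from the original post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative assumptions.
class MojibakeDemo {

    // Encode text as UTF-8 bytes, then decode those bytes with another charset.
    // This imitates a receiver that guesses the wrong encoding.
    static String misdecode(String text, Charset decodeCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeCharset);
    }

    public static void main(String[] args) {
        String original = "\u6211"; // the character 我 (U+6211)

        // 我 takes three bytes in UTF-8 (E6 88 91).
        int byteCount = original.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("UTF-8 byte count: " + byteCount);

        // A single-byte charset turns every byte into its own character.
        String wrong = misdecode(original, StandardCharsets.ISO_8859_1);
        System.out.println("chars after ISO-8859-1 decode: " + wrong.length());

        // Decoding with the charset that was used for encoding restores the text.
        String right = misdecode(original, StandardCharsets.UTF_8);
        System.out.println("round trip intact: " + right.equals(original));
    }
}
```

If this is what is happening, the three characters seen in the debugger are the three UTF-8 bytes of 我, each decoded on its own.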

Does anybody know how to resolve this issue?
 
Greenhorn
Posts: 7
^_^ You should use this:
<%@ page contentType="text/html; charset=gb2312" %>

PS: Are you Chinese? Hehe.
 
Min Huang
Greenhorn
Posts: 16
Setting the charset to gb2312 doesn't work; I don't see the character I want. I've crafted a JSP that illustrates my problem:



Specifically, I have:
1) <%@page pageEncoding="UTF-8"%> set.
2) <%@page contentType="text/html;charset=UTF-8"%> set after the first directive.
3) <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/> in the head.
4) enctype="multipart/form-data" attribute in the form.
5) accept-charset="UTF-8" attribute in the form.

The results I see are:
For a GET: �ˆ‘ is the result.
For a POST with enctype="application/x-www-form-urlencoded": �ˆ‘ is the result.
For any other POST encoding: No entry in request parameter map.

Btw, yes I am Chinese.
 
Min Huang
Greenhorn
Posts: 16
I figured it out. You have to put <% request.setCharacterEncoding("UTF-8"); %> at the top of the JSP. The character encoding has to be set before any parameters are read, or else it won't work.
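To illustrate why the encoding matters at the moment parameters are parsed, here is a small sketch. It assumes the browser sent 我 percent-encoded as UTF-8 (%E6%88%91) and that the container falls back to ISO-8859-1 when no request encoding has been set. It uses java.net.URLDecoder to imitate the decoding step; it is not the container's actual code, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it only imitates how a form value gets decoded.
class FormDecodingDemo {

    // Decode a percent-encoded form value using the given charset name.
    static String decodeParam(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%E6%88%91"; // 我 as a UTF-8 page would submit it

        // Assumed fallback when no request encoding is set: three separate characters.
        String asLatin1 = decodeParam(raw, "ISO-8859-1");
        System.out.println("ISO-8859-1 length: " + asLatin1.length());

        // With the matching charset the value comes back as one character.
        String asUtf8 = decodeParam(raw, "UTF-8");
        System.out.println("UTF-8 length: " + asUtf8.length());
        System.out.println("UTF-8 matches U+6211: " + asUtf8.equals("\u6211"));
    }
}
```

Once a value has been decoded with the wrong charset, changing the request encoding afterwards cannot repair it, which fits the observation that the call must come before the first parameter read.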

Having a scriptlet in your JSP is no good, so you can make a servlet filter that does the same thing. Make sure it's the first filter in the chain, or it might not work. Order matters.

You can use CharacterEncodingFilter in Spring, or write your own:
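The code originally posted here did not survive, so what follows is only a minimal sketch of such a filter, not the poster's version. It assumes the javax.servlet API (newer containers use the jakarta.servlet package instead), and the class name Utf8EncodingFilter is made up. It needs a servlet container and the servlet API on the classpath to compile and run.

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Hypothetical filter; map it in web.xml ahead of any other filter.
public class Utf8EncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // No configuration is needed for this sketch.
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Set the encoding before anything reads a request parameter.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // Nothing to release.
    }
}
```

Note that setCharacterEncoding applies to the request body. In some containers, query-string parameters on a GET are decoded by a separate connector-level setting, so a filter like this may not be enough for the GET case.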



You can replace the page directives with this in your web.xml:
 
Ranch Hand
Posts: 30
Hi,
Use the character encoding 'Unicode' instead of 'UTF-8' and see whether that solves your problem.
 