Character Encoding issue with Tomcat 5.5.9

Jose Luis Huertas
Greenhorn

Joined: Nov 02, 2007
Posts: 6
Hi,
These are the steps that I followed to implement character encoding.

We are upgrading the application to support data entered in French.
The application is supposed to store and retrieve data entered in French.
Even before any changes were made for character encoding, everything except the characters � � and � was stored and retrieved properly.
French alphabet: <code>A a (� �, � �), (� �), B b, C c (� �), D d, E e (� �, � �, � �,
� �), F f, G g, H h, I i (� �, � �), J j, K k, L l, M m, N n
(� �), O o (� �), (� �), P p, Q q, S s, T t, ),
V v, W w, X x, (� �), Z </code>

When control passes from the JSP to the servlet, the characters � � and � are not preserved: they arrive as rectangles, and when written to the database they are stored as inverted question marks.
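For anyone trying to reproduce this kind of symptom outside the container, here is a small standalone sketch. It is not the application's code: the class and method names are made up for illustration, it assumes a Java 7+ JVM for `StandardCharsets`, and it uses œ (U+0153) purely as an example of a character that ISO-8859-1 cannot represent.

```java
import java.nio.charset.StandardCharsets;

final class EncodingLossDemo {
    // Encodes text as ISO-8859-1 and decodes it again with the same charset.
    // Characters that ISO-8859-1 cannot represent are replaced with '?'.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes UTF-8 bytes with the wrong charset, which is roughly what
    // happens when a UTF-8 form post is read as ISO-8859-1.
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(roundTripLatin1("\u00e9"));              // e-acute survives
        System.out.println(roundTripLatin1("\u0153"));              // oe ligature is lost
        System.out.println(misreadUtf8AsLatin1("\u00e9").length()); // two chars instead of one
    }
}
```

A lossy step like `roundTripLatin1` anywhere between the browser and the database (request decoding, JDBC driver, database character set) would be enough to turn a character into a placeholder.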

To handle this, I took the following steps:

1. The following filter and response wrapper classes were added.
<code>
import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class UTF8EncodingFilter implements javax.servlet.Filter
{
    public void init( FilterConfig filterConfig ) throws ServletException
    {
        // This would be a good place to collect a parameterized
        // default encoding type. For brevity, we're going to
        // use a hard-coded value in this example.
    }

    public void doFilter( ServletRequest request,
                          ServletResponse response,
                          FilterChain filterChain )
        throws IOException, ServletException
    {
        // Wrap the response object. You should create a mechanism
        // to ensure the response object only gets wrapped once.
        // In this example, the response object will inappropriately
        // get wrapped multiple times during a forward.
        response = new UTF8EncodingServletResponse( (HttpServletResponse) response );
        // Specify the encoding to assume for the request so
        // the parameters can be properly decoded. This has to happen
        // before any request parameter is read.
        request.setCharacterEncoding( "UTF-8" );
        // "UTF-8" is a character encoding, not a content type, so it is set
        // with setCharacterEncoding (Servlet 2.4) instead of setContentType.
        response.setCharacterEncoding( "UTF-8" );
        System.out.println("UTF8EncodingFilter : doFilter() -> Both request and response are in UTF-8 format.");
        filterChain.doFilter( request, response );
    }

    public void destroy()
    {
        // no-op
    }
}

------------------------------------------------------------------------------------------------------------

/*
* Created on Oct 30, 2007
*
* To change the template for this generated file go to
* Window>Preferences>Java>Code Generation>Code and Comments
*/


import javax.servlet.http.HttpServletResponse;

public class UTF8EncodingServletResponse
    extends javax.servlet.http.HttpServletResponseWrapper
{
    private boolean encodingSpecified = false;

    public UTF8EncodingServletResponse( HttpServletResponse response )
    {
        super( response );
    }

    public void setContentType( String type )
    {
        String explicitType = type;
        // If a specific encoding has not already been set by the app,
        // let's see if this is a call to specify it. If the content
        // type doesn't explicitly set an encoding, make it UTF-8.
        if (!encodingSpecified)
        {
            String lowerType = type.toLowerCase();
            // See if this is a call to explicitly set the character encoding.
            if (lowerType.indexOf( "charset" ) < 0)
            {
                // If no character encoding is specified, we still need to
                // ensure the app is specifying text content.
                if (lowerType.startsWith( "text/" ))
                {
                    // App is sending a text response, but no encoding
                    // is specified, so we'll force it to UTF-8.
                    explicitType = type + "; charset=UTF-8";
                }
            }
            else
            {
                // App picked a specific encoding, so let's make
                // sure we don't override it.
                encodingSpecified = true;
            }
        }
        // Delegate to supertype to record encoding.
        super.setContentType( explicitType );
    }
}

</code>
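To check what the wrapper's `setContentType` logic does to a given value without deploying anything, the decision can be pulled out into a plain function. This is only a sketch under assumptions: `ContentTypeDefaults` and `withDefaultCharset` are hypothetical names, and the `encodingSpecified` flag from the wrapper is left out, so it models a single call only.

```java
import java.util.Locale;

final class ContentTypeDefaults {
    // Returns the content type the wrapper would pass on for one call:
    // text types without an explicit charset get "; charset=UTF-8" appended,
    // everything else is returned unchanged.
    static String withDefaultCharset(String type) {
        String lower = type.toLowerCase(Locale.ROOT);
        if (!lower.contains("charset") && lower.startsWith("text/")) {
            return type + "; charset=UTF-8";
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultCharset("text/html"));
        System.out.println(withDefaultCharset("text/html; charset=ISO-8859-15"));
        System.out.println(withDefaultCharset("UTF-8"));
    }
}
```

Note that a bare string such as "UTF-8" is not a MIME type: it does not start with "text/", so it passes through unchanged and would be handed to the container as the content type.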
WEB.XML changes
-----------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app id="WebApp">
    <display-name>App Name</display-name>
    <filter>
        <filter-name>UTF8 Filter</filter-name>
        <filter-class>com.vie.remoteDiagnostics.view.servlets.UTF8EncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>

    <filter-mapping>
        <filter-name>UTF8 Filter</filter-name>
        <servlet-name>UploadServlet</servlet-name>
    </filter-mapping>


- UploadServlet is the servlet to which the request is passed when the input page is submitted.

- If <url-pattern>/*</url-pattern> is used instead of <servlet-name>UploadServlet</servlet-name>, the characters are converted to a format like this: �? ��), F f, G g, H h, I i (�? ��, �? ��), J j, K k, L l, M m, N n
(�? ��), O o (�? ��), (�? �?),

JSP page changes:
-----------------

<HEAD>
<META http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
(The above page directive was added only to the page where the data is displayed, not to the page where the user enters the input. If this directive is added to the input page, the entered characters are converted to the same format as shown above, like �? ��), O o (�? ��), (�? �. )

Tomcat: 5.5.9 changes
---------------------

In server.xml in the conf directory:

The URIEncoding="UTF-8" attribute was added:

<Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8" />

<Connector port="8009" enableLookups="false" redirectPort="8443" URIEncoding="UTF-8" protocol="AJP/1.3" />
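As far as I understand it, URIEncoding only controls how the connector decodes percent-encoded bytes in the request URI (which includes GET query strings); the body of a POST is decoded using the request character encoding instead. The sketch below uses `java.net.URLDecoder` as a stand-in for the connector's decoding, which is an assumption for illustration, and `QueryDecodingDemo` is a made-up name. `%C5%93` is the UTF-8 percent-encoding of œ (U+0153).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class QueryDecodingDemo {
    // Decodes a percent-encoded value using the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%C5%93";
        // Decoded as UTF-8: one character, the oe ligature.
        System.out.println(decode(encoded, "UTF-8").length());
        // Decoded as ISO-8859-1: two unrelated characters.
        System.out.println(decode(encoded, "ISO-8859-1").length());
    }
}
```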


Copied the "Catalina.bat" and Catalina.xml files to the bin folder because they were not there initially in Tomcat 5.5.9.

-Dfile.encoding=UTF-8 was added to Catalina.bat as below:

%_EXECJAVA% %JAVA_OPTS% %CATALINA_OPTS% %DEBUG_OPTS% -Dfile.encoding=UTF-8 -Djava.endorsed.dirs="%JAVA_ENDORSED_DIRS%" -classpath "%CLASSPATH%" -Dcatalina.base="%CATALINA_BASE%" -Dcatalina.home="%CATALINA_HOME%" -Djava.io.tmpdir="%CATALINA_TMPDIR%" %MAINCLASS% %CMD_LINE_ARGS% %ACTION%
goto end
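A quick way to see what -Dfile.encoding actually changes is to print the JVM's default charset. My understanding is that this setting only affects APIs that are called without an explicit charset; it should not change how request parameters are decoded. The class below is a hypothetical standalone check, not something from the application.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetDemo {
    // Decodes with an explicit charset, so the result does not depend on
    // -Dfile.encoding or on the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        byte[] oe = { (byte) 0xC5, (byte) 0x93 }; // UTF-8 bytes of U+0153
        System.out.println(decodeUtf8(oe).equals("\u0153"));
    }
}
```

Running it with and without `-Dfile.encoding=UTF-8` shows whether the flag is being picked up by the JVM that Tomcat starts.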

After making all these changes, the characters � � and � are still stored as inverted question marks.

When I searched the Tomcat bug list, I found a mention of a patch (http://issues.apache.org/jira/browse/OFBIZ-281) that has to be applied for encoding in Tomcat 5.5.9, but the patch points to the CatalinaContainer.java class, which belongs to the OFBiz framework (an open-source framework). I couldn't find this class in the Tomcat 5.5.9 installation.

It would be really great if someone could shed some light on this.
Is there anything wrong with the steps I followed, or
is this really a Tomcat 5.5.9 issue?
Please help.
Nhym bus
Greenhorn

Joined: Dec 13, 2007
Posts: 1
Hi!

I'm French, and I had the same problem with my own language.
To get the � and � characters working I used a filter:

org.springframework.web.filter.CharacterEncodingFilter

The encoding for French is ISO-8859-15, which is what gets these characters working.

However, I had to implement my own filter (see http://java.sun.com/products/servlet/Filters.html) to get the special characters working on POST requests.

Hope it helps!
[ December 13, 2007: Message edited by: NhyMbuS ]
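To check which charsets can represent a given character, something like the following can be run on its own. It is a sketch with made-up names (`CharsetCoverageDemo`, `canEncode`), it assumes the JVM ships the ISO-8859-15 charset, and the characters in `main` (œ, Œ, Ÿ, €) are only examples of characters that ISO-8859-15 has and ISO-8859-1 lacks.

```java
import java.nio.charset.Charset;

final class CharsetCoverageDemo {
    // Reports whether the named charset can encode the given character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char[] samples = { '\u0153', '\u0152', '\u0178', '\u20ac' };
        String[] charsets = { "ISO-8859-1", "ISO-8859-15", "UTF-8" };
        for (char c : samples) {
            for (String name : charsets) {
                System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                        + " in " + name + ": " + canEncode(name, c));
            }
        }
    }
}
```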
Usha Seetharaman
Greenhorn

Joined: Apr 08, 2008
Posts: 15
Hi,

I am having problems with the following four characters:

� � � �. When these characters are entered in a text box and sent to Oracle, they are stored as S, R, x and a box character respectively. When retrieved, they come back in the same form.

I already have the filter in place that sets the request encoding to UTF-8; what I do not have, however, is the response.setContentType call.

My question is: will UTF-8 work, or do I have to change the encoding to ISO-8859-15 for these characters alone? That would be a problem for me, since other characters are working fine with UTF-8. Any help would be greatly appreciated.
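On the UTF-8 versus ISO-8859-15 question, one thing that can be checked directly is whether UTF-8 can carry everything ISO-8859-15 can. The sketch below (hypothetical class name, assuming the JVM provides ISO-8859-15) decodes every byte value as ISO-8859-15 and verifies that the resulting text survives a UTF-8 round trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8SupersetDemo {
    // Decodes every byte value 0x00-0xFF as ISO-8859-15, then checks that
    // the resulting text is unchanged after a UTF-8 encode/decode round trip.
    static boolean utf8CoversLatin9() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String latin9Text = new String(all, Charset.forName("ISO-8859-15"));
        byte[] utf8 = latin9Text.getBytes(StandardCharsets.UTF_8);
        return latin9Text.equals(new String(utf8, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(utf8CoversLatin9());
    }
}
```

If that holds, switching the whole application to ISO-8859-15 should not be necessary for character coverage; the loss is more likely at a step that is not actually using UTF-8.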
Maki Jav
Ranch Hand

Joined: May 09, 2002
Posts: 436
The solution is to put

<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

in your form JSPs,

and add the attribute

URIEncoding="UTF-8"

to the <Connector port="8080" ....... /> tag in Tomcat's server.xml file.


Thanks


Maki Java

Help gets you when you need it!
 