Mapping Unicode to Ascii for web applications

Yunseuk Kim
Greenhorn

Joined: Aug 08, 2003
Posts: 8
Dear all,
I wrote a web-based wizard program that builds faculty profiles (web pages) using JDOM and Java servlets. I'm wondering how I can make non-ASCII characters pass through the program and be stored safely in an XML file. There is no problem with ASCII characters.
For example, double quotation marks, single quotation marks, and other non-ASCII special characters such as >>>″<<< come from the special symbols in MS Word (Insert -> Symbol). When I enter them in a text box or text area, they no longer display properly after the form has been submitted once: the system turns them into a run of '?' characters.
If you know how to solve the problem, please tell me.
Thanks
Kim
[ August 28, 2003: Message edited by: Yunseuk Kim ]
[ September 01, 2003: Message edited by: Yunseuk Kim ]
Yunseuk Kim
Greenhorn

Joined: Aug 08, 2003
Posts: 8
While trying to edit the post above, I noticed that this message board software transformed '″' into '&#8243;'.
How do I change '″' into '&#8243;' when processing submitted strings in my servlet? Is there a method or package that does this, so that I don't have to do it manually?
How do I let users see the '″' character instead of '&#8243;' when they edit the text again through the web form after the first submission?
...
Thanks,
Kim
[ August 28, 2003: Message edited by: Yunseuk Kim ]
Phil Chuang
Ranch Hand

Joined: Feb 15, 2003
Posts: 251
Doing so is pretty easy, in fact...
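(The code posted at this point was not preserved. The following is only a minimal sketch of the idea, not the original function. It assumes the approach was to rewrite every character as a decimal numeric character reference, the same form as the '&#8243;' seen earlier in the thread. The class name `CharRefEncoder` and the method name `toCharacterReferences` are hypothetical.)

```java
// Hypothetical helper: the class and method names are illustrative assumptions,
// not taken from the original post.
final class CharRefEncoder {

    // Rewrites every Unicode code point of the input as a decimal numeric
    // character reference ("&#" + decimal value + ";").
    static String toCharacterReferences(String text) {
        StringBuffer out = new StringBuffer(text.length() * 6);
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            out.append("&#").append(codePoint).append(';');
            // Advance by 2 for a surrogate pair, otherwise by 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Every character is encoded, including plain ASCII ones.
        System.out.println(toCharacterReferences("a@b"));
    }
}
```

Walking the string by code point rather than by `char` keeps characters outside the Basic Multilingual Plane as a single reference instead of two surrogate halves.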


Of course, this transforms ALL TEXT into character codes, but it's easily modified to just change certain characters.
I use this function to obscure email addresses so that spiders and bots can't pick them up off web pages.
Yunseuk Kim
Greenhorn

Joined: Aug 08, 2003
Posts: 8
Thanks a lot Chuang!
Yunseuk Kim
Greenhorn

Joined: Aug 08, 2003
Posts: 8
I found another solution at http://www.i18nfaq.com/java.html.
[ September 01, 2003: Message edited by: Yunseuk Kim ]
Phil Chuang
Ranch Hand

Joined: Feb 15, 2003
Posts: 251
My first solution was a more general HTML-safe encoding. I'm not sure exactly what it encodes to, but it's not the 4-digit Unicode form used in the example you posted. If you're going to do it the second way, don't use their example code: it isn't well tuned, because it creates a large number of new String objects, which is wasteful. It should look like the following, using a StringBuffer:
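(The code posted at this point was not preserved either. As a rough sketch of what a StringBuffer-based version could look like: the example below assumes that "4-digit Unicode" means Java-style escapes, a backslash and the letter u followed by four hex digits, and that only characters outside 7-bit ASCII need escaping. Both points are assumptions, and the names `UnicodeEscaper` and `toUnicodeEscapes` are hypothetical.)

```java
// Hypothetical helper: names and escaping policy are assumptions for illustration.
final class UnicodeEscaper {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Replaces each non-ASCII UTF-16 code unit with a backslash, the letter u,
    // and four hex digits. ASCII characters are copied through unchanged.
    // All output goes into one StringBuffer, so no intermediate String
    // is created per character.
    static String toUnicodeEscapes(String text) {
        StringBuffer out = new StringBuffer(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append('\\').append('u')
                   .append(HEX[(c >> 12) & 0xF])
                   .append(HEX[(c >> 8) & 0xF])
                   .append(HEX[(c >> 4) & 0xF])
                   .append(HEX[c & 0xF]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument is the digit 5 followed by the double prime character.
        System.out.println(toUnicodeEscapes("5\u2033"));
    }
}
```

Building the result with repeated `+` on a String inside the loop would copy the accumulated text on every iteration. Appending to a single buffer avoids that.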
Yunseuk Kim
Greenhorn

Joined: Aug 08, 2003
Posts: 8
Good example! You are right; I had already noticed that problem yesterday. So I combined your "general HTML-safe encoding", the site's example code, and a StringBuffer, and arrived at a solution to my original problem.
Thanks again, Chuang.
[ September 02, 2003: Message edited by: Yunseuk Kim ]
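(The combined solution itself was not posted. One way such a combination might look, purely as a sketch under stated assumptions: encode only the non-ASCII characters as decimal numeric character references using a StringBuffer, and decode them again when the stored text has to go back into an edit form. The names `NonAsciiReferences`, `encode`, and `decode` are hypothetical.)

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a sketch of one possible combination, not the poster's code.
final class NonAsciiReferences {

    // Matches decimal references such as the one for the double prime (8243).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Leaves ASCII untouched and rewrites every other code point as "&#NNNN;".
    static String encode(String text) {
        StringBuffer out = new StringBuffer(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    // Turns decimal references back into the characters they stand for.
    // References that do not name a valid code point are left as written.
    static String decode(String text) {
        Matcher m = DECIMAL_REF.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = encode("5\u2033 pipe");
        System.out.println(stored);
        System.out.println(decode(stored).equals("5\u2033 pipe"));
    }
}
```

One caveat with this sketch: `decode` cannot tell a reference produced by `encode` from one the user typed literally, so text such as "&#65;" entered by hand would also be turned into "A".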