| Author |
Mapping Unicode to Ascii for web applications
|
Yunseuk Kim
Greenhorn
Joined: Aug 08, 2003
Posts: 8
|
|
Dear all, I wrote a web-based wizard program that builds faculty profiles (web pages) using JDOM and Java Servlets. I'm wondering how I can make non-ASCII characters pass through the program and be stored safely in an XML file. There is no problem with ASCII characters. The trouble comes from non-ASCII characters such as double quotation marks, single quotation marks, and other special characters like >>>″<<<, which come from the special symbols in MS Word (Insert -> Symbols). When I enter them in a text box or text area, they do not display properly after the form has been submitted once: the system replaces them with '?' characters. If you know how to solve the problem, please tell me. Thanks, Kim [ August 28, 2003: Message edited by: Yunseuk Kim ] [ September 01, 2003: Message edited by: Yunseuk Kim ]
|
 |
Yunseuk Kim
Greenhorn
Joined: Aug 08, 2003
Posts: 8
|
|
While trying to edit the post above, I noticed that this message board program transformed '″' into a numeric character reference ('&#8243;'). How do I change '″' into '&#8243;' while processing submitted strings in my servlet? Is there a method or package that does this, so that I don't have to do it manually? And how do I let users see the '″' character instead of '&#8243;' when they edit the text again in a web form after the first submission? ... Thanks, Kim [ August 28, 2003: Message edited by: Yunseuk Kim ]
|
 |
Phil Chuang
Ranch Hand
Joined: Feb 15, 2003
Posts: 251
|
|
Doing so is pretty easy, in fact.... Of course, this transforms ALL TEXT into character codes, but it's easily modified to change only certain characters. I use this function to obscure email addresses so spiders and bots can't pick them up off web pages.
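The function referred to here is not preserved in this copy of the thread. As a rough sketch of the idea described (turning every character into a character code), something like the following might work. The class and method names are made up for illustration, and the decimal `&#N;` output form is an assumption about what the original produced.

```java
// Hypothetical helper; the names are illustrative, not from the original post.
class CharCodeEncoder {

    // Replaces every character with a decimal numeric character reference.
    static String encodeAll(String text) {
        StringBuffer out = new StringBuffer(text.length() * 6);
        for (int i = 0; i < text.length(); ) {
            // Read whole code points so surrogate pairs are not split.
            int codePoint = text.codePointAt(i);
            out.append("&#").append(codePoint).append(';');
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Every character is encoded, including plain ASCII ones.
        System.out.println(encodeAll("a@b"));
    }
}
```

Because the output contains only ASCII characters, it survives a form round trip regardless of the page encoding, at the cost of being unreadable in the page source.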
|
 |
Yunseuk Kim
Greenhorn
Joined: Aug 08, 2003
Posts: 8
|
|
Thanks a lot Chuang!
|
 |
Yunseuk Kim
Greenhorn
Joined: Aug 08, 2003
Posts: 8
|
|
I found another solution at http://www.i18nfaq.com/java.html. [ September 01, 2003: Message edited by: Yunseuk Kim ]
|
 |
Phil Chuang
Ranch Hand
Joined: Feb 15, 2003
Posts: 251
|
|
My first solution was a more general HTML-safe encoding. I'm not sure exactly what it encodes to, but it's not 4-digit Unicode like the example you posted. If you're going to do it the second way, though, don't use their example code: it's not well tuned, because it generates a large number of new strings, which is bad. It should be like the following, using a StringBuffer:
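The code posted at this point is not preserved in this copy of the thread. A minimal sketch of the approach described, assuming the goal is to turn each non-ASCII character into a 4-digit backslash-u escape while appending to a single StringBuffer, could look like this. The class and method names are hypothetical.

```java
// Hypothetical helper; the names are illustrative, not from the original post.
class UnicodeEscaper {

    // Leaves ASCII alone and writes every other UTF-16 code unit as a
    // backslash-u escape with four hex digits, using one StringBuffer
    // instead of concatenating a new String per character.
    static String escapeNonAscii(String text) {
        StringBuffer out = new StringBuffer(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                String hex = Integer.toHexString(c);
                out.append("\\u");
                // Left-pad with zeros so the escape always has four digits.
                for (int pad = hex.length(); pad < 4; pad++) {
                    out.append('0');
                }
                out.append(hex);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("5\u2033 wide"));
    }
}
```

Working on UTF-16 code units means a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is also how Java source files and property files represent it.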
|
 |
Yunseuk Kim
Greenhorn
Joined: Aug 08, 2003
Posts: 8
|
|
Good example! What you say is right; I had already noticed the problem yesterday. So I combined your "general html-safe encoding", the site's example code, and a StringBuffer, and arrived at a solution to my original problem. Thanks again, Chuang. [ September 02, 2003: Message edited by: Yunseuk Kim ]
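The combined solution itself was not posted. One way such a mix might look, purely as an assumption, is to leave ASCII text readable and convert only non-ASCII characters to numeric character references, again with a single StringBuffer. The names below are hypothetical.

```java
// Hypothetical helper; a guess at the combined approach, not the poster's code.
class NonAsciiEntityEncoder {

    // Keeps ASCII characters as they are and replaces each non-ASCII
    // code point with a decimal numeric character reference.
    static String encodeNonAscii(String text) {
        StringBuffer out = new StringBuffer(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeNonAscii("5\u2033 x"));
    }
}
```

A browser shows such a reference as the original character when the text is rendered as HTML, which is relevant to the earlier question about what users see when they edit a profile again.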
|
 |
 |