
Mapping Unicode to Ascii for web applications

 
Yunseuk Kim
Greenhorn
Posts: 8
Dear all,
I wrote a web-based wizard that builds faculty profiles (web pages) using JDOM and Java servlets. How can I get non-ASCII characters through the program and store them safely in an XML file? ASCII characters work without any problem.
For example, there are double quotation marks, single quotation marks, and other special non-ASCII characters such as >>>″<<< that come from the special symbols in MS Word (Insert -> Symbol). When I enter them in a text box or text area, they are not displayed properly after I submit them; the system replaces them with a string of '?' characters.
If you know how to solve the problem, please tell me.
Thanks
Kim
[ August 28, 2003: Message edited by: Yunseuk Kim ]
[ September 01, 2003: Message edited by: Yunseuk Kim ]
 
Yunseuk Kim
Greenhorn
Posts: 8
While editing the post above, I noticed that this message board software transformed '″' into '&#8243;'.
How do I convert '″' into '&#8243;' when my servlet processes the submitted strings? Is there a method or package that does this, so I don't have to do it manually?
And how do I show the user '″' instead of '&#8243;' when they edit the text in a web form again after the first submission?
...
Thanks,
Kim
[ August 28, 2003: Message edited by: Yunseuk Kim ]
 
Phil Chuang
Ranch Hand
Posts: 251
Doing so is pretty easy, in fact...


Of course, this transforms ALL TEXT into character codes, but it's easily modified to change only certain characters.
I use this function to obscure email addresses so spiders and bots can't pick them up off web pages.
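
A minimal sketch of this kind of conversion, not the code originally posted: it assumes "character codes" means decimal numeric character references such as '&#8243;', the form the message board produced. The class and method names (`CharRefEncoder`, `encodeAll`) are hypothetical, and `StringBuilder` and `codePointAt` assume Java 5 or later.

```java
// Hypothetical helper: turns every character into a decimal numeric
// character reference (assumed output format, e.g. &#8243; for U+2033).
final class CharRefEncoder {

    private CharRefEncoder() {}

    static String encodeAll(String text) {
        // Each reference is roughly 5-8 characters, so reserve room up front.
        StringBuilder out = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); ) {
            // Read a whole code point so surrogate pairs become one reference.
            int codePoint = text.codePointAt(i);
            out.append("&#").append(codePoint).append(';');
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Obscures an address the way the post describes.
        System.out.println(encodeAll("a@b.c"));
    }
}
```

To change only certain characters, the loop could append the character unchanged unless it meets some condition (for example, a code point above 127).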
 
Yunseuk Kim
Greenhorn
Posts: 8
Thanks a lot Chuang!
 
Yunseuk Kim
Greenhorn
Posts: 8
I found another solution at http://www.i18nfaq.com/java.html.
[ September 01, 2003: Message edited by: Yunseuk Kim ]
 
Phil Chuang
Ranch Hand
Posts: 251
My first solution was a more general HTML-safe encoding. I'm not sure exactly what it encodes to, but it's not 4-digit Unicode like the example you posted. If you're going to do it the second way, though, don't use their example code. It's not well tuned: it creates a large number of new String objects, which is wasteful. It should look like the following, using a StringBuffer:
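
A minimal sketch of the StringBuffer approach, not the code originally posted: it assumes "4-digit unicode" means Java-style escapes (a backslash, the letter u, and four hex digits) and that only characters outside ASCII need escaping. The names `UnicodeEscaper` and `escapeNonAscii` are hypothetical.

```java
// Hypothetical helper: replaces each non-ASCII char with a Java-style
// escape (backslash, 'u', four hex digits) using a single StringBuffer.
final class UnicodeEscaper {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private UnicodeEscaper() {}

    static String escapeNonAscii(String text) {
        // One buffer for the whole result, instead of a new String
        // for every concatenation.
        StringBuffer out = new StringBuffer(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c); // ASCII passes through unchanged
            } else {
                // Emit the four hex digits of the 16-bit char value.
                out.append("\\u")
                   .append(HEX[(c >> 12) & 0xF])
                   .append(HEX[(c >> 8) & 0xF])
                   .append(HEX[(c >> 4) & 0xF])
                   .append(HEX[c & 0xF]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("5\u2033 wide"));
    }
}
```

On Java 5 and later, `StringBuilder` is a drop-in replacement that skips the synchronization `StringBuffer` performs; the sketch keeps `StringBuffer` only because the post names it.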
 
Yunseuk Kim
Greenhorn
Posts: 8
Good example! You are right; I had already noticed that problem yesterday. So I combined your "general HTML-safe encoding", the site's example code, and a StringBuffer, and now have a solution to my original problem.
Thanks again Chuang.
[ September 02, 2003: Message edited by: Yunseuk Kim ]
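
The combined solution itself was not posted. One way such a combination might look, purely as an assumption: leave ASCII alone, write every other character as a decimal reference like '&#8243;' before storing it, and turn the references back into characters before showing the text in an edit form. The names `NonAsciiCharRefs`, `encode`, and `decode` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical round-trip helper for decimal numeric character references.
final class NonAsciiCharRefs {

    // Matches references such as &#8243; (decimal form only).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    private NonAsciiCharRefs() {}

    /** Leaves ASCII as-is; writes every other code point as a decimal reference. */
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    /** Turns decimal references back into characters, e.g. for an edit form. */
    static String decode(String text) {
        Matcher m = DECIMAL_REF.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Out-of-range numbers are left exactly as they were written.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Note that `decode` also converts references the user typed literally, so the round trip is only exact for text that did not already contain them.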
 