Apply Unicode escapes?

Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
Hi,

Is there an easy way to apply Unicode escapes to a text string, so that each \u#### is replaced by the equivalent Unicode character?

Basically, I need to replicate behaviour of Properties files, without actually using Properties.load().


Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
No one replied, and my own further research suggests there probably isn't such an API method. So I wrote my own. I don't think my employer will mind my posting it here, for the edification/scrutiny of Ranchers...



Something like that, anyway. Testing may reveal shortcomings.
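
For readers following along, here is a minimal sketch of what such a method might look like, assuming a regex-based approach (a regex is mentioned later in this thread). It is not the poster's code: the class name UnicodeUnescaper and the method name unescape are hypothetical, and it makes no attempt to handle doubled backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class UnicodeUnescaper {
    // One backslash, a 'u', then exactly four hex digits (captured as group 1).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            // Turn the four hex digits into the UTF-16 code unit they name.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '\' or '$' from being
            // interpreted as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal below is the 8-character text: backslash, u0041, BC.
        System.out.println(unescape("\\u0041BC")); // prints ABC
    }
}
```

Each escape is decoded to a single char, so a character outside the Basic Multilingual Plane would need two escapes (a surrogate pair), as it does in a properties file.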
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
One thing to beware here is that it's possible to have a double-backslash escape. So

\u####

is a Unicode escape, but

\\u####

is not. Or even

\\\\\\\\\u####

is a Unicode escape, but

\\\\\\\\\\u####

is not.
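
The pattern in these examples is parity: when the run of backslashes directly before the u is odd, the last backslash starts a Unicode escape; when it is even, the backslashes pair off and the u is just a letter. A rough sketch of a scanner that applies that rule, assuming exactly one u and four hex digits per escape (the class and method names are made up for illustration):

```java
// Hypothetical helper, for illustration only.
final class ParityAwareUnescaper {

    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            // Measure the whole run of consecutive backslashes.
            int start = i;
            while (i < s.length() && s.charAt(i) == '\\') {
                i++;
            }
            int run = i - start;
            if (run % 2 == 1 && isEscapeAt(s, i)) {
                // Odd run: the last backslash belongs to the escape,
                // the others are copied through untouched.
                for (int k = 0; k < run - 1; k++) {
                    out.append('\\');
                }
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                i += 5; // skip the 'u' and the four hex digits
            } else {
                // Even run, or no escape follows: copy the backslashes as they are.
                for (int k = 0; k < run; k++) {
                    out.append('\\');
                }
            }
        }
        return out.toString();
    }

    // True if s has a 'u' at index i followed by four ASCII hex digits.
    private static boolean isEscapeAt(String s, int i) {
        if (i + 5 > s.length() || s.charAt(i) != 'u') {
            return false;
        }
        for (int k = i + 1; k < i + 5; k++) {
            char h = s.charAt(k);
            boolean hex = (h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') || (h >= 'A' && h <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0041"));     // one backslash: prints A
        System.out.println(unescape("\\\\u0041"));   // two backslashes: printed unchanged
    }
}
```

Note that this sketch only decodes the Unicode escapes and leaves paired backslashes as they were; it does not collapse a doubled backslash into a single one, which a full properties-style reader would also do.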


"I'm not back." - Bill Harding, Twister
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
Yes, that's true, and if I were writing a method for a totally general application, I'd have to deal with that. In my situation, I am pretty sure that \u followed by 4 hex digits will only appear in the string if an escape code is intended.

The most common situation where problems occur with this is where the text being processed is actually an explanation of Unicode escapes! I can be sure my text won't be that.

As it is fairly easy to do, I could perhaps beef up my regex so that it does not match if the text being matched is preceded by another backslash. That's still not perfect, as your loads-of-backslashes examples showed, but it would be a step in the right direction.
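
One way that beefed-up regex might look is with a negative lookbehind, sketched below. The pattern and the names LookbehindUnescaper and unescape are assumptions for illustration, not code from this thread. It gets the two-backslash case right but, as noted, still misjudges longer odd runs such as three backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class LookbehindUnescaper {
    // (?<!\\) refuses a match when the character just before the
    // escape's backslash is itself a backslash.
    private static final Pattern ESCAPE =
            Pattern.compile("(?<!\\\\)\\\\u([0-9a-fA-F]{4})");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0041"));       // one backslash: prints A
        System.out.println(unescape("\\\\u0041"));     // two backslashes: printed unchanged
        System.out.println(unescape("\\\\\\u0041"));   // three backslashes: also unchanged, the known gap
    }
}
```

The lookbehind only inspects a single preceding character, which is why it cannot tell an odd run of backslashes from an even one; counting the run is needed for that.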
 