Convert string?

Meyer Florian
Ranch Hand

Joined: Oct 20, 2003
Posts: 62
Hello

Using the Character class, I am able to change letters to upper case and determine if they are letters, digits or special characters.

Unfortunately, accented characters such as ü, their uppercase counterparts, and many more are also recognized as letters. Is there a way in Java to convert an accented A to A, an accented E to E, and so on?

Müller must result in MULLER and not MLLER or MUELLER...

Thanks for any help!
Florian
Grant Gainey
Ranch Hand

Joined: Oct 16, 2005
Posts: 65
Originally posted by Meyer Florian:

Unfortunately, accented characters such as ü, their uppercase counterparts, and many more are also recognized as letters. Is there a way in Java to convert an accented A to A, an accented E to E, and so on?

Müller must result in MULLER and not MLLER or MUELLER...

Ummm...why would you want to do that? If someone's name is Müller, they're going to be unhappy if it's shown as MULLER - that's not their name. All those characters up there aren't just "aeiou with funny marks" - they're different characters, just as if they were x's and z's.

The only reason I can think of for doing what you're attempting is to store the names as 7-bit ASCII, which is a really US-centric view of data.

At any rate - assuming you're really stuck on this path, the only thing I can think of would be to have a mapping table somewhere of "Weird non-US characters that Those Durn Furriners shouldn't be using" to "The Five Vowels The Computer Gods Intended".

But be prepared for your users to complain bitterly about you changing their names...

Good luck,
Grant


In Theory, there is no difference between theory and practice.
In Practice, there is no relationship between theory and practice.
Meyer Florian
Ranch Hand

Joined: Oct 20, 2003
Posts: 62
The names will not be changed. In our databases, the names will be stored as "Müller" and - for internal search and sort purposes - also stored as "MULLER". We can't change this situation because there's a legacy system that must still work with these names.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18121
    

The "decomposition" part of this Unicode report should get you started.
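
For anyone who wants to see what the decomposition idea might look like in code, here is a minimal sketch. It assumes java.text.Normalizer, which was added in Java 6 and so may not be available on older JVMs; the class name AccentFolder and the method fold are made up for illustration. The idea is to decompose each character into a base letter plus combining marks (form NFD) and then drop the marks.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of any library.
class AccentFolder {
    // Decompose to NFD, then remove every combining mark (Unicode category M).
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "M\u00FCller" is Mueller spelled with u-diaeresis.
        String key = fold("M\u00FCller").toUpperCase(Locale.ROOT);
        System.out.println(key); // prints MULLER
    }
}
```

Letters that have no canonical decomposition, such as ß or ø, pass through unchanged, so a small mapping table would probably still be needed for those.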
Grant Gainey
Ranch Hand

Joined: Oct 16, 2005
Posts: 65
Originally posted by Paul Clapham:
The "decomposition" part of this Unicode report should get you started.

Now that is very cool - will need to read in detail tonight.

Meyer - ahh, I understand the requirement now. Apologies if I sounded snippy - I've seen too many systems implemented where the designer was trying to "get rid of all these stupid marks", because they had no concept of anything other than ASCII.

Grant
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
You may also be interested in java.text.Collator and related classes (like java.text.CollationKey). I've never really gotten around to using them, but they're apparently designed with this sort of thing in mind.
[ April 21, 2006: Message edited by: Jim Yingst ]
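
As a rough illustration of the Collator suggestion, the sketch below sorts names so that accents do not affect the order. It assumes the English collation rules shipped with the JDK and uses Collator.PRIMARY strength, which is meant to ignore accent and case differences; the class and method names are invented for the example. Note that this only changes how strings compare. It does not produce a stripped string that could be handed to a legacy system.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical example class.
class AccentInsensitiveSort {
    // PRIMARY strength compares base letters only (assumption: English rules).
    static Collator accentInsensitive() {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(Collator.PRIMARY);
        return c;
    }

    // Returns a sorted copy; the input list is left untouched.
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(accentInsensitive());
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Mutter", "Munro", "M\u00FCnch");
        // Plain String.compareTo would put the u-diaeresis name last.
        System.out.println(sorted(names));
    }
}
```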

"I'm not back." - Bill Harding, Twister
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
Originally posted by Jim Yingst:
You may also be interested in java.text.Collator and related classes (like java.text.CollationKey). I've never really gotten around to using them, but they're apparently designed with this sort of thing in mind.

You can tell a Collator to ignore accents when sorting, but it sounds like the OP needs to strip the accents so he can feed the names to a legacy system. The CollationElementIterator class could be of some use, but it would still leave you a lot of hand coding to do (I know this because I've just spent several hours fighting with it myself). I think you're better off doing as Grant said and writing up your own mapping table. If you're just converting accented letters to their unaccented equivalents, a simple switch block would do it.

But what if you receive the name as "Mueller"? Are you supposed to drop the 'e'?
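
A hand-written table along the lines suggested above could be sketched like this. The set of letters is only a small illustrative sample, and the names AccentTable, unaccent and searchKey are hypothetical. The sketch lower-cases first so that one set of cases covers both upper- and lower-case input, then upper-cases the result to match the "MULLER" style key.

```java
import java.util.Locale;

// Hypothetical mapping table; extend the cases for real data.
class AccentTable {
    // Maps one lower-case accented letter to its unaccented base letter.
    static char unaccent(char c) {
        switch (c) {
            case '\u00E0': case '\u00E1': case '\u00E2': case '\u00E4':
                return 'a'; // a with grave, acute, circumflex, diaeresis
            case '\u00E7':
                return 'c'; // c with cedilla
            case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB':
                return 'e'; // e with grave, acute, circumflex, diaeresis
            case '\u00EC': case '\u00ED': case '\u00EE': case '\u00EF':
                return 'i'; // i with grave, acute, circumflex, diaeresis
            case '\u00F2': case '\u00F3': case '\u00F4': case '\u00F6':
                return 'o'; // o with grave, acute, circumflex, diaeresis
            case '\u00F9': case '\u00FA': case '\u00FB': case '\u00FC':
                return 'u'; // u with grave, acute, circumflex, diaeresis
            default:
                return c;   // anything not in the table is left alone
        }
    }

    // Builds an upper-case, accent-free search key from a name.
    static String searchKey(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            sb.append(unaccent(lower.charAt(i)));
        }
        return sb.toString().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(searchKey("M\u00FCller")); // prints MULLER
        System.out.println(searchKey("Mueller"));     // prints MUELLER
    }
}
```

As written, the sketch leaves "Mueller" as MUELLER; it makes no attempt to guess that the 'e' stands in for an umlaut.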
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I agree. It's unfortunate that there's no getCanonicalForm() or getSimplifiedForm() on Collator or CollationKey, to return the simplest string that's considered equivalent by a given Collator. Seems like they have all the necessary tables and info buried within the class, but decline to expose it in a form that would be useful to legacy systems. Hmff. At least Collator may be useful for testing. Not that it would necessarily be more correct than a hand-customized table, but comparing the results of a Collator-based sort with other techniques could well be useful in identifying anomalies that might otherwise be difficult to detect.
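
One way the testing idea might be put into practice is sketched below: check that each stored search key is still considered equal to the original name by a PRIMARY-strength Collator, and flag the ones that are not. This assumes the JDK's English collation rules; KeyChecker and primaryEqual are made-up names.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical cross-check between a name and its simplified key.
class KeyChecker {
    // True if the collator sees no base-letter difference between the two.
    static boolean primaryEqual(String original, String key) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(Collator.PRIMARY); // ignore accents and case
        return c.compare(original, key) == 0;
    }

    public static void main(String[] args) {
        String name = "M\u00FCller"; // spelled with u-diaeresis
        String[] candidateKeys = { "MULLER", "MLLER", "MUELLER" };
        for (String key : candidateKeys) {
            String verdict = primaryEqual(name, key) ? "ok" : "anomaly";
            System.out.println(key + ": " + verdict);
        }
    }
}
```

A key flagged here is not necessarily wrong, but it is a hint that the hand-written table and the Collator disagree about that entry.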
 