Unfortunately, accented characters such as ü, along with their uppercase forms (and many more), are recognized as letters. Is there a way in Java to convert an accented A to a plain A, an accented E to a plain E, and so on?
Müller must result in MULLER and not MLLER or MUELLER...
Ummm...why would you want to do that? If someone's name is Müller, they're going to be unhappy if it's shown as MULLER - that's not their name. All those characters up there aren't just "aeiou with funny marks" - they're different characters, just as if they were x's and z's.
The only reason I can think of for doing what you're attempting is to store the names as 7-bit ASCII, which is a really US-centric view of data.
At any rate - assuming you're really stuck on this path, the only thing I can think of would be to have a mapping table somewhere of "Weird non-US characters that Those Durn Furriners shouldn't be using" to "The Five Vowels The Computer Gods Intended".
But be prepared for your users to complain bitterly about you changing their names...
Good luck, Grant
In Theory, there is no difference between theory and practice. In Practice, there is no relationship between theory and practice.
The names will not be changed. In our databases, the names will be stored as "Müller" and - for internal search and sort purposes - also stored as "MULLER". We can't change this situation because there's a legacy system that must still work with these names.
Originally posted by Paul Clapham: The "decomposition" part of this Unicode report should get you started.
Now that is very cool - will need to read in detail tonight.
Meyer - ahh, I understand the requirement now. Apologies if I sounded snippy - I've seen too many systems implemented where the designer was trying to "get rid of all these stupid marks", because they had no concept of anything other than ASCII.
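For what it's worth, the decomposition idea Paul Clapham points to can be tried directly with java.text.Normalizer, assuming Java 6 or later (the thread doesn't say which JDK is in use). The sketch below decomposes each character into base letter plus combining marks, drops the marks, and uppercases the rest. The class name AccentStripper and the method toSearchKey are made up for illustration. Treat it as a starting point only: letters with no canonical decomposition (the slashed O, for example) pass through unchanged, and uppercasing turns a sharp s into SS.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of any library.
final class AccentStripper {

    // Builds an uppercase, accent-free key for internal search and sort.
    static String toSearchKey(String name) {
        // NFD splits a precomposed letter such as u-umlaut (\u00FC)
        // into the base letter 'u' followed by a combining diaeresis.
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks; removing them leaves the base letters.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotted I).
        return stripped.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "M\u00FCller" is Mueller written with u-umlaut.
        System.out.println(toSearchKey("M\u00FCller")); // prints MULLER
    }
}
```

The original name stays untouched in its own column; only the derived key goes to the legacy system.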
You may also be interested in java.text.Collator and related classes (like java.text.CollationKey). I've never really gotten around to using them, but they're apparently designed with this sort of thing in mind. [ April 21, 2006: Message edited by: Jim Yingst ]
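As a rough illustration of what those classes do: a Collator set to PRIMARY strength treats accent and case differences as insignificant when comparing. The sketch below assumes the English locale (other locales may carry different rules), and the class name AccentInsensitiveCompare is invented for the example. Note that it only compares and sorts; it does not produce a stripped string.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class, not part of any library.
final class AccentInsensitiveCompare {

    // Returns a collator that ignores accent and case differences.
    static Collator primaryCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = primaryCollator(Locale.ENGLISH);

        // "M\u00FCller" is written with u-umlaut; 0 means "considered equal".
        System.out.println(collator.compare("M\u00FCller", "MULLER") == 0);

        // A Collator is a Comparator, so it can drive a sort directly.
        String[] names = { "Zimmer", "M\u00FCller", "Adams" };
        Arrays.sort(names, collator);
        System.out.println(Arrays.toString(names));

        // CollationKeys are precomputed sort keys, handy when the same
        // strings are compared many times.
        CollationKey a = collator.getCollationKey("M\u00FCller");
        CollationKey b = collator.getCollationKey("MULLER");
        System.out.println(a.compareTo(b) == 0);
    }
}
```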
Originally posted by Jim Yingst: You may also be interested in java.text.Collator and related classes (like java.text.CollationKey). I've never really gotten around to using them, but they're apparently designed with this sort of thing in mind.
You can tell a Collator to ignore accents when sorting, but it sounds like the OP needs to strip the accents so he can feed the names to a legacy system. The CollationElementIterator class could be of some use, but it would still leave you a lot of hand coding to do (I know this because I've just spent several hours fighting with it myself). I think you're better off doing as Grant said and writing up your own mapping table. If you're just converting accented letters to their unaccented equivalents, a simple switch block would do it.
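A hand-written table along those lines might look like the sketch below. It is only an assumption about what such a switch block would contain: it covers the uppercase accented letters of the Latin-1 range and nothing else, and the names AccentTable, unaccent and toLegacyKey are invented for the example. Anything not listed passes through unchanged, so the table has to grow with the data.

```java
import java.util.Locale;

// Hypothetical mapping table, limited to Latin-1 uppercase letters.
final class AccentTable {

    // Maps one uppercase accented letter to its plain equivalent.
    static char unaccent(char c) {
        switch (c) {
            case '\u00C0': case '\u00C1': case '\u00C2':
            case '\u00C3': case '\u00C4': case '\u00C5':
                return 'A'; // A with grave, acute, circumflex, tilde, diaeresis, ring
            case '\u00C7':
                return 'C'; // C with cedilla
            case '\u00C8': case '\u00C9': case '\u00CA': case '\u00CB':
                return 'E'; // E with grave, acute, circumflex, diaeresis
            case '\u00CC': case '\u00CD': case '\u00CE': case '\u00CF':
                return 'I'; // I with grave, acute, circumflex, diaeresis
            case '\u00D1':
                return 'N'; // N with tilde
            case '\u00D2': case '\u00D3': case '\u00D4':
            case '\u00D5': case '\u00D6':
                return 'O'; // O with grave, acute, circumflex, tilde, diaeresis
            case '\u00D9': case '\u00DA': case '\u00DB': case '\u00DC':
                return 'U'; // U with grave, acute, circumflex, diaeresis
            case '\u00DD':
                return 'Y'; // Y with acute
            default:
                return c;   // not in the table: leave as-is
        }
    }

    // Uppercases first, so only uppercase letters need table entries.
    static String toLegacyKey(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        StringBuilder key = new StringBuilder(upper.length());
        for (int i = 0; i < upper.length(); i++) {
            key.append(unaccent(upper.charAt(i)));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLegacyKey("M\u00FCller")); // prints MULLER
    }
}
```

A table like this works one character at a time, so a name that arrives already transliterated stays as typed.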
But what if you receive the name as "Mueller"? Are you supposed to drop the 'e'?
I agree. It's unfortunate that there's no getCanonicalForm() or getSimplifiedForm() on Collator or CollationKey, to return the simplest string that's considered equivalent by a given Collator. Seems like they have all the necessary tables and info buried within the class, but decline to expose it in a form that would be useful to legacy systems. Hmff. At least Collator may be useful for testing. Not that it would necessarily be more correct than a hand-customized table, but comparing the results of a Collator-based sort with other techniques could well be useful in identifying anomalies that might otherwise be difficult to detect.
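One way such a cross-check could be set up, purely as a sketch: run each stored name and its legacy key through a PRIMARY-strength Collator and list the pairs it does not consider equal. The class KeyCrossCheck, the method findAnomalies, and the choice of the English locale are all assumptions for illustration. A reported pair is not necessarily a bug in the table; it only marks a spot where the two techniques disagree and someone should look.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical test helper, not part of any library.
final class KeyCrossCheck {

    // Returns "name -> key" for every pair the collator does not treat as equal.
    static List<String> findAnomalies(Map<String, String> nameToKey, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY); // ignore accents and case
        List<String> anomalies = new ArrayList<String>();
        for (Map.Entry<String, String> entry : nameToKey.entrySet()) {
            if (collator.compare(entry.getKey(), entry.getValue()) != 0) {
                anomalies.add(entry.getKey() + " -> " + entry.getValue());
            }
        }
        return anomalies;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<String, String>();
        stored.put("M\u00FCller", "MULLER"); // u-umlaut mapped to U
        stored.put("Andr\u00E9", "ANDR");    // e-acute dropped by mistake
        System.out.println(findAnomalies(stored, Locale.ENGLISH));
    }
}
```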