I have text that comes through as Latin ("ISO-8859-1") and I want to simplify the encoding to ASCII ("US-ASCII"). Both are 8-bit encoding, the first supports all 256 characters (including accented letters for multiple languages) while the latter supports the first 128 characters, primarily English text.
I found one way to do it using the
String constructor, namely:
The process converts characters 128-255 to the "?" character. Is there a smarter conversion available in the
Java APIs? For example, it would be useful if the multiple "e" vowels with accents (upper and lower case) were converted to their non-accented counter-parts, rather than a "?". Is there anything like that available in Java?
Alternatively, I could write a parser that reads the characters one at a time and uses a look-up table for each letter, but it seems like re-inventing the wheel to me, so I thought I'd checked here first.