After reading the existing forum entries on character encoding, I have finally decided to declare a constant "US-ASCII" to satisfy the URLYBird requirement "character encoding is 8 bit US ASCII". When reading a field, I use:
And when writing a field, I use:
where data is a String representing the value of the field.
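A minimal sketch of how reading and writing a fixed-length field with such a constant might look. It assumes space-padded, fixed-width fields; readField, writeField and the field length of 10 are hypothetical names and values, not the poster's actual code.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class FieldCodec {
    // Assumption: a charset constant along the lines described in the post.
    static final Charset CHARSET = Charset.forName("US-ASCII");

    // Hypothetical helper: decode a fixed-length field and strip the surrounding padding.
    static String readField(byte[] raw) {
        return new String(raw, CHARSET).trim();
    }

    // Hypothetical helper: encode a value into a fixed-length, space-padded field.
    static byte[] writeField(String data, int length) {
        byte[] field = new byte[length];
        Arrays.fill(field, (byte) ' ');
        byte[] encoded = data.getBytes(CHARSET);
        // Copy at most 'length' bytes; longer values are truncated.
        System.arraycopy(encoded, 0, field, 0, Math.min(encoded.length, length));
        return field;
    }

    public static void main(String[] args) {
        byte[] field = writeField("Palace", 10);
        System.out.println(field.length);     // 10
        System.out.println(readField(field)); // Palace
    }
}
```

Passing a Charset instance (rather than its name) to getBytes and the String constructor means no checked exception has to be handled at every call site.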
(1) Does that make any sense to you? I have never actually used character encodings explicitly; so far, I have always just written bytes and instantiated Strings, ignoring charsets entirely.
(2) I wonder if there is a way to validate whether a given String is valid for a given charset. Before updating the data file, I want the Data class to check that a given String contains only valid characters, but I cannot see how this can be done. I have seen that the getBytes variant taking a charset name throws an UnsupportedEncodingException, but I guess this is not what I want here. I would need something like "".isEncodable(CHARSET).
What have you done? Did you check this at all?
My 3 cents:
1/ I used another charset (iso-8859-1)
2/ I used code similar to Roberto's, except that I pass the charset name instead of a Charset instance.
3/ If an error occurs during the conversion from bytes to String (or the other way around), an UnsupportedEncodingException is thrown, which I catch and handle appropriately. So that's your check that the bytes/String are valid for the charset.
Well, I will change my code to use just the charset name, though I don't think that really matters.
But one thing: are you actually sure that catching the UnsupportedEncodingException is sufficient validation? The javadoc says that this exception is thrown if the specified encoding is not supported. Hence, I believe you would get this exception if you specified "BLA" and there were no encoder called "BLA".
BUT: if you specify "US-ASCII" and try to encode a String containing characters not supported by US-ASCII, I doubt you will get an exception, because US-ASCII itself is still supported.
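The distinction can be checked with a small experiment. This sketch assumes "BLA" is not a charset name known to the JVM (as in the post) and uses "caf\u00e9" as a sample String with a non-ASCII character; the helper names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class EncodingExceptionDemo {
    // Returns true if getBytes(charsetName) threw UnsupportedEncodingException.
    static boolean throwsForName(String text, String charsetName) {
        try {
            text.getBytes(charsetName);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    // Returns true if the text survives an encode/decode round trip unchanged.
    static boolean roundTrips(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset).equals(text);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // the last character is not a US-ASCII character
        // true: "BLA" is an unknown charset name
        System.out.println(throwsForName(text, "BLA"));
        // false: no exception, even though the last character cannot be encoded
        System.out.println(throwsForName(text, "US-ASCII"));
        // false: the encoded bytes no longer represent the original String
        System.out.println(roundTrips(text, Charset.forName("US-ASCII")));
    }
}
```

In other words, UnsupportedEncodingException only tells you whether the charset exists, not whether the String fits into it.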
oli mueller wrote: Are you actually sure that catching the UnsupportedEncodingException is a sufficient validation???....The javadoc says that this exception is thrown if the specified encoding is not supported
I had a closer look at the javadoc of String.getBytes, and it gives the following explanation:
The behavior of this method when this string cannot be encoded in the given charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
So CharsetEncoder.canEncode is the method you are looking for. But, as already indicated, I didn't perform any validation at all.
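A minimal sketch of the isEncodable check wished for earlier, built on CharsetEncoder.canEncode. The class and method names are hypothetical; only Charset.newEncoder and CharsetEncoder.canEncode(CharSequence) are real JDK API.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingValidator {
    static final Charset CHARSET = Charset.forName("US-ASCII");

    // Hypothetical helper: true if every character of text can be encoded in the charset.
    static boolean isEncodable(String text, Charset charset) {
        // A fresh encoder per call, because CharsetEncoder instances are not thread-safe.
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(isEncodable("Palace", CHARSET));    // true
        System.out.println(isEncodable("caf\u00e9", CHARSET)); // false
    }
}
```

The Data class could call such a check before writing a record and reject the update if it returns false.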
I just tried it out, and it works: either use canEncode or catch the exception thrown by the encoder's encode() method. Since we got this far, I will probably implement it. I'm not sure whether it will earn me some brownie points or whether they will deduct points because the junior programmer hasn't heard of encoders. ;)
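For the "catch the exception" variant, a sketch along these lines should work: an encoder whose error actions are set to REPORT makes encode() throw a CharacterCodingException for unmappable characters instead of substituting them. The encodeStrict helper is hypothetical; REPORT is already the default for a fresh encoder and is set here only for clarity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEncoding {
    // Hypothetical helper: encode text, failing loudly instead of substituting characters.
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        // Copy only the bytes actually produced (position to limit).
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        Charset ascii = Charset.forName("US-ASCII");
        try {
            System.out.println(encodeStrict("Palace", ascii).length); // 6
            encodeStrict("caf\u00e9", ascii);
            System.out.println("accepted");
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

Compared with canEncode, this approach validates and encodes in a single pass, which may be convenient right before the bytes are written to the file.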