
How to check against CharSet

oli mueller
Ranch Hand

Joined: Feb 13, 2011
Posts: 42
Hi,

After reading the existing forum entries on Character Encoding, I have finally decided to declare a constant "US-ASCII" to respond to the URLYBird requirement: "character encoding is 8 bit US ASCII". When reading a field, I use:


And when writing a field, I use:
, where data is a String representing the value of the field.
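For illustration, here is a minimal sketch of reading and writing one field with an explicit US-ASCII charset. The class name `FieldCodec`, the method names, and the space padding to a fixed field length are assumptions made for this example; they are not taken from the assignment or from the code originally posted.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class FieldCodec {
    // Charset named by the requirement; every Java platform must support US-ASCII.
    static final Charset CHARSET = Charset.forName("US-ASCII");

    // Decode the raw bytes of one field and strip surrounding whitespace (padding).
    static String readField(byte[] raw) {
        return new String(raw, CHARSET).trim();
    }

    // Encode a value into a fixed-length field, padded with spaces (assumed padding).
    static byte[] writeField(String data, int fieldLength) {
        byte[] encoded = data.getBytes(CHARSET);
        if (encoded.length > fieldLength) {
            throw new IllegalArgumentException("value too long for field");
        }
        byte[] field = new byte[fieldLength];
        Arrays.fill(field, (byte) ' ');
        System.arraycopy(encoded, 0, field, 0, encoded.length);
        return field;
    }

    public static void main(String[] args) {
        byte[] field = writeField("Palace", 10);
        System.out.println("[" + readField(field) + "]");
    }
}
```

Note that `getBytes(Charset)` does not reject characters outside the charset; it silently substitutes the charset's replacement byte, so this sketch performs no validation.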

(1) Does that make any sense to you? I have never actually used character encodings. So far I have always just written bytes and instantiated Strings, ignoring any charsets.

(2) I wonder if there is a way to validate whether a given String is valid for a given Charset. I want the Data class to check, before updating the data file, that a given String actually contains only valid characters, but I cannot see how this can be done. I have seen that the "".getBytes(CHARSET) method throws an UnsupportedEncodingException, but I guess this is not what I want here. I would need something like "".isEncodable(CHARSET).

==> What have you done? Did you check this at all?

Thanks!

Roberto Perillo
Bartender

Joined: Dec 28, 2007
Posts: 2266
    

Howdy, Oli!

Well, I did pretty much the same thing as you. The only little difference is that I did the following:





If "data" is the value of a particular record field, yes, that's correct.

oli mueller wrote:I wonder if there is a way to validate, IF a given String is valid for a given CharSet.


Well champ, I didn't go that far. I simply wrote the values to the file blindly.


Cheers, Bob "John Lennon" Perillo
SCJP, SCWCD, SCJD, SCBCD - Daileon: A Tool for Enabling Domain Annotations
Roel De Nijs
Bartender

Joined: Jul 19, 2004
Posts: 5406
    

My 3 cents:
1/ I used another charset (iso-8859-1)
2/ I used code which is similar to Roberto's (so just using the charset name instead of a charset instance)
3/ If an error occurs during conversion from bytes to String (or the other way around), an UnsupportedEncodingException is thrown. This exception is caught and handled appropriately. So that's your check to see if the bytes/String are valid for the charset

SCJA, SCJP (1.4 | 5.0 | 6.0), SCJD
http://www.javaroe.be/
oli mueller
Ranch Hand

Joined: Feb 13, 2011
Posts: 42
Thanks guys,

Well, I will then change my code to just use the charset name, but I think that does not really matter.

But one thing: are you actually sure that catching the UnsupportedEncodingException is a sufficient validation? The javadoc says that this exception is thrown if the specified encoding is not supported. Hence, I believe that you would get this exception if you specify "BLA" and there is no encoding called "BLA".

But if you specify "US-ASCII" and you try to encode a String containing characters not supported by US-ASCII, I doubt you will get an exception, because US-ASCII is still supported.
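The doubt can be checked with a small experiment. The sketch below (the class `GetBytesDemo` and its helper method are hypothetical names) calls `String.getBytes(String)` once with a supported charset and a non-ASCII character, and once with an unknown charset name. The javadoc leaves the behaviour for unencodable input unspecified, so the first result is what I would expect on common JDKs rather than a guarantee.

```java
import java.io.UnsupportedEncodingException;

class GetBytesDemo {
    // Returns true if getBytes(charsetName) completed without throwing.
    static boolean encodesWithoutException(String text, String charsetName) {
        try {
            text.getBytes(charsetName);
            return true;
        } catch (UnsupportedEncodingException e) {
            // Thrown when the charset name itself is unknown to the JVM.
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00fc is a u-umlaut, which US-ASCII cannot represent.
        System.out.println(encodesWithoutException("M\u00fcller", "US-ASCII"));
        // "BLA" is not a charset name the JVM knows.
        System.out.println(encodesWithoutException("Mueller", "BLA"));
    }
}
```

If the first call completes normally, as expected, then catching UnsupportedEncodingException only tells you whether the charset exists, not whether the String fits into it.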
Roel De Nijs
Bartender

Joined: Jul 19, 2004
Posts: 5406
    

oli mueller wrote:Are you actually sure that catching the UnsupportedEncodingException is a sufficient validation???....The javadoc says that this exception is thrown if the specified encoding is not supported

Good catch

I had a closer look at the javadoc of String.getBytes and it gives the following explanation:
The behavior of this method when this string cannot be encoded in the given charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.


So CharsetEncoder.canEncode is the method you are looking for. But as already indicated, I didn't perform any validation at all.
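A minimal sketch of such a check, assuming a hypothetical helper class `CharsetCheck` with a static `isEncodable` method (neither name comes from the thread's lost code):

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class CharsetCheck {
    static final Charset CHARSET = Charset.forName("US-ASCII");

    // True if every character of value can be encoded in CHARSET.
    static boolean isEncodable(String value) {
        // Encoders are stateful and not safe for concurrent use,
        // so this sketch creates a fresh one per call.
        CharsetEncoder encoder = CHARSET.newEncoder();
        return encoder.canEncode(value);
    }

    public static void main(String[] args) {
        System.out.println(isEncodable("Palace"));
        System.out.println(isEncodable("M\u00fcller"));
    }
}
```

Creating an encoder per call is the simple choice; a long-lived, shared encoder would need external synchronization.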
oli mueller
Ranch Hand

Joined: Feb 13, 2011
Posts: 42
thanks Roel,

I just tried it out: it works. Either use or catch the exception of the encoder.encode() method. I guess, since we got this far here, I will probably implement it. Not sure if it will earn me some brownie points or if they will deduct points because the junior programmer hasn't heard about encoders ;)

Thanks anyway...
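The exception-based variant could look roughly like the sketch below. The class and method names (`StrictEncoder`, `encodeStrict`, `rejects`) are made up for illustration. The encoder is explicitly set to report unmappable characters, which is also its default, so `encode` throws a CharacterCodingException instead of substituting a replacement byte.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class StrictEncoder {
    static final Charset CHARSET = Charset.forName("US-ASCII");

    // Encodes value, throwing if any character cannot be represented in CHARSET.
    static byte[] encodeStrict(String value) throws CharacterCodingException {
        CharsetEncoder encoder = CHARSET.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(value));
        // Copy only the bytes actually written by the encoder.
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // Convenience wrapper: true if encodeStrict would throw for this value.
    static boolean rejects(String value) {
        try {
            encodeStrict(value);
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects("Palace"));
        System.out.println(rejects("M\u00fcller"));
    }
}
```

Compared with a separate canEncode check, this does validation and encoding in one pass, at the cost of handling a checked exception in the caller.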
 