validating a byte array for some encoding

Surasak Leenapongpanit
Ranch Hand

Joined: May 10, 2002
Posts: 341
Hi all
I have a byte array that may be converted to a String with some specified encoding, like so:
String encodedChars = new String(bytes, encoding);
If the specified encoding is not supported, this throws an exception. If, however, the byte array contains sequences that are not valid in that encoding, they are silently dropped (or replaced) in the resulting String. I would rather get an exception.
How can I check that all the bytes in the array are valid for the specified encoding?
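For comparison, here is a minimal sketch of what the lenient path does on a current JDK. It assumes Java 7+ for StandardCharsets, and the class and method names are made up for illustration. The String(byte[], Charset) constructor is documented to replace malformed input with the charset's default replacement string, so on such JDKs the bad byte shows up as U+FFFD rather than disappearing. Either way, no exception is thrown.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Lenient decoding: String(byte[], Charset) never throws on bad input;
    // malformed bytes are replaced with the charset's replacement string.
    static String lenientDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 0xFF is not a valid US-ASCII byte
        byte[] bytes = {'a', (byte) 0xFF, 'b'};
        String s = lenientDecode(bytes);
        System.out.println(s.length());              // 3: nothing was dropped
        System.out.println(s.charAt(1) == '\uFFFD'); // true: replacement char
    }
}
```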
Lasse Koskela

Joined: Jan 23, 2002
Posts: 11962
Do you know how long (how many characters) the resulting String should be? That would be easy to check. Other than that, I have no clue.

Author of Test Driven (2007) and Effective Unit Testing (2013)
Jim Yingst

Joined: Jan 30, 2000
Posts: 18671
You need the java.nio.charset package (JDK 1.4+), whose decoders can report invalid input as an exception instead of silently dropping it.
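A minimal sketch of that approach, assuming the goal is simply to fail on bad input (StrictDecode, decodeStrict and isValid are hypothetical names, not taken from the original post). A decoder obtained from Charset.newDecoder() reports errors by default; the onMalformedInput/onUnmappableCharacter calls only make that explicit. Charset.decode(ByteBuffer) would not do here, because it always replaces bad input instead of reporting it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StrictDecode {
    // Decodes bytes with the named encoding, throwing CharacterCodingException
    // (e.g. MalformedInputException) instead of replacing invalid input.
    static String decodeStrict(byte[] bytes, String encoding)
            throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName(encoding).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // decode(ByteBuffer) runs a complete decoding operation in one call
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }

    // Convenience check: true if the whole array is valid in the encoding.
    static boolean isValid(byte[] bytes, String encoding) {
        try {
            decodeStrict(bytes, encoding);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] good = {'o', 'k'};
        byte[] bad = {'o', (byte) 0xFF};
        System.out.println(decodeStrict(good, "US-ASCII")); // ok
        System.out.println(isValid(bad, "US-ASCII"));       // false
    }
}
```

Charset.forName throws an unchecked UnsupportedCharsetException for an unknown encoding name, which plays the same role as the exception the String constructor throws for an unsupported encoding.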

Unfortunately the CharacterCodingException doesn't seem to include the position at which the error occurred: I keep getting "Input length = 1" even when the error isn't at the beginning of the string. (That number is the length of the malformed byte sequence, as returned by MalformedInputException.getInputLength(), not its offset.) I suppose you could loop through and decode each byte individually to learn where the errors really are. But that's inelegant, considering we're using nio, which is supposed to support bulk operations. It would also be more complicated if the target encoding were a variable-length encoding like UTF-8 rather than US-ASCII, since we don't know in advance how many bytes make up a single char.
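One way to get the position without decoding byte by byte is to call the lower-level CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean) directly. It returns a CoderResult instead of throwing, and on an error the input buffer's position is left at the start of the offending bytes, with CoderResult.length() giving how many there are. Because the decoder itself works out sequence boundaries, this also copes with variable-length encodings such as UTF-8. A minimal sketch (FindBadBytes and firstErrorOffset are hypothetical names); it throws away the decoded chars because it only looks for errors.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

public class FindBadBytes {
    // Returns the byte offset of the first malformed or unmappable sequence,
    // or -1 if the whole array is valid in the given encoding.
    static int firstErrorOffset(byte[] bytes, String encoding) {
        CharsetDecoder decoder = Charset.forName(encoding).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1024); // scratch space only

        while (true) {
            // endOfInput = true: a sequence truncated at the end counts as malformed
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The bad bytes start at the input's current position;
                // result.length() would give how many bytes are bad.
                return in.position();
            }
            if (result.isOverflow()) {
                out.clear(); // output is not needed, so make room and keep going
                continue;
            }
            return -1; // underflow: all input consumed without errors
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {'a', 'b', (byte) 0xFF, 'c'};
        System.out.println(firstErrorOffset(bytes, "US-ASCII")); // 2
    }
}
```

Since REPORT here means "return the error result" rather than "throw", the loop stays a bulk operation: it only stops at the first problem, and the scratch buffer is reused when it fills up.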

"I'm not back." - Bill Harding, Twister