
8-bit US ASCII encoding???

Jimmy Ho
Ranch Hand

Joined: Jul 31, 2007
Posts: 61
I'm working on the SCJD assignment and it states that the data in my database file is "8 bit US ASCII". They also imply that DataInputStream is my class of choice for parsing the data.

Anyway, I receive the data via a DataInputStream as a byte array and transform it into a String. Can I just get away with:

String s = new String(myByteArray); ???

Alternatively, I can also use

String s = new String(myByteArray, [encoding]);

but the closest available encodings according to the 1.5 JavaDocs are US-ASCII (which is 7-bit ASCII) and UTF-8. Is UTF-8 the same as "8 bit US ASCII"? Or can I assume that 8-bit ASCII will translate to Unicode without serious issues and just use the first line above?

Am I being too fussy, or is this a legitimate issue?
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
You can read about Java Supported Encoding here

The following encodings are 8-bit encodings:


Those are valid to be used.

US-ASCII cannot represent everything that UTF-8 can. Also, UTF-8 data could be misinterpreted, because a character in UTF-8 may occupy more than one byte.
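To make the byte-count point concrete, here is a minimal sketch (not taken from the assignment). The class and method names are made up for illustration, and it assumes Java 7 or later for StandardCharsets; on 1.5 you would pass the charset name as a String instead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodingWidths {

    /** Number of bytes needed to store the text in UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes needed to store the text in ISO-8859-1 (always one per char). */
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String plain = "Hotel";
        String accented = "H\u00f4tel"; // "Hôtel", written with an escape to avoid source-encoding issues

        System.out.println(utf8Length(plain));      // 5: pure ASCII is one byte per char in UTF-8
        System.out.println(utf8Length(accented));   // 6: U+00F4 takes two bytes in UTF-8
        System.out.println(latin1Length(accented)); // 5: one byte per char in a single-byte encoding
    }
}
```

So with fixed-width fields, a multi-byte encoding such as UTF-8 could break the "field length in bytes equals field length in characters" assumption as soon as a non-ASCII character appears.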

I hope this helps!
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
My interpretation would be that the characters are all encoded in US-ASCII (so you can expect the high-order bit of each byte to be 0), but that each character does indeed take up 8 bits (i.e., 1 byte).

Theoretically, with a 7-bit encoding, one could store 8 characters into 7 bytes to save space. Calling it "8-bit US-ASCII" indicates that this is not the case here. So one byte equals one character.
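Under that interpretation, a check-and-decode step might look like the sketch below. The class and method names are hypothetical, and throwing on a byte with the high bit set is just one possible design choice, not something the assignment asks for.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class AsciiField {

    /** Returns true if every byte has its high-order bit clear (value 0..127). */
    static boolean isSevenBitClean(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    /** Decodes one byte per character, rejecting anything outside US-ASCII. */
    static String decode(byte[] data) {
        if (!isSevenBitClean(data)) {
            throw new IllegalArgumentException("non-ASCII byte found");
        }
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] ok = {72, 105};                       // 'H', 'i'
        byte[] suspicious = {72, (byte) 0xE9};       // second byte has the high bit set

        System.out.println(decode(ok));                   // Hi
        System.out.println(isSevenBitClean(suspicious));  // false
    }
}
```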
Pawel Solarski

Joined: Jul 31, 2009
Posts: 2
I'm having the same problem with my URLyBird assignment, which also calls for 8-bit US-ASCII. I think the best approach is to assume that only 7-bit US-ASCII characters can be stored in the database file, so the "8-bit" requirement is confusing. I think it is better NOT to use any of the ISO-8859-x character sets at all, even though they contain US-ASCII as a subset. Why?

Because if I choose, say, the ISO-8859-2 charset, I may type locale-specific strings like "żółź". That will work for me, but on another machine with a different locale, reading a record that contains that string may lead to strange behavior, such as "?,?,?,..." strings.
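A rough sketch of that failure mode, for anyone who wants to try it. The class and method names are invented for the example, it assumes a JVM that ships the ISO-8859-2 charset (standard JDKs normally do), and it simulates "another machine" simply by decoding with a different charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class CharsetRoundTrip {

    /** Encodes text with one charset and decodes the stored bytes with another. */
    static String roundTrip(String text, Charset writeWith, Charset readWith) {
        byte[] stored = text.getBytes(writeWith);
        return new String(stored, readWith);
    }

    public static void main(String[] args) {
        // "żółź" written with escapes so the source file encoding does not matter
        String text = "\u017c\u00f3\u0142\u017a";
        Charset latin2 = Charset.forName("ISO-8859-2"); // assumed to be available

        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(text, latin2, latin2).equals(text)); // true

        // Written as ISO-8859-2 but read as ISO-8859-1: different characters come back.
        System.out.println(roundTrip(text, latin2, StandardCharsets.ISO_8859_1).equals(text)); // false

        // Written as US-ASCII: unmappable characters are replaced on encoding.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII)); // ????
    }
}
```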

I hope that ranch sheriffs agree with me

SCJP 5.0
Roel De Nijs

Joined: Jul 19, 2004
Posts: 8408

Hi Pawel,

Welcome to the JavaRanch!

Not (yet) a ranch sheriff, but I used the ISO-8859-1 char set. I think it doesn't matter which one you use; just document your decision in choices.txt.

Kind regards,

SCJA, SCJP (1.4 | 5.0 | 6.0), SCJD
Roberto Perillo

Joined: Dec 28, 2007
Posts: 2271

Roel De Nijs wrote:Not (yet) a ranch sheriff...

Ah... that's my good buddy Roel, the pride of Belgium!

Cheers, Bob "John Lennon" Perillo
SCJP, SCWCD, SCJD, SCBCD - Daileon: A Tool for Enabling Domain Annotations
Roberto Perillo

Joined: Dec 28, 2007
Posts: 2271

Howdy, Pawel. Welcome to JavaRanch!

Yeah, they mention the US-ASCII charset as being 8-bit, but it is in fact 7-bit. In my choices.txt file, I said that I used it, even though the instructions refer to it as being 8-bit. Here's a pretty small sample of the code I used to read the database:

where ENCODING is a String constant = "US-ASCII".
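For readers following along, the general idea can be sketched as below. This is not the code from the post above, only a minimal illustration under assumptions: FieldReader and readField are made-up names, the field length is passed in by the caller, and trimming trailing padding is a guess about the file layout rather than something the assignment states.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper class, for illustration only.
final class FieldReader {

    private static final String ENCODING = "US-ASCII";

    /** Reads exactly 'length' bytes and decodes them one byte per character. */
    static String readField(DataInputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        in.readFully(buffer);                      // blocks until the whole field is read
        return new String(buffer, ENCODING).trim(); // assumption: fields are padded with whitespace
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the database file: one 10-byte, space-padded field.
        byte[] sample = "Palace    ".getBytes(ENCODING);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample));
        try {
            System.out.println("[" + readField(in, sample.length) + "]"); // [Palace]
        } finally {
            in.close();
        }
    }
}
```

Whichever charset you pick, naming it explicitly avoids depending on the platform default that new String(byte[]) would use.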