
8-bit US ASCII encoding???

Jimmy Ho
Ranch Hand

Joined: Jul 31, 2007
Posts: 61
I'm working on the SCJD assignment and it states that the data in my database file is "8 bit US ASCII". They also imply that DataInputStream is my class of choice for parsing the data.

Anyway, I receive the data via a DataInputStream as a byte array and transform it into a String. Can I just get away with:

String s = new String(myByteArray); ???

Alternatively, I can also use

String s = new String(myByteArray, [encoding]);

but the closest available encodings according to the 1.5 JavaDocs are US-ASCII (which is 7-bit ASCII) or UTF-8. Is UTF-8 the same as "8 bit US ASCII"? Or can I just assume that 8-bit ASCII will translate to Unicode without serious issues and use the first line above?

Am I being too fussy, or is this a legitimate issue?
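
To make the difference between the two constructors concrete: `new String(myByteArray)` decodes with the platform default charset, so the result can differ from machine to machine, while passing a charset pins the decoding down. A minimal sketch, assuming Java 7 or later for `StandardCharsets`; the class name `DecodeField` and the sample bytes are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

class DecodeField {
    // Decode a raw field with an explicit charset instead of the platform default.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = {'H', 'o', 't', 'e', 'l'}; // hypothetical field contents
        System.out.println(decode(raw));         // prints "Hotel"
    }
}
```

For bytes in the 0 to 127 range, US-ASCII, ISO-8859-1 and UTF-8 all decode to the same characters, so the choice only matters if the file contains bytes with the high bit set.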
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
You can read about the encodings Java supports on the "Supported Encodings" page of the JDK documentation.

The following encodings are 8-bit encodings:

ISO-8859-1
ISO-8859-2
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
windows-1252

Any of those would be valid to use.

US-ASCII cannot represent everything that UTF-8 can. UTF-8 could also be misinterpreted, because a character in UTF-8 may occupy more than one byte.

I hope this helps!
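
The point about UTF-8 using more than one byte per character can be checked directly. This is only a sketch, assuming Java 7 or later for `StandardCharsets`; `EncodedLength` and `byteLength` are names invented for the example:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLength {
    // Number of bytes a string occupies once encoded in the given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with acute accent, outside 7-bit ASCII
        System.out.println(byteLength(eAcute, StandardCharsets.ISO_8859_1)); // prints 1
        System.out.println(byteLength(eAcute, StandardCharsets.UTF_8));      // prints 2
    }
}
```

With a fixed-width record layout, that difference matters: in a single-byte encoding a field's byte length always equals its character count, which is not guaranteed in UTF-8.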
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42276
    
  64
My interpretation would be that the characters are all encoded in US-ASCII (so you can expect the high-order bit of each byte to be 0), but that each character does indeed take up 8 bits (i.e., 1 byte).

Theoretically, with a 7-bit encoding, one could store 8 characters into 7 bytes to save space. Calling it "8-bit US-ASCII" indicates that this is not the case here. So one byte equals one character.
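
If this interpretation holds, every byte in the file should have its high-order bit clear. A quick way to test that assumption against a data file's contents might look like the following; `AsciiCheck` and `isSevenBitClean` are hypothetical names, and the sample bytes are made up:

```java
class AsciiCheck {
    // True if every byte has its high-order bit clear, i.e. is plain 7-bit ASCII.
    static boolean isSevenBitClean(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSevenBitClean(new byte[] {'O', 'K'}));        // prints true
        System.out.println(isSevenBitClean(new byte[] {'O', (byte) 0xE9})); // prints false
    }
}
```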


Pawel Solarski
Greenhorn

Joined: Jul 31, 2009
Posts: 2
Hi,
I'm having the same problem with my URLyBird assignment, which also specifies 8-bit US-ASCII. I think the best approach is to assume that only 7-bit US-ASCII characters can be stored in the database file, which is why the "8-bit" requirement is confusing. I think it is better NOT to use any of the ISO-8859-x charsets at all, even though they contain US-ASCII as a subset. Why?

Because if I choose, say, the ISO-8859-2 charset, I may type locale-specific strings like "żółź". That will work for me, but on another machine with a different locale, reading a record containing that string may lead to strange behavior, such as "?,?,?,..." strings.

I hope that ranch sheriffs agree with me
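
The "?" effect is easy to reproduce, and one way to avoid it is to reject non-ASCII input before it is written. A rough sketch, assuming Java 7 or later; `AsciiOnly`, `isStorable` and `roundTrip` are names invented for this example, not part of any assignment:

```java
import java.nio.charset.StandardCharsets;

class AsciiOnly {
    // True if every character of the value can be encoded in US-ASCII.
    static boolean isStorable(String value) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(value);
    }

    // Encode to US-ASCII and decode again; unmappable characters come back as '?'.
    static String roundTrip(String value) {
        byte[] stored = value.getBytes(StandardCharsets.US_ASCII);
        return new String(stored, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String polish = "\u017c\u00f3\u0142\u017a"; // the "żółź" example above
        System.out.println(isStorable(polish));   // prints false
        System.out.println(roundTrip(polish));    // prints ????
        System.out.println(roundTrip("Palace"));  // prints Palace
    }
}
```

Checking with `isStorable` before saving a record lets the application report the problem instead of silently storing question marks.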

SCJP 5.0
Roel De Nijs
Bartender

Joined: Jul 19, 2004
Posts: 5402
    
  13

Hi Pawel,

Welcome to the JavaRanch!

Not (yet) a ranch sheriff, but I used the ISO-8859-1 charset. I don't think it matters which one you use; just document your decision in choices.txt.

Kind regards,
Roel


SCJA, SCJP (1.4 | 5.0 | 6.0), SCJD
http://www.javaroe.be/
Roberto Perillo
Bartender

Joined: Dec 28, 2007
Posts: 2266
    
    3

Roel De Nijs wrote:Not (yet) a ranch sheriff...


Ah... that's my good buddy Roel, the pride of Belgium!


Cheers, Bob "John Lennon" Perillo
SCJP, SCWCD, SCJD, SCBCD - Daileon: A Tool for Enabling Domain Annotations
Roberto Perillo
Bartender

Joined: Dec 28, 2007
Posts: 2266
    
    3

Howdy, Pawel. Welcome to JavaRanch!

Yeah, they mention the US-ASCII charset as being 8-bit, but it is in fact 7-bit. In my choices.txt file, I said that I used it, even though the instructions refer to it as being 8-bit. Here's a pretty small sample of the code I used to read the database:



where ENCODING is a String constant = "US-ASCII".
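
For readers following along, here is a minimal sketch of what a read like that could look like. It is not the code from the post above: `FieldReader`, `readField`, the field lengths and the space padding are all assumptions made for illustration, and only the `ENCODING` constant is taken from the description.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class FieldReader {
    static final String ENCODING = "US-ASCII";

    // Read exactly `length` bytes and decode them as one fixed-width text field.
    // trim() assumes the field is padded with spaces or nulls (an assumption here).
    static String readField(DataInputStream in, int length) throws IOException {
        byte[] raw = new byte[length];
        in.readFully(raw); // throws EOFException if the stream ends early
        return new String(raw, ENCODING).trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] record = {'P', 'a', 'l', 'a', 'c', 'e', ' ', ' '}; // made-up 8-byte field
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        System.out.println(readField(in, 8)); // prints "Palace"
    }
}
```

`readFully` is used rather than `read` because `read` may return fewer bytes than requested, which would corrupt a fixed-width record.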