I'm working on the SCJD assignment and it states that the data in my database file is "8 bit US ASCII". They also imply that DataInputStream is my class of choice for parsing the data.
Anyway, I receive the data via a DataInputStream as a byte array and transform it into a String. Can I just get away with:
String s = new String(myByteArray); ???
Alternatively, I can also use
String s = new String(myByteArray, [encoding]);
but the closest available encodings according to the 1.5 JavaDocs are US-ASCII (which is 7-bit ASCII) or UTF-8. Is UTF-8 the same as "8 bit US ASCII"? Or can I just say that 8-bit ASCII will translate to Unicode without serious issues and just use the first line above?
Am I being too fussy, or is this a legitimate issue?
My interpretation would be that the characters are all encoded in US-ASCII (so you can expect the high-order bit of each byte to be 0), but that each character does indeed take up 8 bits (i.e., 1 byte).
Theoretically, with a 7-bit encoding, one could store 8 characters into 7 bytes to save space. Calling it "8-bit US-ASCII" indicates that this is not the case here. So one byte equals one character.
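To make the "one byte equals one character" reading concrete, here is a minimal sketch that decodes a byte array with an explicit charset instead of the platform default. The class name and the sample bytes are made up for illustration, and StandardCharsets needs Java 7 or later (on 1.5 you would pass the name "US-ASCII" as a String instead).

```java
import java.nio.charset.StandardCharsets;

public class AsciiDecodeSketch {
    // Decode with an explicit charset so the result does not depend
    // on the platform default encoding of the machine running the code.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: five bytes, all with the high-order bit 0.
        byte[] raw = { 0x48, 0x6F, 0x74, 0x65, 0x6C };
        String s = decode(raw);
        // Five bytes in, five characters out.
        System.out.println(s + " " + s.length()); // prints "Hotel 5"
    }
}
```

Using new String(myByteArray) without a charset would give the same result on most machines for pure ASCII data, but naming the charset removes the dependency on the default.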
I'm having the same problem with my URLyBird assignment, which also specifies 8-bit US-ASCII. I think the best approach is to assume that only 7-bit US-ASCII characters can be stored in the database file, so the "8-bit" requirement is confusing. I think it is better NOT to use any of the ISO-8859-x charsets at all, even though they contain US-ASCII as a subset. Why?
Because if I choose, say, the ISO-8859-2 charset, I can type locale-specific strings like "żółź" and it will work for me, but on another machine with a different locale, reading a record containing that string may lead to strange behavior, such as "?,?,?,..." strings.
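If you go the 7-bit-only route, one possible way to enforce it is to check a string before writing it to the file. The sketch below is only an illustration: the class and method names are invented, and it uses StandardCharsets (Java 7+). It also shows what happens if a string like the one above is encoded as US-ASCII anyway.

```java
import java.nio.charset.StandardCharsets;

public class AsciiCheckSketch {
    // True only if every character in the text can be encoded as US-ASCII.
    static boolean isPureAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // "żółź" written with Unicode escapes so the source file encoding does not matter.
        String local = "\u017C\u00F3\u0142\u017A";

        System.out.println(isPureAscii("Hotel")); // prints "true"
        System.out.println(isPureAscii(local));   // prints "false"

        // getBytes substitutes '?' for each character the charset cannot represent.
        byte[] bytes = local.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints "????"
    }
}
```

Rejecting such input up front (rather than silently writing question marks) keeps the file readable on any machine.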
Yeah, they mention the US-ASCII charset as being 8-bit, but it is in fact 7-bit. In my choices.txt file, I said that I used it, even though the instructions refer to it as being 8-bit. Here's a pretty small sample of the code I used to read the database:
where ENCODING is a String constant = "US-ASCII".
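The snippet itself did not come through in the post, so the following is not the poster's code, just a rough sketch of what reading one field along those lines might look like. The class name, the readField helper, the field length, and the assumption that fields are padded with spaces or null bytes are all hypothetical; only the ENCODING constant is taken from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class FieldReaderSketch {
    static final String ENCODING = "US-ASCII";

    // Read exactly 'length' bytes and decode them as US-ASCII.
    // trim() strips trailing spaces and null bytes, assuming padded fields.
    static String readField(DataInputStream in, int length) throws IOException {
        byte[] raw = new byte[length];
        in.readFully(raw); // throws EOFException if fewer bytes are available
        return new String(raw, ENCODING).trim();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical 8-byte field holding "Palace" padded with two spaces.
        byte[] data = { 'P', 'a', 'l', 'a', 'c', 'e', ' ', ' ' };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(readField(in, 8)); // prints "Palace"
    }
}
```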