
8-bit US ASCII encoding???

 
Ranch Hand
Posts: 61
I'm working on the SCJD assignment and it states that the data in my database file is "8 bit US ASCII". They also imply that DataInputStream is my class of choice for parsing the data.

Anyway, I receive the data via a DataInputStream as a byte array and transform it into a String. Can I just get away with:

String s = new String(myByteArray); ???

Alternatively, I can also use

String s = new String(myByteArray, [encoding]);

but the closest available encodings according to the 1.5 JavaDocs are US-ASCII (which is 7-bit ASCII) or UTF-8. Is UTF-8 the same as "8 bit US ASCII"? Or can I just assume that 8-bit ASCII will translate to Unicode without serious issues and use the first line above?

Am I being too fussy, or is this a legitimate issue?
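
To make the difference concrete, here is a small sketch (the class and method names are made up for illustration) that decodes the same bytes with two named charsets. `new String(myByteArray)` with no charset argument uses the platform default, so its result can differ between machines; naming the charset removes that dependency. `StandardCharsets` assumes Java 7 or later; on 1.5 you would pass the charset name as a String instead.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode with an explicitly named charset so the result does not
    // depend on the platform default.
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] plain = {0x48, 0x6F, 0x74, 0x65, 0x6C}; // the ASCII bytes of "Hotel"
        byte[] highBit = {(byte) 0xE9};                // high-order bit set, not valid US-ASCII

        // For pure 7-bit data both charsets give the same String.
        System.out.println(decodeAscii(plain));
        System.out.println(decodeLatin1(plain));

        // US-ASCII decoding substitutes the replacement character U+FFFD.
        System.out.println(Integer.toHexString(decodeAscii(highBit).charAt(0)));
        // ISO-8859-1 maps each byte to the code point with the same value.
        System.out.println(Integer.toHexString(decodeLatin1(highBit).charAt(0)));
    }
}
```

If every byte in the file really has its high-order bit clear, US-ASCII, ISO-8859-1 and UTF-8 all decode it to the same String, so the choice only matters for bytes of 0x80 and above.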
 
Ranch Hand
Posts: 961
You can read about Java Supported Encoding here

The following encodings are 8-bit encodings:

ISO-8859-1
ISO-8859-2
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
windows-1252

Those are valid to be used.

US-ASCII cannot represent everything that UTF-8 can. UTF-8 data could also be misinterpreted, because a character in UTF-8 may occupy more than one byte.

I hope this helps!
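
As a rough illustration of the point about byte widths, the sketch below counts how many bytes a string takes in a given charset. The class name and the `encodedLength` helper are hypothetical, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteWidthDemo {
    // Number of bytes the given text occupies once encoded in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String accented = "\u00E9"; // e with acute accent, written as an escape

        // A 7-bit character is one byte in all three charsets.
        System.out.println(encodedLength(ascii, StandardCharsets.US_ASCII));
        System.out.println(encodedLength(ascii, StandardCharsets.ISO_8859_1));
        System.out.println(encodedLength(ascii, StandardCharsets.UTF_8));

        // The accented character is one byte in ISO-8859-1 but two in UTF-8.
        System.out.println(encodedLength(accented, StandardCharsets.ISO_8859_1));
        System.out.println(encodedLength(accented, StandardCharsets.UTF_8));
    }
}
```

This is why a fixed-width record format and UTF-8 do not mix well: a field of N characters is no longer guaranteed to be N bytes.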
 
Rancher
Posts: 43081
My interpretation would be that the characters are all encoded in US-ASCII (so you can expect the high-order bit of each byte to be 0), but that each character does indeed take up 8 bits (i.e., 1 byte).

Theoretically, with a 7-bit encoding, one could store 8 characters into 7 bytes to save space. Calling it "8-bit US-ASCII" indicates that this is not the case here. So one byte equals one character.
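
If that interpretation is right, it can be checked directly against the data: every byte should have its high-order bit clear. A minimal sketch, with a hypothetical class and method name:

```java
public class AsciiCheck {
    // Returns true when no byte has its high-order bit set, i.e. every
    // byte is a valid 7-bit US-ASCII value stored in a full 8-bit byte.
    static boolean isSevenBitClean(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] clean = {0x41, 0x42, 0x20, 0x7E};
        byte[] dirty = {0x41, (byte) 0xF3};
        System.out.println(isSevenBitClean(clean));
        System.out.println(isSevenBitClean(dirty));
    }
}
```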
 
Greenhorn
Posts: 2
Hi,
I'm having the same problem with my URLyBird assignment, where 8-bit US-ASCII is to be used. I think the best approach is to assume that only 7-bit US-ASCII characters can be stored in the database file, so the "8-bit" requirement is confusing. I think it is better NOT to use any of the ISO-8859-x charsets at all, even though they contain the US-ASCII charset. Why?

Because if I choose, say, the ISO-8859-2 charset, I may type locale-specific strings like "żółź" and it will work for me, but on another machine with a different locale, reading a record containing that saved string may lead to strange behavior, resulting in "?,?,?,..." strings.
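
The "?" effect can be reproduced in isolation. In the sketch below (hypothetical class and helper names, Java 7 or later for `StandardCharsets`), a string is encoded and decoded again with the same charset; any character the charset cannot represent comes back as "?". The `isAsciiOnly` helper shows one possible way to reject such input before it is written.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodeDemo {
    // Encode and decode with the same charset; unmappable characters
    // are replaced by '?' during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // True when every character of the text can be encoded as US-ASCII.
    static boolean isAsciiOnly(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The Polish sample string, written with escapes so the source
        // file itself stays pure ASCII.
        String polish = "\u017C\u00F3\u0142\u017A";

        System.out.println(roundTrip(polish, StandardCharsets.US_ASCII));
        System.out.println(roundTrip(polish, StandardCharsets.ISO_8859_1));
        System.out.println(isAsciiOnly(polish));
        System.out.println(isAsciiOnly("Palace"));
    }
}
```

With US-ASCII all four characters are lost; with ISO-8859-1 only the second one survives, since it is the only one of the four that charset contains.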

I hope that ranch sheriffs agree with me
 
Sheriff
Posts: 11604
Hi Pawel,

Welcome to the JavaRanch!

Not (yet) a ranch sheriff, but I used the ISO-8859-1 charset. I don't think it matters which one you use; just document your decision in choices.txt.

Kind regards,
Roel
 
Bartender
Posts: 2292

Roel De Nijs wrote:Not (yet) a ranch sheriff...



Ah... that's my good buddy Roel, the pride of Belgium!
 
Roberto Perillo
Bartender
Posts: 2292
Howdy, Pawel. Welcome to JavaRanch!

Yeah, they mention the US-ASCII charset as being 8-bit, but it is in fact 7-bit. In my choices.txt file, I said that I used it, even though the instructions refer to it as being 8-bit. Here's a pretty small sample of the code I used to read the database:



where ENCODING is a String constant = "US-ASCII".
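
For readers following along, a minimal sketch of reading one fixed-width field this way might look like the following. It is not the code from the post above: the class name, the `readField` helper, the 8-byte field width and the space padding are all assumptions for illustration. Only the `ENCODING` constant and the use of `DataInputStream` come from the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class FieldReader {
    private static final String ENCODING = "US-ASCII";

    // Reads exactly `length` bytes from the stream and decodes them with
    // the named charset. Trimming assumes the field is padded with spaces.
    static String readField(DataInputStream in, int length) throws IOException {
        byte[] raw = new byte[length];
        in.readFully(raw); // throws EOFException if fewer bytes remain
        return new String(raw, ENCODING).trim();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the database file: one 8-byte, space-padded field.
        byte[] record = "Palace  ".getBytes(ENCODING);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        System.out.println("[" + readField(in, 8) + "]");
    }
}
```

Passing the charset name explicitly keeps the result independent of the platform default, which is the reason for having an `ENCODING` constant in the first place.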
 