US ASCII 8 bit encoding

 
Marinus Geuze
Greenhorn
Posts: 21
Hi,

My assignment states that all text values contain only 8-bit characters. The encoding of these characters is US ASCII.

Reading this, I remembered a quote from the "Complete Java 2 Certification, Fifth Edition" book:

"The strings that denote encoding names are determined by standards committees, so they are not especially obvious or informative. For example, the U.S. ASCII encoding name is not USASCII as you might expect, but rather ISO8859-1."

I looked through the posts on this forum, but I can't find a clear answer.

What I read was that U.S. ASCII was originally 7 bits and didn't contain European characters. So the US ASCII standard was extended to 8 bits with support for European characters. This new version was named ISO8859-1.

But when I read the posts, people said that they are using USASCII anyway, while to me it seems that ISO8859-1 is the way to go.

So can someone give me a clear answer on which one to use?

Kind regards,

Marinus
 
Sam Codean
Ranch Hand
Posts: 194
Read a single byte and cast it to char, and you will get the character.
 
Marinus Geuze
Greenhorn
Posts: 21
Hi Sam,

I find it difficult to say whether your approach is correct. What I do know is that if you do not specify an encoding, the platform default encoding is used. That may be fine for developers in the US, but I am from somewhere else, so I can't depend on that.

So I use the new String([bytes], encoding) constructor! I am just not sure which encoding to use.

I hope that someone can give me a clear answer.

Kind regards!
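
To illustrate the point about not relying on the default encoding, here is a minimal sketch of decoding and re-encoding a field with an explicitly named charset. The class and helper names (ExplicitCharsetDemo, decode, encode) and the sample bytes are made up for illustration, and it assumes Java 7 or later for StandardCharsets; on older versions the charset names "US-ASCII" and "ISO-8859-1" can be passed as strings instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitCharsetDemo {
    // Hypothetical helper: decode raw bytes with a named charset instead of
    // relying on the platform default encoding.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Hypothetical helper: encode text back to bytes with the same charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Sample data (an assumption): the word "Hotel" as US-ASCII bytes.
        byte[] raw = {72, 111, 116, 101, 108};

        String text = decode(raw, StandardCharsets.US_ASCII);
        byte[] back = encode(text, StandardCharsets.US_ASCII);

        System.out.println(text);
        System.out.println(Arrays.equals(raw, back));
    }
}
```

Because the charset is passed in explicitly, the result is the same on every machine, whatever its default encoding happens to be.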
 
Sam Codean
Ranch Hand
Posts: 194
For the assignment, either US ASCII or the ISO encoding is fine; both will work equally well.
 
Sam Codean
Ranch Hand
Posts: 194
Hey Marinus!
I did not get whether my approach is wrong. Why should it be?

The requirements say that the character encoding is US-ASCII.
I read bytes and never read characters.
When I want to read a char, I read a byte and cast it to char.
I remember that converting a byte to an int fills up the most significant bytes, and converting the int to a char takes care of getting the Unicode character. So I will indeed get the right character code.
When writing back, I also write only a byte, so that takes care of writing only the least significant byte, which is the US-ASCII code?
Can anyone else shed some light on whether my approach is incorrect?

I do agree that the implementation is bad if the encoding ever changes. But I made a decision and mentioned it in choices.txt. That should not get me deductions.
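
The byte-to-char cast described above can be checked with a short sketch. The class and method names (ByteCastDemo, plainCast, maskedCast) are hypothetical. Java's byte is signed, so the plain cast gives the expected character for values 0 to 127 (all of US-ASCII), while a byte with the high bit set is sign-extended before it is narrowed to char, unless it is first masked with 0xFF.

```java
class ByteCastDemo {
    // Plain cast: the byte is widened with sign extension, then narrowed to char.
    // Correct for 0..127; bytes 0x80..0xFF end up in the range U+FF80..U+FFFF.
    static char plainCast(byte b) {
        return (char) b;
    }

    // Masked cast: treats the byte as an unsigned value 0..255, which matches
    // the ISO-8859-1 code point for that byte.
    static char maskedCast(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte ascii = (byte) 0x41; // 'A', inside the 7-bit ASCII range
        byte high = (byte) 0xE9;  // outside ASCII; e-acute in ISO-8859-1

        System.out.println(plainCast(ascii));
        System.out.println(Integer.toHexString(plainCast(high)));
        System.out.println(Integer.toHexString(maskedCast(high)));
    }
}
```

So for data that really is 7-bit ASCII the cast approach gives the right characters; the two casts only differ once a byte above 127 shows up.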
 
Marinus Geuze
Greenhorn
Posts: 21
Hi Sam,

I really have no idea whether your approach is wrong. It seems to me that you do not really take the US ASCII encoding into account. But maybe someone else can confirm this.

I am also not satisfied, because I still have no idea what the difference is between US ASCII and ISO8859-1. I still hope someone can clarify this.

Greets, Marinus
 
Ulf Dittmer
Rancher
Posts: 42966
ASCII uses 7 bits, so it has valid characters between 0 and 127 only. ISO-8859-1 uses 8 bits, and consequently has 256 characters. Luckily, the first 128 characters of both encodings are identical.

Getting back to your original post, there really is no 8-bit ASCII encoding (see above). But if you know that something is encoded in ASCII, then you know that it's also valid ISO-8859-1, and ISO-8859-1 text that uses only the first 128 characters is also valid ASCII, because both are identical for all ASCII characters.
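
The overlap between the two encodings can be verified with a small sketch. The class and method names (AsciiVsLatin1Demo, asAscii, asLatin1, lower128Agree) are made up for illustration, and StandardCharsets assumes Java 7 or later. It decodes every byte value from 0 to 127 with both charsets and compares the results, then decodes one byte above 127 to show where they part ways.

```java
import java.nio.charset.StandardCharsets;

class AsciiVsLatin1Demo {
    // Decode a single byte as US-ASCII.
    static String asAscii(byte b) {
        return new String(new byte[] {b}, StandardCharsets.US_ASCII);
    }

    // Decode a single byte as ISO-8859-1.
    static String asLatin1(byte b) {
        return new String(new byte[] {b}, StandardCharsets.ISO_8859_1);
    }

    // True when both charsets decode every byte value 0..127 identically.
    static boolean lower128Agree() {
        for (int i = 0; i < 128; i++) {
            byte b = (byte) i;
            if (!asAscii(b).equals(asLatin1(b))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lower128Agree());

        // 0xE9 is outside ASCII. ISO-8859-1 maps it to U+00E9, while the
        // US-ASCII decoder substitutes the replacement character U+FFFD.
        byte high = (byte) 0xE9;
        System.out.println(Integer.toHexString(asLatin1(high).charAt(0)));
        System.out.println(Integer.toHexString(asAscii(high).charAt(0)));
    }
}
```

For data that stays within 0 to 127, either charset name gives the same strings; the choice only matters if a byte above 127 ever appears.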
 