How is a character stored in 2 bytes in Java?

abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635
How is a character stored in 2 bytes in Java?

For example, "a".

The ASCII code for a is 97.
http://www.asciitable.com/

How is this number, 97, stored in two bytes?

Mohamed Sanaulla
Saloon Keeper

Joined: Sep 08, 2007
Posts: 3068
    
  33

The idea of making a character 2 bytes wide is to support a larger number of characters (2^15?). Also, the encoding used in Java is not ASCII but Unicode (http://en.wikipedia.org/wiki/Unicode).


abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635
Thanks, brother. But when "a" is stored in two bytes, I want to know the value of every bit of those two bytes.
Mohamed Sanaulla
Saloon Keeper

Joined: Sep 08, 2007
Posts: 3068
    
  33

abalfazl hossein wrote: Thanks, brother. But when "a" is stored in two bytes, I want to know the value of every bit of those two bytes.


You could relate this to how 1 is stored in an int: the low-order bits hold the value and all the remaining bits are zero (the byte order in memory is a big/little endian matter).
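To make the bit values concrete, here is a minimal sketch that prints the 16 bits of a char. The class name `CharBits` and the helper `toBits` are made up for illustration; only `Integer.toBinaryString` comes from the standard library.

```java
class CharBits {
    // Returns the 16 bits of a char as text, most significant bit first.
    static String toBits(char c) {
        String bits = Integer.toBinaryString(c); // no leading zeros
        StringBuilder sb = new StringBuilder();
        for (int i = bits.length(); i < 16; i++) {
            sb.append('0');                      // pad on the left to 16 bits
        }
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        char c = 'a';
        String bits = toBits(c);
        System.out.println((int) c);             // prints 97
        // High byte, then low byte: prints "00000000 01100001"
        System.out.println(bits.substring(0, 8) + " " + bits.substring(8));
    }
}
```

So for "a" the high byte is all zeros and the low byte holds the same 97 you see in the ASCII table.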
abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635


When I run this program, the result is:

84 104 105 115 32 105 116 32 121 111 117 114 32 115 111 110 103

T>84
h>104

The sentence in the file is:

This is your song

When I check the ASCII table, these numbers match it:

http://www.asciitable.com/
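The program itself is not shown in this post. A minimal sketch of a program that would print numbers like these, assuming it reads the file one byte at a time with an InputStream, could look as follows. The class name `ByteDump` and the temporary file are made up for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class ByteDump {
    // Reads the file byte by byte and returns the byte values separated by spaces.
    static String byteValues(File file) {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new FileInputStream(file)) {
            int b;
            while ((b = in.read()) != -1) {      // read() returns an int, -1 at end of file
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Hypothetical helper: writes the text to a temporary file as ASCII bytes.
    static File tempFileWith(String text) {
        try {
            File file = File.createTempFile("song", ".txt");
            file.deleteOnExit();
            try (Writer w = new OutputStreamWriter(new FileOutputStream(file), "US-ASCII")) {
                w.write(text);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(byteValues(tempFileWith("This is your song")));
    }
}
```

The values match the ASCII table because the file holds one byte per character here and an InputStream hands back the raw bytes without decoding them.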
Mohamed Sanaulla
Saloon Keeper

Joined: Sep 08, 2007
Posts: 3068
    
  33

Yes. Unicode is an extension of ASCII: the values 0-127 are the same as in ASCII, and from 128 onwards it adds more characters.
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3611
    
  14

2 bytes will be used when you use, for example, the UTF-16 encoding.

You need to set the encoding of the OutputStreamWriter or PrintStream you are using.
abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635
If I use a cast, I can see the character in the output:

System.out.print((char)c+ " ");

Is there a way in Java to see the character in the output without using a cast?
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3611
    
  14

No. If the character is stored in an int variable, you will always have to cast it to a char.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38334
    
  23
Mohamed Sanaulla wrote: . . . larger number of characters (2^15?) . . .
2 to the 16th.
Mohamed Sanaulla
Saloon Keeper

Joined: Sep 08, 2007
Posts: 3068
    
  33

Campbell Ritchie wrote:
Mohamed Sanaulla wrote: . . . larger number of characters (2^15?) . . .
2 to the 16th.


Thanks for the correction. I wasn't sure about that.
abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635
My question is not only about my example. I mean, is there any method that reads files as characters, so that I don't need to cast?
Mohamed Sanaulla
Saloon Keeper

Joined: Sep 08, 2007
Posts: 3068
    
  33

abalfazl hossein wrote: My question is not only about my example. I mean, is there any method that reads files as characters, so that I don't need to cast?


Did you look at the classes I mentioned? (Or was that in another post?)

Update: This has been answered in your other thread: http://www.coderanch.com/t/520869/java/java/files
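As one possible illustration of reading a file as characters without a cast, here is a minimal sketch that uses BufferedReader.readLine(), which returns a String. The class name `ReadAsText`, the helper `tempFileWith` and the temporary file are made up for illustration, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class ReadAsText {
    // Reads the first line of a UTF-8 text file as a String; no int-to-char cast is needed.
    static String firstLine(File file) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: creates a temporary UTF-8 file holding the given text.
    static File tempFileWith(String text) {
        try {
            File file = File.createTempFile("text", ".txt");
            file.deleteOnExit();
            try (Writer w = new OutputStreamWriter(new FileOutputStream(file), "UTF-8")) {
                w.write(text);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = firstLine(tempFileWith("This is your song"));
        System.out.println(line);
        for (char ch : line.toCharArray()) {     // each element is already a char
            System.out.print(ch + " ");
        }
        System.out.println();
    }
}
```

Reader.read(char[]) is another option that fills a char array directly. Only the single-character read() returns an int, because it needs -1 to signal the end of the input.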
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14102
    
  16

What the number is that is used to represent a character, and how many bytes are needed, is determined by the character encoding. ASCII is one encoding, which takes one byte (actually, not even a complete byte, but just 7 bits) per character. Obviously because only 7 bits are used the character set that you can represent with ASCII is limited. Java internally stores characters in the char data type with two bytes (16 bits) per character, using the UTF-16 encoding.

Other encodings that are commonly used are ISO-8859-1, which uses one byte (8 bits) per character, and UTF-8 which uses a variable number of bytes per character (from 1 to 4 bytes).

In Java, InputStream and OutputStream are used for reading and writing binary data. For reading and writing text, you use a Reader or a Writer. Those classes wrap an InputStream or OutputStream and apply a character encoding to the binary data, to interpret it as text.

If you want to write text to a file and specify the character encoding yourself, then you can do that for example like this:
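The example that belonged here is not shown in the thread. A minimal sketch of the idea, assuming an OutputStreamWriter wrapped around a FileOutputStream, is given below. The class name `EncodedWrite`, the method `writeAndMeasure` and the temporary file are made up for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class EncodedWrite {
    // Writes the text to a temporary file in the named encoding and returns the file size in bytes.
    static long writeAndMeasure(String text, String charsetName) {
        try {
            File file = File.createTempFile("encoded", ".txt");
            file.deleteOnExit();
            // The second constructor argument selects the character encoding.
            try (Writer w = new OutputStreamWriter(new FileOutputStream(file), charsetName)) {
                w.write(text);
            }
            return file.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndMeasure("a", "UTF-8"));      // prints 1
        System.out.println(writeAndMeasure("a", "ISO-8859-1")); // prints 1
        System.out.println(writeAndMeasure("a", "UTF-16BE"));   // prints 2
    }
}
```

The same character "a" takes one byte in UTF-8 and ISO-8859-1 and two bytes in UTF-16BE, which is the two-byte form the original question asks about. The sketch uses "UTF-16BE" rather than "UTF-16" because the latter also writes a byte order mark at the start of the file.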

Remember:

  • InputStream and OutputStream are for reading and writing binary data
  • Reader and Writer are for reading and writing text (characters)


    fred rosenberger
    lowercase baba
    Bartender

    Joined: Oct 02, 2003
    Posts: 11229
        
      16

    Jesper de Jong wrote:Obviously because only 7 bits are used the character set that you can represent with ASCII is limited.

    Isn't that true no matter how many bits you allow?



    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635


    This is the text in the file. I use the NetBeans IDE.

    abalfazl hossein

    ابالفضل حسین


    but the output is:

    File text : ?abalfazl hossein
    Reading Process Completly Successfully.


    How can I fix it?
    Jesper de Jong
    Java Cowboy
    Saloon Keeper

    Joined: Aug 16, 2005
    Posts: 14102
        
      16

    Are you printing the output to a Windows command prompt window, with System.out.println()? The command prompt window normally cannot handle things like Arabic text, because it uses a font that does not contain Arabic characters.
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    Then which method must be used?
    fred rosenberger
    lowercase baba
    Bartender

    Joined: Oct 02, 2003
    Posts: 11229
        
      16

    abalfazl hossein wrote: Then which method must be used?

    To do what? If you are trying to print Arabic characters to the command line, it will never work, regardless of the methods called. This has nothing to do with Java, but with cmd.exe.

    If you are trying to do something else, then tell us what that is.
    Stephan van Hulst
    Bartender

    Joined: Sep 20, 2010
    Posts: 3611
        
      14

    Not just that. He is reading in UTF-8 all right, but he's still writing in the system default encoding.
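A minimal sketch of writing with an explicit encoding instead of the system default, assuming a PrintStream wrapped around the output. The class name `Utf8Output` and the method `encodeWithPrintStream` are made up for illustration. Whether a console then shows the characters correctly still depends on the console's own encoding and font.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Output {
    // Prints the text through a UTF-8 PrintStream and returns the bytes that were produced.
    static byte[] encodeWithPrintStream(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, "UTF-8");
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // UTF-8 is always available in Java
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String alef = "\u0627";                  // Arabic letter alef, written as an escape
        System.out.println(encodeWithPrintStream(alef).length); // prints 2

        // The same idea applied to standard output:
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(alef);
    }
}
```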
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    I saved these lines in a text file:

    abalfazl hossein

    ابالفضل حسین


    Now I want to read them with this code:

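    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.UnsupportedEncodingException;

    public class Main {

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            // Decode the file's bytes as UTF-8; try-with-resources closes the reader.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream("myfile1.txt"), "UTF-8"))) {
                // readLine() returns only the first line of the file.
                String str1 = in.readLine();

                System.out.println("File text : " + str1);
                System.out.println("Reading Process Completly Successfully.");
            } catch (UnsupportedEncodingException ue) {
                System.out.println("Not supported : ");
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
    }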


    But the output is:


    File text : ?abalfazl hossein
    Reading Process Completly Successfully.


    Now what must I do?

    Thanks in advance!
    Stephan van Hulst
    Bartender

    Joined: Sep 20, 2010
    Posts: 3611
        
      14

    You can't do anything. You are reading the characters correctly. However, you are printing them to standard output. The standard output doesn't use a font that supports Arabic characters. If you want to display the characters correctly, you have to either print them to a file and read that file in a text editor using a font like MS Gothic, or append them to a text area or something similar in a Java GUI, again using a font like MS Gothic.
    Jesper de Jong
    Java Cowboy
    Saloon Keeper

    Joined: Aug 16, 2005
    Posts: 14102
        
      16

    abalfazl hossein wrote: Now what must I do?

    Display the text in something other than the Windows command prompt. The Windows command prompt cannot display Arabic characters.

    For example, write a Swing GUI for your application and display the text in a Swing label. Make sure the label uses a font that contains Arabic characters.
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    Then, other than English, which languages are supported in the command prompt?
    Stephan van Hulst
    Bartender

    Joined: Sep 20, 2010
    Posts: 3611
        
      14

    I think it depends on which version of Windows you have, but I'm not quite sure. I always imagine a Chinese copy would have a font that supports Chinese characters, but I may be mistaken.

    In most versions though, the font is based on the Code Page 437 encoding, so you would be limited to printing those characters:

    http://en.wikipedia.org/wiki/Code_page_437
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    Can this code help?

    System.out.println(System.getProperty("file.encoding"));
    System.out.println(Charset.defaultCharset().name());


    What must I import in order to use this:

    getDefaultCharSet()
    Stephan van Hulst
    Bartender

    Joined: Sep 20, 2010
    Posts: 3611
        
      14

    Encoding has nothing to do with your problem. You are already using the correct encoding. Now you need a typeface that can display the characters specified by that encoding.

    Are you familiar with making Swing or AWT GUIs? If so, make one and print your String to a textual component.

    http://www.coderanch.com/t/511292/java/java/Byte-vs-Character-streams#2313347
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    Java uses 2 bytes to store a char.

    UTF-32 (or UCS-4) is a protocol for encoding Unicode characters that uses exactly 32 bits for each character.

    That means 4 bytes per character.

    How does Java store these characters?

    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    Does Java support UTF-32?
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    When I add this line to the program, it does not compile:

    System.out.println("Default Charset in Use=" + getDefaultCharSet());


    The error:

    compiling 1 source file to C:\Documents and Settings\Administrator\filereadchar_2\build\classes
    C:\Documents and Settings\Administrator\filereadchar_2\src\filereadchar\Main.java:42: cannot find symbol
    symbol : method getDefaultCharSet()
    location: class filereadchar.Main
    System.out.println("Default Charset in Use=" + getDefaultCharSet());
    1 error


    How can I fix it?
    Stephan van Hulst
    Bartender

    Joined: Sep 20, 2010
    Posts: 3611
        
      14

    I'm not sure, it might. But if it does, it will convert the characters to its internal 16 bit representation, and discard any characters that can't be represented.
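For what it's worth, a char holds one 16-bit UTF-16 code unit, and a Java String stores a character outside the 16-bit range as a pair of chars (a surrogate pair) rather than dropping it. Here is a minimal sketch using the standard Character and String code point methods. The class name `Supplementary` and the helper `fromCodePoint` are made up for illustration.

```java
class Supplementary {
    // Builds a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E);    // U+1D11E does not fit in 16 bits
        System.out.println(clef.length());                          // prints 2 (two chars)
        System.out.println(clef.codePointCount(0, clef.length()));  // prints 1 (one character)
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // prints true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // prints true
        System.out.println(clef.codePointAt(0) == 0x1D11E);         // prints true
    }
}
```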
    Mohamed Sanaulla
    Saloon Keeper

    Joined: Sep 08, 2007
    Posts: 3068
        
      33


    System.out.println("Default Charset in Use=" + getDefaultCharSet());

    There's Charset.defaultCharset().
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635





    Compiling 1 source file to C:\Documents and Settings\Administrator\filereadchar_2\build\classes
    C:\Documents and Settings\Administrator\filereadchar_2\src\filereadchar\Main.java:34: cannot find symbol
    symbol : method getDefaultCharSet()
    location: class filereadchar.Main
    System.out.println("Default Charset in Use=" + getDefaultCharSet());
    1 error


    It doesn't work.
    Mohamed Sanaulla
    Saloon Keeper

    Joined: Sep 08, 2007
    Posts: 3068
        
      33

    Just using getDefaultCharSet() means that the compiler expects that method to be present in your Main class. But you haven't provided that method in your Main class.

    Did you try using Charset.defaultCharset()? There's a link shared by Stephan.


    Did you try the example given in the link above? And for a particular language to be displayed, you need a font that supports it.
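A minimal sketch of the call that does compile, with the import it needs. The class name `DefaultCharsetDemo` and the method `defaultCharsetName` are made up for illustration.

```java
import java.nio.charset.Charset;                 // the import that Charset needs

class DefaultCharsetDemo {
    // Returns the name of the platform's default charset.
    static String defaultCharsetName() {
        // defaultCharset() is a static method of Charset, so it is called on the class.
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default Charset in Use=" + defaultCharsetName());
        System.out.println(System.getProperty("file.encoding"));
    }
}
```

The printed name depends on the platform and JVM settings, so no particular output is shown here.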
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    Which font for arabic?
    Mohamed Sanaulla
    Saloon Keeper

    Joined: Sep 08, 2007
    Posts: 3068
        
      33

    abalfazl hossein wrote: Which font for Arabic?

    I don't know exactly which font is used. You could try searching on Google.
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    Ok, Thanks!
    Stephan van Hulst
    Bartender

    Joined: Sep 20, 2010
    Posts: 3611
        
      14

    Why don't you try running my example, and edit it to get what you want?
    abalfazl hossein
    Ranch Hand

    Joined: Sep 06, 2007
    Posts: 635
    I'm thinking!

    Look at this :

    http://elearn.main.nvsu.edu.ph/ebooks/java.fundamental.classes.refrence/figs/jfc_1101.gif





    BufferedReader and InputStreamReader are in the Reader group.

    But FileInputStream is in the InputStream group.

    Is it possible to write this program so that all the classes used are in one group?

    Stephan van Hulst
    Bartender

    Joined: Sep 20, 2010
    Posts: 3611
        
      14

    You can use FileReader instead of an InputStreamReader wrapped around a FileInputStream.

    However, this will have the disadvantage that you can't set the encoding.

    Why do you want to do this?
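A minimal sketch of the Reader-only variant, assuming the single-argument FileReader and FileWriter constructors, which use the platform default encoding. The class name `ReaderOnly`, the helper `tempFileWith` and the temporary file are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReaderOnly {
    // Reads the first line using only Reader classes.
    // FileReader(File) decodes with the platform default encoding.
    static String firstLine(File file) {
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: writes the text to a temporary file with the default encoding.
    static File tempFileWith(String text) {
        try {
            File file = File.createTempFile("reader-only", ".txt");
            file.deleteOnExit();
            try (FileWriter w = new FileWriter(file)) {
                w.write(text);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine(tempFileWith("abalfazl hossein")));
    }
}
```

This works for the sample text because it is plain ASCII. For a file saved as UTF-8 with Arabic text, the result depends on whether the platform default happens to be UTF-8, which is the disadvantage mentioned above.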