
Characters cannot be displayed for codes between 0 and 65,535

Marius Constantin
Ranch Hand

Joined: Nov 23, 2011
Posts: 62

Dear experts,

I can't understand why my JVM doesn't display the characters with codes from 0 to 65,535. What is causing this?

Screenshots of the output from my JVM are attached.

Thank you very much for all your help and time!

kind regards,

marius

Here is my code:




[Thumbnail for c1.PNG]


[Thumbnail for c2.PNG]

Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    

1. Not every value between 0 and 65535 is a valid Unicode character.

2. Even those that are valid characters are not valid in every encoding.

3. In general, you need to specify an appropriate encoding for your display app (looks like Windows cmd.exe, in this case) to use.

4. Even if your display tool is using the same encoding as what the character was written in, if the font that's being used to display the character does not have a glyph for that character, you'll get a default, such as a box or a question mark.

5. I haven't even looked at your getRandomChar() method, so there may be problems there.

6. Finally, just randomly generating characters and then displaying them in an arbitrary tool with an unspecified encoding and unspecified font is not a recipe for success. Perhaps if you could explain what you're actually trying to accomplish?
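Points 2 and 3 can be checked from Java itself. The following is only a minimal sketch: the class name EncodingCheck and the helper canEncode are made up for illustration, and it assumes the console has already been switched to UTF-8 (on Windows cmd.exe that is usually done with chcp 65001) and is using a font that has the glyph.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Hypothetical helper: true if the charset has a byte form for c.
    static boolean canEncode(Charset charset, char c) {
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        char omega = '\u03A9';         // GREEK CAPITAL LETTER OMEGA
        char loneSurrogate = '\uD800'; // half of a surrogate pair, not a character on its own

        System.out.println(canEncode(StandardCharsets.ISO_8859_1, omega));    // false
        System.out.println(canEncode(StandardCharsets.UTF_8, omega));         // true
        System.out.println(canEncode(StandardCharsets.UTF_8, loneSurrogate)); // false

        // Write to the console as UTF-8 instead of the platform default.
        // Assumption: the console itself is set to UTF-8; otherwise the
        // bytes are still misinterpreted on the display side.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(omega);
    }
}
```

Even when the encoder says yes and the console encoding matches, point 4 still applies: a font without the glyph will show a box or a question mark.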
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38007
    
Try displaying the output on a JOptionPane. Go through the Character wrapper class and see whether there are any methods that tell you whether a particular char can be printed at all. There probably are.
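As a rough sketch of what that filtering might look like: the class PrintableChars and the methods looksPrintable and randomPrintable below are invented names, and the test is only a heuristic built from Character.isDefined, Character.isISOControl, Character.isSurrogate and Character.getType. It says nothing about whether the font in use has a glyph.

```java
import java.util.Random;

public class PrintableChars {

    // Heuristic only: rejects unassigned values, control characters,
    // surrogates, private-use characters and invisible format characters.
    static boolean looksPrintable(char c) {
        if (!Character.isDefined(c)) return false;
        if (Character.isISOControl(c)) return false;
        if (Character.isSurrogate(c)) return false;
        int type = Character.getType(c);
        return type != Character.PRIVATE_USE && type != Character.FORMAT;
    }

    // Picks random chars from 0..65535 until 'count' of them pass the filter.
    static String randomPrintable(int count, Random random) {
        StringBuilder sb = new StringBuilder(count);
        while (sb.length() < count) {
            char c = (char) random.nextInt(0x10000);
            if (looksPrintable(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomPrintable(175, new Random()));
    }
}
```

The resulting String could then be passed to JOptionPane.showMessageDialog(null, text) instead of System.out, which sidesteps the console encoding but not the font question.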
Marius Constantin
Ranch Hand

Joined: Nov 23, 2011
Posts: 62

Jeff Verdegan wrote:

1. Not every value between 0 and 65535 is a valid Unicode character.

2. Even those that are valid characters are not valid in every encoding.

3. In general, you need to specify an appropriate encoding for your display app (looks like Windows cmd.exe, in this case) to use.

4. Even if your display tool is using the same encoding as what the character was written in, if the font that's being used to display the character does not have a glyph for that character, you'll get a default, such as a box or a question mark.

5. I haven't even looked at your getRandomChar() method, so there may be problems there.

6. Finally, just randomly generating characters and then displaying them in an arbitrary tool with an unspecified encoding and unspecified font is not a recipe for success. Perhaps if you could explain what you're actually trying to accomplish?


Thank you very much, Jeff and Ritchie!

Jeff, regarding your answers, I have some more questions. I hope you have some more time to spare. Thank you very much for everything. I really appreciate your help.

1. Are the invalid UTF-8 byte values the decimal codes 192, 193 and 245...255?

"Red cells must never appear in a valid UTF-8 sequence. The first two (C0 and C1) could only be used for overlong encoding of basic ASCII characters. The remaining red cells indicate start bytes of sequences that could only encode numbers larger than the 0x10FFFF limit of Unicode. The byte 244 (hex 0xF4) could also encode some values greater than 0x10FFFF; such a sequence is also invalid. "

Source: Wikipedia, UTF-8, "Codepage layout"

2. Are byte values that are invalid in UTF-8 valid character codes in another encoding?

3. How can I specify an encoding for my display app, for example cmd.exe or Notepad++ on Windows?

6. I am just trying to display 175 characters selected randomly.

Thank you very much for all your help!

kind regards,
marius
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

Marius Constantin wrote:6. I am just trying to display 175 characters selected randomly


The question "Why?" still applies here.

Is there a purpose for this display? Is it important that you be able to choose Cyrillic and Coptic characters (just for example) and have them be displayed? If so then why did you choose to restrict your choices to only characters from the BMP of Unicode?
Marius Constantin
Ranch Hand

Joined: Nov 23, 2011
Posts: 62

Paul Clapham wrote:
Marius Constantin wrote:6. I am just trying to display 175 characters selected randomly


The question "Why?" still applies here.

Is there a purpose for this display? Is it important that you be able to choose Cyrillic and Coptic characters (just for example) and have them be displayed? If so then why did you choose to restrict your choices to only characters from the BMP of Unicode?


Hi Paul!

Thank you so much for answering! 175 was just a randomly chosen number. I just want to display 175 randomly selected characters of the UTF-8 character set in Notepad++ or cmd, in any encoding and in any font. This is for study purposes: I am learning how to program in Java. Just for the sake of programming, just for fun.

Is it a lot clearer now?

Please help.

kind regards,
marius
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

You're still confused, I think. There's no such thing as "the UTF-8 character set". UTF-8 is an "encoding" or a "charset"; it's Unicode which is the character set. Java chars are 16-bit values that represent Unicode characters. (Let's ignore the "astral" planes of Unicode which go beyond 65535 for now.) But often people want to store Strings (which are arrays of chars internally) in arrays of bytes, which as you know are 8 bits each. So there has to be an "encoding" process to do that, and there is a very long list of encodings which do it in various ways.

Many of those encodings, like ISO-8859-1 and its relatives, can only represent a subset of Unicode characters, and when they are given a character outside that subset they just encode it as a question mark. But others, like UTF-8, can represent any Unicode character. They do that by encoding each character as one or more bytes, as you will have seen from what you read in Wikipedia.
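A small sketch of both behaviours, and of the red-cell byte 0xC0 from the Wikipedia quote, might look like this. The class name EncodingDemo and the helper decodeUtf8Strictly are made up for the example; the strict decoder is used because new String(bytes, UTF_8) would silently substitute a replacement character instead of reporting the problem.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {

    // Hypothetical helper: decodes as UTF-8 and returns null when the
    // bytes are not a valid UTF-8 sequence.
    static String decodeUtf8Strictly(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String omega = "\u03A9"; // GREEK CAPITAL LETTER OMEGA

        // ISO-8859-1 has no omega, so the encoder substitutes '?' (63).
        System.out.println(Arrays.toString(omega.getBytes(StandardCharsets.ISO_8859_1))); // [63]

        // UTF-8 encodes it as the two bytes 0xCE 0xA9.
        System.out.println(Arrays.toString(omega.getBytes(StandardCharsets.UTF_8))); // [-50, -87]

        // The single byte 0xC0 (decimal 192) is a letter in ISO-8859-1 ...
        byte[] c0 = { (byte) 0xC0 };
        System.out.println(new String(c0, StandardCharsets.ISO_8859_1).equals("\u00C0")); // true

        // ... but never valid in UTF-8.
        System.out.println(decodeUtf8Strictly(c0)); // null
    }
}
```

So a byte value that is invalid in UTF-8 can be a perfectly good character in a single-byte encoding; the byte only has meaning once an encoding is chosen.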
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38007
    
Does Java™ directly use UTF-8 at all? I thought it used UTF-16 whenever there are chars greater than 0xffff.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    

Campbell Ritchie wrote:Does Java™ directly use UTF-8 at all? I thought it used UTF-16 whenever there are chars greater than 0xffff.


The char type in the Java language is always a UTF-16 code unit (and hence Character and String are UTF-16 as well). Classes like Readers and Writers, which convert back and forth between Java chars and bytes in a particular encoding, can be told which encoding to use.
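For instance, an OutputStreamWriter takes the encoding as a constructor argument. This is just an illustrative sketch; the class WriterEncodingDemo and the helper write are invented names, and a ByteArrayOutputStream stands in for a real file or socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {

    // Hypothetical helper: writes the text through a Writer that uses
    // the given charset and returns the bytes that came out.
    static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE

        System.out.println(write(text, StandardCharsets.ISO_8859_1).length); // 1
        System.out.println(write(text, StandardCharsets.UTF_8).length);      // 2
        System.out.println(write(text, StandardCharsets.UTF_16BE).length);   // 2
    }
}
```

The same one-char String becomes a different byte sequence for each encoding, which is why the reading side has to be told which one was used.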
Marius Constantin
Ranch Hand

Joined: Nov 23, 2011
Posts: 62

Campbell Ritchie wrote:Does Java™ directly use UTF-8 at all? I thought it used UTF-16 whenever there are chars greater than 0xffff.


Could you give me an example of such a character?

Thank you!
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38007
    
As JV has said, Java™ uses UTF-16 throughout. I suggest you start with the Unicode FAQ.
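As an example of a character above 0xffff, here is a small sketch using U+1D11E (MUSICAL SYMBOL G CLEF). The class SupplementaryDemo and the helper fromCodePoint are names chosen for illustration only. In UTF-16 such a character takes two chars, a high and a low surrogate.

```java
public class SupplementaryDemo {

    // Hypothetical helper: builds a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E); // MUSICAL SYMBOL G CLEF

        System.out.println(clef.length());                             // 2
        System.out.println(clef.codePointCount(0, clef.length()));     // 1
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // true
        System.out.println(Integer.toHexString(clef.charAt(0)));       // d834
        System.out.println(Integer.toHexString(clef.charAt(1)));       // dd1e
        System.out.println(Integer.toHexString(clef.codePointAt(0)));  // 1d11e
    }
}
```

This is also why picking a single random char between 0xD800 and 0xDFFF never gives a displayable character: those values only make sense in pairs.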
 