JavaRanch forums: Java in General
character coding problem

Alan Blass
Ranch Hand

Joined: Mar 21, 2010
Posts: 119
Hi!

I have

System.out.println((char)191);

but according to http://www.cdrummond.qc.ca/cegep/informat/Professeurs/Alain/files/ascii.htm

I should get an inverted L, but I am getting an inverted question mark (Dec 169).

Why?

Any help appreciated. Thanks.
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14268

The page you quoted is an "extended ASCII" chart. It is not standard ASCII, which only defines character codes 0 to 127. Your command prompt window uses a different character encoding than the one in that chart, so you don't see the same characters as in the chart.
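To see that the same byte means different things in different character sets, here is a small sketch that decodes 0xBF twice. It assumes the "inverted L" in the chart comes from code page 437, the IBM PC/DOS character set that many extended ASCII charts reproduce. IBM437 is not one of the charsets every Java runtime is required to support, so the code checks for it first. Printing code points rather than the characters themselves avoids depending on the console's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteBfInTwoCharsets {

    // Decodes a single byte with the given charset and returns the resulting char.
    static char decode(byte b, Charset charset) {
        return new String(new byte[] { b }, charset).charAt(0);
    }

    public static void main(String[] args) {
        byte bf = (byte) 0xBF;

        // ISO-8859-1 matches the first 256 Unicode code points: 0xBF is U+00BF, the inverted question mark.
        System.out.printf("ISO-8859-1: U+%04X%n", (int) decode(bf, StandardCharsets.ISO_8859_1));

        // IBM437 is optional (present in typical JDK builds), so check before using it.
        if (Charset.isSupported("IBM437")) {
            // In code page 437, 0xBF is a box-drawing corner, U+2510.
            System.out.printf("IBM437:     U+%04X%n", (int) decode(bf, Charset.forName("IBM437")));
        }
    }
}
```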


Alan Blass
Hi!

Thanks for your reply.

I need to send the inverted L to a serial port. The device needs that character: it requires hex BF.

What can I do?

Thanks.
Jesper de Jong


You can just send a byte with the value 0xBF, as binary data. That doesn't sound like a problem that has to do with character encodings, unless you're trying to send binary data through an I/O class that's meant for writing text...

Note that Java has two kinds of I/O classes. Streams (subclasses of the abstract classes InputStream and OutputStream) read and write binary data: they receive or send exactly the bytes being transferred. Readers and writers (subclasses of the abstract classes Reader and Writer) read and write text. Essentially, they are a layer on top of streams that converts "raw" bytes to and from text characters using a character encoding.

If you need to write "raw" byte values, you should be using an OutputStream, not a Writer.
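A minimal sketch of the difference, using a ByteArrayOutputStream as a stand-in for the serial port (an assumption, so the resulting bytes can be inspected). UTF-8 is chosen for the Writer only to make the output predictable; with another encoding the Writer would produce yet other bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StreamVsWriter {

    // Binary path: OutputStream.write(int) sends the low 8 bits unchanged.
    static byte[] viaStream() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            OutputStream out = sink;
            out.write(0xBF);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Text path: the Writer encodes the character U+00BF, here as UTF-8.
    static byte[] viaWriter() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(0xBF); // Writer.write(int) writes a character, not a byte
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaStream())); // [-65]
        System.out.println(Arrays.toString(viaWriter())); // [-62, -65]
    }
}
```

The stream sends one byte, 0xBF; the Writer sends two, 0xC2 0xBF, because it is encoding a character rather than passing a byte through.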
Alan Blass
My apologies, I did use an OutputStream, but it still doesn't work. Can you take a look?

Many thanks.

Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19718

Who says the character doesn't arrive correctly? As said before, the command prompt is quite limited in what it can display. This is true not only for the Windows command prompt but also for many Linux/UNIX terminals.


Jesper de Jong

You're wrapping the output stream that you get from the telnet object in a PrintStream (line 7).

Then, in line 26, you use the PrintStream's print method to send it a command. Note that that method is meant for writing text, not binary data. The documentation of class PrintStream says that it will use the platform's default character encoding to encode text to bytes.

Is cmd2Device a string?

If you want to write a byte with the value 0xBF to the output stream, then don't use text-based methods like the print method from class PrintStream. Note that class OutputStream has methods to send bytes. Sending a byte with the value 0xBF is very easy:
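The original snippet did not survive in this copy of the thread. As a hedged sketch: the helper names printText and writeRaw are made up for illustration, and a ByteArrayOutputStream stands in for the telnet output stream so the bytes can be checked. It shows that print on a PrintStream turns the same character into different bytes depending on the encoding, while writing the byte to the stream directly involves no encoding at all.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class PrintStreamEncoding {

    // Prints text through a PrintStream that uses the named encoding and returns the bytes produced.
    static byte[] printText(String text, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(sink, true, encoding)) {
            ps.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return sink.toByteArray();
    }

    // Sends one raw byte straight to the stream, with no encoding involved.
    static byte[] writeRaw(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(value);
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = String.valueOf((char) 0xBF);
        System.out.println(Arrays.toString(printText(text, "UTF-8")));      // [-62, -65]
        System.out.println(Arrays.toString(printText(text, "ISO-8859-1"))); // [-65]
        System.out.println(Arrays.toString(printText(text, "US-ASCII")));   // [63], i.e. '?'
        System.out.println(Arrays.toString(writeRaw(0xBF)));                // [-65]
    }
}
```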

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39393
I tried it with:

    java CharacterPrinter 43 61 6d 70 62 65 6c 6c 20 bf

and you can see what a wonderful output you get from that. You will notice, however, that bf is ¿, not an inverted L. I couldn't find an inverted L in Unicode; they must call it something different.
If you use a "bash" shell like mine, you get ¿ appearing correctly both on the terminal and the option pane. You may get different output in Windows® because the command line has a much more restricted character range than "bash".
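Campbell's CharacterPrinter source is not shown here. A hypothetical equivalent (the class name HexToChars is invented) might look like the following: each hex argument is cast straight to a char, so bf becomes U+00BF, whatever the terminal then chooses to draw for it.

```java
public class HexToChars {

    // Treats each argument as a hexadecimal number and casts it directly to a char (a UTF-16 code unit).
    static String fromHex(String... hexCodes) {
        StringBuilder sb = new StringBuilder();
        for (String hex : hexCodes) {
            sb.append((char) Integer.parseInt(hex, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = fromHex(args);
        // Print code points as well, so the result does not depend on how the console renders them.
        for (int i = 0; i < text.length(); i++) {
            System.out.printf("U+%04X ", (int) text.charAt(i));
        }
        System.out.println();
        System.out.println(text);
    }
}
```

Run as `java HexToChars 43 61 6d 70 62 65 6c 6c 20 bf`, the code-point line ends in U+00BF, and the text line reads "Campbell ¿" on a terminal that can display that character.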
Alan Blass
Jesper de Jong wrote:
Is cmd2Device a string?

If you want to write a byte with the value 0xBF to the output stream, then don't use text-based methods like the print method from class PrintStream. Note that class OutputStream has methods to send bytes. Sending a byte with the value 0xBF is very easy:



Hi Jesper, yes, cmd2Device is a string. I cannot use out.write because out.write requires a byte. I have a string. What should I do?
Jesper de Jong

Campbell, in your example program, you directly interpret the byte values as characters. How these characters will look when you display them in a message dialog (or in a Bash shell or Windows command prompt) depends on what character encoding the message dialog is using to display those characters.

When you see an upside-down question mark, that is because in the character set being used, the byte value 0xBF maps to an upside-down question mark. If another character set were used, it might display a different character.

You don't get the inverted L, because the message dialog is not using a character encoding in which 0xBF maps to the inverted L character.
Jesper de Jong

Alan Blass wrote: Hi Jesper, yes, cmd2Device is a string. I cannot use out.write because out.write requires a byte. I have a string. What should I do?

What character do you have in cmd2Device that should lead to a byte 0xBF being sent?
Alan Blass
Jesper de Jong wrote:
Alan Blass wrote: Hi Jesper, yes, cmd2Device is a string. I cannot use out.write because out.write requires a byte. I have a string. What should I do?

What character do you have in cmd2Device that should lead to a byte 0xBF being sent?


Hi! That byte is part of the string that I am trying to send out.

The complete string is:

cmd2Device = "" + (char)0xAA + (char)0x89 + (char)0xFF + (char)0x02 + (char)0x34 + (char)0x01 + (char)0xBF;
Jesper de Jong

When you want to send bytes, then why are you converting those bytes to chars first? That leads to all the unnecessary hassle with character encodings. Why not send the bytes directly?
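As a sketch of that suggestion: keep the values in a byte[] and hand the array to OutputStream.write(byte[]). The names CMD_2_DEVICE and send are illustrative, and a ByteArrayOutputStream stands in for the real connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class SendBytesDirectly {

    // The command values from the thread, kept as bytes. Values above 0x7F need a (byte) cast
    // because Java's byte type is signed (-128..127).
    static final byte[] CMD_2_DEVICE = {
        (byte) 0xAA, (byte) 0x89, (byte) 0xFF, 0x02, 0x34, 0x01, (byte) 0xBF
    };

    // Writes the command to any OutputStream, e.g. the one obtained from the telnet connection.
    static void send(OutputStream out, byte[] command) throws IOException {
        out.write(command);
        out.flush();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the device connection
        try {
            send(sink, CMD_2_DEVICE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (byte b : sink.toByteArray()) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints AA 89 FF 02 34 01 BF
    }
}
```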

Alan Blass
Jesper de Jong wrote: When you want to send bytes, then why are you converting those bytes to chars first? That leads to all the unnecessary hassle with character encodings. Why not send the bytes directly?



Hi! But I am using:

Reflection.invoke(<classname>, <methodname>);

which returns a String. Is there any way I can get around it?
Jesper de Jong

So you are not actually declaring it like this, as you said before?

In reality, you're getting it via some method called via reflection, and that method returns a string?

Storing arbitrary binary data in a string is a really strange design... why does your software do this?

If you really have a string with characters stored like the line above, then you could convert that to a byte array yourself and then send that byte array. It's not a pretty solution, though. The software shouldn't be storing bytes in a string like that. It would be better to fix it properly, i.e. store the data to send in a byte array directly, instead of somehow shoehorning it into a string.
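If the string really can't be avoided, the conversion could be sketched like this (charsToBytes is a hypothetical helper, not a library method). Encoding with ISO-8859-1 does the same job for characters up to 0xFF, because that charset maps U+0000 to U+00FF one-to-one onto bytes. Neither replaces fixing the design so the data is a byte[] from the start.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsToBytes {

    // The "ugly" workaround: cast each char to a byte. Only safe when every char is in 0x00..0xFF.
    static byte[] charsToBytes(String s) {
        byte[] result = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("char U+" + Integer.toHexString(c) + " does not fit in one byte");
            }
            result[i] = (byte) c; // keeps the low 8 bits, e.g. U+00BF becomes 0xBF
        }
        return result;
    }

    public static void main(String[] args) {
        String cmd2Device = "" + (char) 0xAA + (char) 0x89 + (char) 0xFF + (char) 0x02
                + (char) 0x34 + (char) 0x01 + (char) 0xBF;
        byte[] manual = charsToBytes(cmd2Device);
        // ISO-8859-1 gives the same result for chars in 0x00..0xFF.
        byte[] latin1 = cmd2Device.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(manual, latin1)); // true
    }
}
```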

Alan Blass
My apologies. I made a mistake. The byte string is part of an ASCII String that I am going to send out. It looks something like this:

cmdString = "" + ESC + devicePortNumber + "RS" + CR + cmd2Device;

where ESC is the ASCII escape character, CR is a carriage return, and devicePortNumber is 01.

How should I go about doing that?
Alan Blass
I have tried:



but it doesn't work. Please help. Thanks.
Campbell Ritchie
No, for "bf" I get U+00BF = ¿, both on the terminal and in the option pane. I am presuming that is correct for my encoding, since it is the same as in the Unicode "0080" PDF document. I haven't had a chance to try that on a Windows® box to see whether the ¿ comes out right there. I was presuming the inverted L was a different value, but I take your point that it might be a different encoding.
I can get ┓ from U+2513.
Campbell Ritchie
I have tried the same characters on a Windows® 7 command line: bf came out similar to an inverted L, rather like the 2513 in my earlier post, but 2513 and 2517 both came out as ?.
So for Windows®, it would appear the inverted L is normal for bf, which is different from Unicode.

At this point, I think I shall give up. But I seem to be getting similar results to everybody else.
Campbell Ritchie
And rather than scaring other people who are "beginning" with such a difficult question, I shall move this discussion to Java in General ("JiG").
Jesper de Jong

Campbell, I think Alan just wants to send a byte with the value 0xBF. I don't think he wants to send the inverted L character in some other encoding.

Alan: You don't have to put the whole command in a string at once and then send that thing. You can just send the bytes of the different parts separately. For example:
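Jesper's original example isn't preserved here; later posts indicate it used getBytes with "US-ASCII" for the text parts. A sketch along those lines, assuming ESC is 0x1B, CR is 0x0D and devicePortNumber is the two-character text "01" (if the device expects a single byte 0x01 instead, write that byte). The names writeCommand and buildFrame are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CommandFrame {

    static final int ESC = 0x1B; // ASCII escape
    static final int CR = 0x0D;  // ASCII carriage return

    // Writes the text parts as US-ASCII and the binary payload unchanged.
    static void writeCommand(OutputStream out, String devicePortNumber, byte[] payload) throws IOException {
        out.write(ESC);
        out.write(devicePortNumber.getBytes(StandardCharsets.US_ASCII)); // assumed to be text such as "01"
        out.write("RS".getBytes(StandardCharsets.US_ASCII));
        out.write(CR);
        out.write(payload); // raw bytes, never passed through a charset
        out.flush();
    }

    // Convenience for inspecting the frame without a real connection.
    static byte[] buildFrame(String devicePortNumber, byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            writeCommand(sink, devicePortNumber, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = { (byte) 0xAA, (byte) 0x89, (byte) 0xFF, 0x02, 0x34, 0x01, (byte) 0xBF };
        for (byte b : buildFrame("01", payload)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints 1B 30 31 52 53 0D AA 89 FF 02 34 01 BF
    }
}
```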

Campbell Ritchie
That's why I said I would give up.
Alan Blass
Jesper de Jong wrote:

Alan: You don't have to put the whole command in a string at once and then send that thing. You can just send the bytes of the different parts separately. For example:



Hi Jesper,

Why do you use cmd2Device.getBytes()? If my cmd2Device has a character greater than 0x7F, it will be cut off. For example, for 0xBF I cannot use .getBytes() because I will get a different value.

What other methods I can use?
Jesper de Jong

Alan Blass wrote: Why do you use cmd2Device.getBytes()? If my cmd2Device has a character greater than 0x7F, it will be cut off. For example, for 0xBF I cannot use .getBytes() because I will get a different value.

The getBytes() method of class String converts the characters in the string to raw bytes, using the character encoding that you specify ("US-ASCII" in my example). Of course characters with a value above 0x7F will not survive, because ASCII only defines characters in the range 0x00 to 0x7F; they come out as a replacement byte (typically '?', 0x3F) instead.
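A quick way to see what happens to 0xBF in each case (the class name is illustrative). Note that String.getBytes(Charset) is documented to substitute the charset's replacement bytes for unmappable characters, whereas the behavior of getBytes(String charsetName) in that situation is unspecified.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiReplacement {

    // Encodes text with an explicit Charset, so unmappable characters get the charset's replacement.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String s = String.valueOf((char) 0xBF);
        // US-ASCII cannot represent U+00BF, so the replacement byte '?' (0x3F) is used.
        System.out.println(Arrays.toString(encode(s, StandardCharsets.US_ASCII)));   // [63]
        // ISO-8859-1 covers U+0000..U+00FF, so the value survives as 0xBF.
        System.out.println(Arrays.toString(encode(s, StandardCharsets.ISO_8859_1))); // [-65]
    }
}
```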

I've been trying to explain to you multiple times that it is not a good idea to store byte values in characters, and showed you some ways to not do that. But for some reason you seem to insist on storing binary values such as 0xBF in characters in a string... why?

I even gave you an ugly solution that casts the chars directly to bytes, but you said "it doesn't work". What do you mean by "it doesn't work"?
Alan Blass
Sorry.

I have changed it to a byte array.



But the IDE says possible loss of precision. And the device is not responding.

Please help. Thanks.
Jesper de Jong

Where exactly does it say "possible loss of precision" (for which Java statement)?

About the device not responding: Then it's time to debug, find out if you are sending exactly the right data to the device, as the device expects it.
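For that kind of debugging it helps to log exactly the bytes handed to the stream just before writing them, and compare that with what a capture tool shows on the wire. A small hypothetical helper is sketched below. It also shows the (byte) cast that int literals above 0x7F need in a byte array; without it, older javac versions report "possible loss of precision" and newer ones "possible lossy conversion from int to byte".

```java
import java.util.StringJoiner;

public class HexDump {

    // Formats bytes as two-digit uppercase hex separated by spaces, e.g. "AA 0A BF".
    static String hex(byte[] data) {
        StringJoiner joiner = new StringJoiner(" ");
        for (byte b : data) {
            joiner.add(String.format("%02X", b & 0xFF)); // & 0xFF drops the sign of negative bytes
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // byte is signed in Java, so literals above 0x7F need an explicit (byte) cast.
        byte[] command = { (byte) 0xAA, (byte) 0x89, 0x0A, (byte) 0xBF };
        System.out.println(hex(command)); // prints AA 89 0A BF
    }
}
```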
Alan Blass
Hi.

My apologies. It does not say loss of precision.

But now I am sending a series of the same command, differing only in the ID. Only ID 10 (0x0A) does not respond.

Could there be a problem with 0x0A, which is the line feed character?

Jesper de Jong

Java itself does not treat the value 0x0A any differently from any other byte value.
Rob Spoor

Other than the fact that it's '\n' in int form.
Alan Blass
I found the problem using Wireshark.

It is the Apache Commons Net TelnetClient. It changes 0x0A into 0x0D 0x0A (CR LF), and that is what breaks my code.

http://apache-commons.680414.n4.nabble.com/jira-Created-NET-387-TelnetClient-use-of-FromNetASCIIInputStream-and-ToNetASCIIOutputStream-breaks-bs-td3397058.html

Thanks Jesper for all your posts.
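To make the failure mode concrete: a NETASCII-style translation layer turns a lone LF (0x0A) into CR LF (0x0D 0x0A), so a binary command whose ID byte is 10 grows by one byte and the device no longer recognizes it. The model below is a hypothetical simplification written for illustration, not the Commons Net implementation; whether and how that translation can be switched off depends on the Commons Net version, so check its documentation.

```java
import java.io.ByteArrayOutputStream;

public class NetAsciiModel {

    // Simplified model of NETASCII output conversion: a CR (0x0D) is inserted before any LF (0x0A)
    // that is not already preceded by a CR. Illustration only; not Apache Commons Net code.
    static byte[] toNetAscii(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte previous = 0;
        for (byte b : data) {
            if (b == 0x0A && previous != 0x0D) {
                out.write(0x0D);
            }
            out.write(b);
            previous = b;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] command = { (byte) 0xAA, 0x0A, (byte) 0xBF }; // an ID byte of 10 inside binary data
        for (byte b : toNetAscii(command)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints AA 0D 0A BF
    }
}
```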
 