Is there a class that will read one character at a time and print the ASCII value of each whitespace or otherwise invisible character, such as \n, \r, \t, \f, etc.?
Thanks much
James Swan
Ranch Hand
Joined: Jun 26, 2001
Posts: 403
Yeah it's a specialised class called FunWithChars
Corey McGlone
Ranch Hand
Joined: Dec 20, 2001
Posts: 3271
My guess is probably not. Those characters are OS specific (\t is tab in Windows, but I'm not sure what represents tab in UNIX). As Java tries to be "OS independent," I doubt you'll find anything that readily converts a char into something of that form. I'm guessing that, if you want to accomplish this, you're going to have to "roll your own" in one way or another.
Perhaps someone else has an idea, but that would be my guess.
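For what it's worth, "rolling your own" can be quite short. Here is a minimal sketch, assuming the input is available as a java.io.Reader; the class name CharDump and its methods are made up for this example and are not part of the JDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for illustration; not a JDK class.
class CharDump {
    // Returns a description such as "\t = 9 (0x0009)" for whitespace and
    // control characters, or null for ordinary visible characters.
    static String describe(char c) {
        String name;
        switch (c) {
            case '\n': name = "\\n"; break;
            case '\r': name = "\\r"; break;
            case '\t': name = "\\t"; break;
            case '\f': name = "\\f"; break;
            case ' ':  name = "space"; break;
            default:
                if (!Character.isWhitespace(c) && !Character.isISOControl(c)) {
                    return null; // visible character, nothing to report
                }
                name = "other";
        }
        return String.format("%s = %d (0x%04X)", name, (int) c, (int) c);
    }

    // Reads one character at a time and prints every invisible one.
    static void dump(Reader in) throws IOException {
        int i;
        while ((i = in.read()) != -1) {
            String description = describe((char) i);
            if (description != null) {
                System.out.println(description);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        dump(new StringReader("a\tb\r\n"));
    }
}
```

Casting the char to int is all it takes to get the numeric value; the switch only adds a readable name.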
Well, to be more specific, I'm trying to parse a Word doc, and there is a Unicode character called the currency sign, '\u00A4', which is being used as some kind of paragraph break, in addition to the standard '\n', '\r', etc.
So I'm trying to figure out how to write the logic that identifies this Unicode character.
Right now I'm using the following statement: if (c=='\n' || c=='\t' || c=='\u00A4') but it doesn't seem to recognize that Unicode escape!
Thanks!
Tony Morris
Ranch Hand
Joined: Sep 24, 2003
Posts: 1608
"My guess is probably not. Those characters are OS specific (\t is tab in Windows, but I'm not sure what represents tab in UNIX)."
Rubbish. Have a look at an ASCII table and the name of the character 0x09 (9).
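A quick way to check this from Java itself is sketched below. The class name TabCode and the method codeOf are made up for the example. The part that really does vary between operating systems is the line separator, which System.lineSeparator() reports (available since Java 7).

```java
// Hypothetical demo class; the name is only for illustration.
class TabCode {
    // A char converts to its numeric code with a plain cast.
    static int codeOf(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        // The tab character has the same code on every platform.
        System.out.println(codeOf('\t'));
        System.out.println('\t' == 0x09);

        // The line separator is the platform-dependent part:
        // print the codes of whatever this system uses.
        for (char c : System.lineSeparator().toCharArray()) {
            System.out.println(codeOf(c));
        }
    }
}
```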
[ July 27, 2004: Message edited by: Tony Morris ] [ July 29, 2004: Message edited by: Tony Morris ]
Well, ASCII is a standard across many platforms, and it is the same on Unix and Windows for 0-127 (7 bits). And of course \t was a tab on Unix before DOS was even invented.
But \u00A4, which is 164 (decimal), is outside of that standard, and it is not whitespace, though it may be invisible in ordinary editors.
164 is at least an 8-bit character in the extended ASCII character sets (in ISO-8859-1 it is the currency sign).
Java characters are 16 bits wide, and \u00A4 is a 16-bit notation too.
Perhaps you could use a hex editor to find the position where the ¤ is printed, and try to find out what Java is actually reading. Perhaps you have to tell the InputStreamReader which encoding to use? Or ask it which encoding it is actually using?
But I don't know which encoding Word docs use. There is an open-source Apache API available for reading Excel and Word docs: POI and H?? (poor obfuscating interface / horrible ... ...).
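To illustrate the encoding point, here is a small sketch. It assumes the data is plain encoded text, which a real Word file is not, so it only shows why the c == '\u00A4' test can fail and is no substitute for a .doc parser. The class name CurrencySignScan and the sample bytes are made up for the example. The same byte 0xA4 decodes to '\u00A4' under ISO-8859-1, but it is malformed on its own in UTF-8, so the reader substitutes U+FFFD and the comparison never matches.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is only for illustration.
class CurrencySignScan {
    // Decodes the bytes with the given charset and counts currency signs.
    static int countCurrencySigns(byte[] data, Charset charset) {
        int count = 0;
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int i;
            while ((i = in.read()) != -1) {
                char c = (char) i;
                if (c == '\u00A4') {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample bytes: 'A', the single byte 0xA4, then a line feed.
        byte[] data = { 0x41, (byte) 0xA4, 0x0A };
        System.out.println(countCurrencySigns(data, StandardCharsets.ISO_8859_1));
        System.out.println(countCurrencySigns(data, StandardCharsets.UTF_8));
    }
}
```

If you already have an InputStreamReader, its getEncoding() method tells you which encoding it is actually using.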