
How to identify white space characters

bob connolly
Ranch Hand

Joined: Mar 10, 2004
Posts: 204
Hi there!

Is there a class that will read one character at a time and print the ASCII value of each whitespace or otherwise invisible character, i.e. \n, \r, \t, \f, etc.?

Thanks much
James Swan
Ranch Hand

Joined: Jun 26, 2001
Posts: 403
Yeah it's a specialised class called FunWithChars
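
The listing that went with this reply is not shown here. Purely as an illustration, and not as the poster's actual code, a class with that name might look something like the sketch below. The describe and dump helpers and the label text are invented for this example; the only library calls used are Reader.read, Character.isISOControl and Character.isWhitespace.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FunWithChars {

    // Describes one character as "decimal (0xHEX) label".
    static String describe(int c) {
        String label;
        switch (c) {
            case '\n': label = "\\n"; break;
            case '\r': label = "\\r"; break;
            case '\t': label = "\\t"; break;
            case '\f': label = "\\f"; break;
            case ' ':  label = "space"; break;
            default:
                // Anything else that would not show up on screen gets a generic label.
                label = (Character.isISOControl(c) || Character.isWhitespace(c))
                        ? "(invisible)"
                        : String.valueOf((char) c);
        }
        return c + " (0x" + Integer.toHexString(c).toUpperCase() + ") " + label;
    }

    // Reads one character at a time until the reader is exhausted.
    static void dump(Reader in) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            System.out.println(describe(c));
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample input only; a FileReader or InputStreamReader could be passed instead.
        dump(new StringReader("a\tb\r\n"));
    }
}
```

Running it over the real input and looking at the numbers is a quick way to see which invisible characters are actually arriving.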



Corey McGlone
Ranch Hand

Joined: Dec 20, 2001
Posts: 3271
My guess is probably not. Those characters are OS specific (\t is tab in Windows, but I'm not sure what represents tab in UNIX). As Java tries to be "OS independent," I doubt you'll find anything that readily converts a char into something of that form. I'm guessing that, if you want to accomplish this, you're going to have to "roll your own" in one way or another.

Perhaps someone else has an idea, but that would be my guess.


bob connolly
Ranch Hand

Joined: Mar 10, 2004
Posts: 204
Thanks for the ideas!

Well, to be more specific, I'm trying to parse a Word doc that contains the Unicode currency sign '\u00A4', which is being used as some kind of paragraph break in addition to the standard '\n', '\r', etc.!

So I'm trying to figure out how to write the logic that identifies this Unicode character!

Right now I'm using the following statement: if (c=='\n' || c=='\t' || c=='\u00A4') but it doesn't seem to recognize the Unicode escape!

Thanks!
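
For what it's worth, the comparison itself is valid Java: '\u00A4' is simply the char with value 164. A minimal sketch of the same test, with the class and method names (BreakCheck, isBreak, countBreaks) made up for illustration:

```java
class BreakCheck {

    // Same three tests as the statement in the post: newline, tab, and U+00A4.
    static boolean isBreak(char c) {
        return c == '\n' || c == '\t' || c == '\u00A4';
    }

    // Counts how many of those delimiter characters a string contains.
    static int countBreaks(String text) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isBreak(text.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // The literal below contains one U+00A4 and one newline.
        System.out.println(countBreaks("one\u00A4two\nthree")); // prints 2
    }
}
```

If a test like this never fires on the real document, one plausible explanation (an assumption, not something the thread confirms) is that the character reaching the code is not U+00A4 at all, which would point at how the bytes are being decoded rather than at the comparison.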
Tony Morris
Ranch Hand

Joined: Sep 24, 2003
Posts: 1608



Quote (Corey McGlone): "My guess is probably not. Those characters are OS specific (\t is tab in Windows, but I'm not sure what represents tab in UNIX)."


Rubbish.
Have a look at an ASCII table and the name of the character 0x09 (9).

[ July 27, 2004: Message edited by: Tony Morris ]
[ July 29, 2004: Message edited by: Tony Morris ]
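
To make the point concrete: the Java escape sequences have fixed numeric values on every platform. A small sketch (the class name EscapeValues and the codeOf helper are invented for this example) that prints them, plus the one thing that does vary by OS, the line separator:

```java
class EscapeValues {

    // A char widens to its numeric code when assigned to an int.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('\t')); // 9  (horizontal tab)
        System.out.println(codeOf('\n')); // 10 (line feed)
        System.out.println(codeOf('\f')); // 12 (form feed)
        System.out.println(codeOf('\r')); // 13 (carriage return)

        // The platform line separator is what differs between systems;
        // the characters it is built from still have the fixed codes above.
        for (char c : System.lineSeparator().toCharArray()) {
            System.out.println("line separator char: " + codeOf(c));
        }
    }
}
```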

Stefan Wagner
Ranch Hand

Joined: Jun 02, 2003
Posts: 1923

Well, ASCII is a standard across many platforms, and codes 0-127 (7 bit) are the same on Unix and Windows.
And of course \t was a tab on Unix before DOS was invented.

But \u00A4, which is 164 (decimal), is outside that standard range and is not whitespace, though it may be invisible in ordinary editors.
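
A quick way to check that classification is to ask java.lang.Character. A minimal sketch, with the class name CurrencySignCheck and its helper invented for illustration:

```java
class CurrencySignCheck {

    // True if Java classifies the character as a currency symbol (Unicode category Sc).
    static boolean isCurrencySymbol(char c) {
        return Character.getType(c) == Character.CURRENCY_SYMBOL;
    }

    public static void main(String[] args) {
        char sign = '\u00A4';
        System.out.println((int) sign);                    // 164
        System.out.println(Character.isWhitespace(sign));  // false
        System.out.println(isCurrencySymbol(sign));        // true
        System.out.println(Character.isWhitespace('\t'));  // true, for comparison
    }
}
```

So a test built on Character.isWhitespace will not catch this character; it has to be compared explicitly.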

164 is at least an 8-bit character in the extended ASCII charsets (in ISO-8859-1, for example, it is the currency sign).

Java characters are 16 bits, and \u00A4 is a 16-bit notation too.

Perhaps you could use a hex editor to find the position where the ¤ appears, and then check what Java is actually reading there.
Perhaps you have to tell the reader wrapping the InputStream (an InputStreamReader, say) which encoding to use?
Or ask it which encoding it is actually using?
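
As a rough sketch of those checks, the example below decodes the same byte with two different charsets and prints bytes in hex the way a hex editor would. The class EncodingCheck and its helpers are invented for illustration, and the byte values are sample data, not taken from an actual Word file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {

    // Decodes the bytes with an explicit charset and returns the code of the first char read.
    static int firstChar(byte[] bytes, Charset cs) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] single = { (byte) 0xA4 };

        // In ISO-8859-1 the single byte A4 is the currency sign.
        System.out.println(firstChar(single, StandardCharsets.ISO_8859_1)); // 164

        // As UTF-8 the same byte is malformed and decodes to U+FFFD, the replacement character.
        System.out.println(firstChar(single, StandardCharsets.UTF_8)); // 65533

        // In UTF-8 the currency sign takes two bytes.
        System.out.println(hex("\u00A4".getBytes(StandardCharsets.UTF_8))); // C2 A4

        // The charset used when none is specified.
        System.out.println(Charset.defaultCharset());
    }
}
```

InputStreamReader also has a getEncoding() method that reports the encoding an existing reader is using.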

But I don't know which encoding Word docs use.
There is an open-source Apache API for reading Excel and Word docs: POI and H?? (poor obfuscating interface / horrible ... ...).


 