How to identify white space characters

 
Ranch Hand
Posts: 204
Hi there!

Is there a class that will read one character at a time and print the ASCII value of each whitespace or otherwise invisible character, e.g. \n, \r, \t, \f, etc.?

Thanks much
 
Ranch Hand
Posts: 403
Yeah it's a specialised class called FunWithChars
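
A minimal sketch of what such a class could look like, assuming the input is available through a java.io.Reader. The names CharCodePrinter, describe and dump are made up for illustration; this is not a JDK class and not necessarily what the poster had in mind.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper (not part of the JDK): reports the numeric code of each character.
final class CharCodePrinter {

    // Describe one character: decimal code, code point in hex, and whether
    // Character.isWhitespace considers it whitespace.
    static String describe(char c) {
        return String.format("%d (U+%04X) whitespace=%b",
                (int) c, (int) c, Character.isWhitespace(c));
    }

    // Read one character at a time until end of input and print a line for each.
    static void dump(Reader in) throws IOException {
        int next;
        while ((next = in.read()) != -1) {
            System.out.println(describe((char) next));
        }
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a real file here.
        dump(new StringReader("a\tb\r\n\f"));
    }
}
```

For a real file, the StringReader would be replaced by a Reader opened on that file with an explicit character encoding.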



 
Ranch Hand
Posts: 3271
My guess is probably not. Those characters are OS specific (\t is tab in Windows, but I'm not sure what represents tab in UNIX). As Java tries to be "OS independent," I doubt you'll find anything that readily converts a char into something of that form. I'm guessing that, if you want to accomplish this, you're going to have to "roll your own" in one way or another.

Perhaps someone else has an idea, but that would be my guess.
 
bob connolly
Ranch Hand
Posts: 204
Thanks for the ideas!

Well, to be more specific, I'm trying to parse a Word doc, and there is a Unicode character called the currency sign, '\u00A4', which is being used as some kind of paragraph break in addition to the standard '\n', '\r', etc.

So I'm trying to figure out how to write the logic that identifies this Unicode character.

Right now I'm using the following statement: if (c=='\n' || c=='\t' || c=='\u00A4') but it doesn't seem to recognize the Unicode escape!

Thanks!
 
Ranch Hand
Posts: 1608



My guess is probably not. Those characters are OS specific (\t is tab in Windows, but I'm not sure what represents tab in UNIX).



Rubbish.
Have a look at an ASCII table and the name of the character 0x09 (9).

[ July 27, 2004: Message edited by: Tony Morris ]
[ July 29, 2004: Message edited by: Tony Morris ]
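
To make the point concrete: the escape sequences are defined by the Java language rather than by the operating system, so they have the same code values everywhere. What does differ between platforms is the line separator. A small sketch (the class name EscapeCodes is made up for illustration):

```java
final class EscapeCodes {

    // Numeric code of a character; the result does not depend on the platform.
    static int code(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(code('\t')); // 9, horizontal tab
        System.out.println(code('\n')); // 10, line feed
        System.out.println(code('\f')); // 12, form feed
        System.out.println(code('\r')); // 13, carriage return

        // The platform-dependent part is the line separator, printed here code by code.
        for (char c : System.lineSeparator().toCharArray()) {
            System.out.println(code(c));
        }
    }
}
```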
 
Ranch Hand
Posts: 1923
Well, ASCII is a standard across many platforms, and it is the same on Unix and Windows for 0-127 (7-bit).
And of course \t was already the tab on Unix before DOS was invented.

But \u00A4, which is 164 (decimal), is outside that standard and is not whitespace, though it may be invisible in ordinary editors.

164 is at least an 8-bit character in the extended ASCII character sets.

Java characters are 16 bits wide, and \u00A4 is a 16-bit notation too.
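
These properties can be checked from Java itself. The sketch below uses only java.lang.Character; the class and method names are made up for illustration.

```java
final class CurrencySignFacts {

    // True when the character is in Unicode's "currency symbol" category.
    static boolean isCurrencySymbol(char c) {
        return Character.getType(c) == Character.CURRENCY_SYMBOL;
    }

    public static void main(String[] args) {
        char c = '\u00A4';
        System.out.println((int) c);                   // 164
        System.out.println(c == 0xA4);                 // true: same value written in hex
        System.out.println(Character.isWhitespace(c)); // false: not whitespace
        System.out.println(isCurrencySymbol(c));       // true
    }
}
```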

Perhaps you could use a hex editor to find the position where the ¤ is printed, and try to find out what Java is actually reading.
Perhaps you have to tell the InputStreamReader which encoding to use?
Or ask it which encoding it is actually using?
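
A sketch of why the encoding matters, assuming (purely for illustration) that the character is stored in the file as the single byte 0xA4. Whether a Word document really stores it that way is not established here. The class and method names are hypothetical; the library calls are standard java.io and java.nio.charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeWithCharset {

    // Decode the first character of the bytes using an explicitly chosen charset.
    static int firstChar(byte[] bytes, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            return reader.read();
        }
    }

    // The test from the question, unchanged apart from taking an int.
    static boolean isBreak(int c) {
        return c == '\n' || c == '\t' || c == '\u00A4';
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { (byte) 0xA4 }; // assumed on-disk form of the currency sign

        int latin1 = firstChar(data, StandardCharsets.ISO_8859_1);
        int utf8 = firstChar(data, StandardCharsets.UTF_8);

        System.out.println(Integer.toHexString(latin1) + " " + isBreak(latin1)); // a4 true
        System.out.println(Integer.toHexString(utf8) + " " + isBreak(utf8));     // fffd false

        // The charset used when none is specified explicitly.
        System.out.println(Charset.defaultCharset());
    }
}
```

Decoded with the wrong charset, the byte turns into the replacement character U+FFFD, and a comparison against '\u00A4' can then never succeed even though the comparison itself is written correctly.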

But I don't know which encoding Word docs use.
There is an open-source Apache API available for reading Excel and Word docs: POI and H?? (poor obfuscating interface / horrible ... ...).
 