regex Pattern class and spaces

Rajagopal Manohar
Ranch Hand

Joined: Nov 26, 2004
Posts: 183
Hi,

I was using the regex package to format data read from an Excel spreadsheet. I wanted to collapse each run of consecutive whitespace into a single space, so I used the pattern "\\s{2,}" and replaced it with " ".

But I found that it worked only partially. While debugging I realised that some of the data used the Unicode "no-break space", a char with int value 160, which the \s class does not match.

So I had to do something clumsy like
char space = 160;
pattern = "[\\s" + space + "]{2,}";

Now what I cannot understand is why the \s class does not include this space (char 160). And how do I know that tomorrow, if I read data from another file system on another platform, I will not encounter a new space char? Does that not make my code platform dependent (otherwise \s should have handled all possible white spaces)?

just a thought, I am sure there is a better explanation

regards,
Rajagopal
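
A small sketch of what the question above runs into, assuming nothing beyond the standard java.util.regex package. By default \s only covers [ \t\n\x0B\f\r], so char 160 falls outside it. The class and method names here are made up for illustration, and the UNICODE_CHARACTER_CLASS flag only exists from Java 7 on, so it would not have been available when this thread was written.

```java
import java.util.regex.Pattern;

public class WhitespaceClassDemo {
    // Default \s: only [ \t\n\x0B\f\r], so the no-break space is excluded.
    static boolean matchesDefaultS(char c) {
        return Pattern.matches("\\s", String.valueOf(c));
    }

    // \p{Zs} is the Unicode "space separator" category, which contains U+00A0.
    static boolean matchesSpaceSeparator(char c) {
        return Pattern.matches("\\p{Zs}", String.valueOf(c));
    }

    // Java 7+ only: with this flag \s follows the Unicode White_Space property.
    static boolean matchesUnicodeS(char c) {
        return Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(String.valueOf(c))
                .matches();
    }

    public static void main(String[] args) {
        char nbsp = 160; // the no-break space from the spreadsheet data
        System.out.println("default \\s:      " + matchesDefaultS(nbsp));
        System.out.println("\\p{Zs}:          " + matchesSpaceSeparator(nbsp));
        System.out.println("unicode \\s:      " + matchesUnicodeS(nbsp));
        // Character.isWhitespace deliberately excludes non-breaking spaces too.
        System.out.println("isWhitespace:    " + Character.isWhitespace(nbsp));
    }
}
```

So the behaviour is not platform dependent: the default \s is simply a fixed, ASCII-only set, and other Unicode spaces have to be asked for explicitly.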
Stefan Wagner
Ranch Hand

Joined: Jun 02, 2003
Posts: 1923

Ascii(160) isn't always a kind of space.
On DOS it is á, AFAIK.


http://home.arcor.de/hirnstrom/bewerbung
Rajagopal Manohar
Ranch Hand

Joined: Nov 26, 2004
Posts: 183
Originally posted by Stefan Wagner:
Ascii(160) isn't always a kind of space.
On DOS it is á, AFAIK.


Does that mean that when I see a " " on screen on one platform, save it and read it on another platform, I will see an "á"? Isn't that strange behaviour?

Does Java not promise platform independence? Is there no way to guarantee
a common interpretation on all platforms?

PS: forgive my ignorance, but I thought that in Java everything was converted to Unicode. Apparently I am wrong.
[ May 16, 2005: Message edited by: Rajagopal Manohar ]
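
To illustrate the point being discussed: the byte value 160 only becomes a particular character once it is decoded with some charset, and after that Java sees a fixed Unicode char. This is a minimal sketch, with a made-up class name, assuming the file holds single-byte text. IBM437 (the classic DOS code page) is an optional charset in the JDK, so the code checks for it before using it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Byte160Demo {
    // Decode the single byte 0xA0 (decimal 160) using the given charset.
    static String decode(Charset charset) {
        return new String(new byte[] {(byte) 0xA0}, charset);
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps byte 160 straight to U+00A0, the no-break space.
        String latin1 = decode(StandardCharsets.ISO_8859_1);
        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin1.charAt(0));

        // The DOS code page assigns a different character to the same byte.
        if (Charset.isSupported("IBM437")) {
            String dos = decode(Charset.forName("IBM437"));
            System.out.printf("IBM437:     U+%04X%n", (int) dos.charAt(0));
        }
    }
}
```

Naming the charset explicitly when reading the file (rather than relying on the platform default) is what keeps the interpretation the same on every platform.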
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
Yes, Java uses Unicode internally, so character 160 (U+00A0) will always be a non-breaking space as far as Java is concerned. To match it, just use the Unicode escape for the character, \u00A0, inside the character class.

If you're normalizing the whitespace, shouldn't you also be converting single linefeeds, tabs, NBSP's, etc. into space characters? In that case the pattern should match any two or more consecutive whitespace characters, or any single whitespace character that isn't a space (ASCII 32).
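
The code snippets from the post above did not survive, so what follows is only a sketch of what the description suggests, not the poster's original patterns. The class, constant and method names are hypothetical. The first pattern adds \u00A0 to the whitespace class; the second also catches a single tab, linefeed or NBSP, spelled out as an explicit list of the non-space members of \s plus U+00A0.

```java
import java.util.regex.Pattern;

public class WhitespaceNormalizer {
    // Two or more consecutive whitespace chars, with NBSP added to the class.
    private static final Pattern RUNS = Pattern.compile("[\\s\\u00A0]{2,}");

    // Same runs, or any single whitespace char other than a plain space.
    private static final Pattern RUNS_OR_SINGLE =
            Pattern.compile("[\\s\\u00A0]{2,}|[\\t\\n\\x0B\\f\\r\\u00A0]");

    // Collapse runs only; a lone tab or NBSP is left alone.
    static String collapseRuns(String text) {
        return RUNS.matcher(text).replaceAll(" ");
    }

    // Collapse runs and turn every lone tab, linefeed, NBSP etc. into a space.
    static String normalize(String text) {
        return RUNS_OR_SINGLE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String cell = "a\u00A0\u00A0 b\tc"; // two NBSPs and a space, then a lone tab
        System.out.println(collapseRuns(cell).equals("a b\tc")); // prints true
        System.out.println(normalize(cell).equals("a b c"));     // prints true
    }
}
```

Compiling the patterns once as constants avoids recompiling them for every cell read from the spreadsheet.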
Rajagopal Manohar
Ranch Hand

Joined: Nov 26, 2004
Posts: 183
If you're normalizing the whitespace, shouldn't you also be converting single linefeeds, tabs, NBSP's, etc. into space characters?


I guess yes.
Thanks
Rajagopal
 