
regular expression

 
Brian Buckley
Greenhorn
Posts: 7
I want a regular expression to split a String on whitespace, except when the whitespace is within brackets.

So for example "The cat [in the] hat".split(regularExpression) should evaluate to "The","cat","[in the]","hat".

With "\\s+" as the regularExpression, it works for whitespace alone. How can I change this so it does not split when the whitespace is inside brackets?
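
For reference, a minimal sketch of what the plain whitespace split does to this input. The class and method names are made up for illustration:

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // Splits on runs of whitespace, with no awareness of brackets.
    static String[] naiveSplit(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        // prints [The, cat, [in, the], hat]
        System.out.println(Arrays.toString(naiveSplit("The cat [in the] hat")));
    }
}
```

The bracketed phrase comes back as two tokens, "[in" and "the]", which is the behavior to avoid.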

Tips welcome. Thanks!

Brian
 
Dirk Schreckmann
Sheriff
Posts: 7023
Moving this to the Intermediate forum...
 
Peter den Haan
author
Ranch Hand
Posts: 3252
Use a zero-width (negative) look-behind (?<!) to verify that the space isn't preceded by a "[" without balancing "]":

(?<!\[[^\]]*)\s+

Be warned that I didn't try this, but it should be close. Regular expressions: the most useful write-only medium in existence. Don't forget to escape the backslashes if you put this in a String literal.

- Peter
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Mmmmm, I don't think that will work well, Peter. (Welcome back, BTW!) I get:

java.util.regex only does lookbehind if it can determine a maximum length for what it's looking for. Other regex packages may only do it if the expression has a single fixed length. Friedl discusses this, but I don't have my copy with me right now.
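
To illustrate the maximum-length point, here is a rough sketch that gives the lookbehind an explicit upper bound. The cap of 100 characters is an arbitrary assumption, and the class and method names are hypothetical. It only works if no bracketed run is longer than the cap, so it trades generality for something the engine can accept:

```java
import java.util.Arrays;

public class BoundedLookbehindSplit {
    // Assumption: no bracketed run is longer than 100 characters.
    // {0,100} replaces the unbounded *, so the lookbehind has a known maximum length.
    private static final String SPLIT_REGEX = "(?<!\\[[^\\]]{0,100})\\s+";

    // Splits on whitespace that is not preceded by an unclosed "[" within the cap.
    static String[] split(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // prints [The, cat, [in the], hat]
        System.out.println(Arrays.toString(split("The cat [in the] hat")));
    }
}
```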

It may be possible to adapt this pattern for the problem at hand, but I don't see an easy way to do it. I think it will be easier to write a pattern that matches the tokens themselves (runs of non-whitespace, or anything enclosed in brackets) instead. E.g.:
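
A minimal sketch of that matching approach, not necessarily the pattern originally posted here. The class and method names are hypothetical, and it assumes brackets are not nested and that each bracketed group starts a token:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketTokenMatcher {
    // Either a whole "[...]" group, or a run of non-whitespace characters.
    // The bracket alternative comes first so it wins when a token starts with "[".
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*\\]|\\S+");

    // Collects the tokens by matching them, rather than splitting on separators.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [The, cat, [in the], hat]
        System.out.println(tokenize("The cat [in the] hat"));
    }
}
```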

[ June 05, 2004: Message edited by: Jim Yingst ]
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
For the problem, as stated, the following should work.



However, it doesn't work with nested structures.

What we're saying here is the following: a space that is followed by any sequence of characters, so long as those characters are anything other than an open or closed bracket, and which ends with a closed bracket.
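
One way to turn that description into a split pattern is a negative lookahead: a space matching the description sits inside brackets, so the split skips it. This is a sketch reconstructed from the prose, not necessarily the exact expression posted above, and the class and method names are hypothetical:

```java
import java.util.Arrays;

public class LookaheadSplit {
    // \s+               one or more whitespace characters ...
    // (?![^\[\]]*\])    ... NOT followed by non-bracket characters and then a "]"
    private static final String SPLIT_REGEX = "\\s+(?![^\\[\\]]*\\])";

    // Splits on whitespace that is outside of (non-nested) brackets.
    static String[] split(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // prints [The, cat, [in the], hat]
        System.out.println(Arrays.toString(split("The cat [in the] hat")));
    }
}
```

Lookahead has no length restriction in java.util.regex, which is why this direction avoids the lookbehind problem. As noted, it only looks ahead to the nearest bracket, so nested brackets are not handled.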


HTH,
M
 
Brian Buckley
Greenhorn
Posts: 7
My god, that works. Regular expressions freak me out.

I have to sit down and study this...

Brian
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
I hear some idiot recently wrote a book on regex & Java.

M
 
Peter den Haan
author
Ranch Hand
Posts: 3252
And a good book on regex & Java to boot. What was his name again? Something that sounded a bit like a cool holiday destination. M... Max... Max... Oh, I don't know.

- Peter
 