[regex] what's the difference between "\s" and "\s+?"
The first regex will match exactly one whitespace character. The second regex will reluctantly match one or more whitespace characters, taking as few as it can. For most purposes, these two regexes behave alike, except that the second can match more of the string if that is what it takes to keep the overall match from failing.
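To make that concrete, here is a minimal sketch assuming Java's `java.util.regex` (the thread's `"\\s+?"` escaping suggests Java). The class name `StandaloneWhitespaceDemo` and the helper `firstMatch` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StandaloneWhitespaceDemo {
    // Returns the text of the first match found anywhere in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "   "; // three spaces

        // With find(), nothing forces the reluctant quantifier past its minimum of one.
        System.out.println(firstMatch("\\s", input).length());
        System.out.println(firstMatch("\\s+?", input).length());

        // matches() must consume the whole input, so \s+? expands while \s cannot.
        System.out.println(input.matches("\\s"));
        System.out.println(input.matches("\\s+?"));
    }
}
```

Run as written, this should print `1`, `1`, `false`, `true`: used alone with `find()` the two patterns are indistinguishable, and they only diverge once something (here, the whole-input requirement of `matches()`) forces the reluctant quantifier to take more.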
Henry Wong wrote: The second regex will reluctantly match one or more whitespace characters, taking as few as it can. For most purposes, these two regexes behave alike, except that the second can match more of the string if that is what it takes to keep the overall match from failing.
Good old regex, eh? Powerful, beautiful, and totally arcane to all but the few thousand that like it (or, like me, as a Sysadmin, had it thrust on them).
@Martin: That 'reluctant' modifier (the ? after a quantifier) is worth knowing about because, by default, regex quantifiers are "greedy" (that is, they will match the longest stretch of input they can). Unfortunately, a ? on its own is also the quantifier meaning "0 or 1", so you need to be careful when you're interpreting a pattern.
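A small sketch of the two jobs `?` does, again assuming Java's `java.util.regex`. The class name `QuestionMarkDemo`, the helper `firstMatch` and the `colou?r` pattern are illustrative and not from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionMarkDemo {
    // Returns the text of the first match found anywhere in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "a   b"; // three spaces between the letters

        // ? after a quantifier: switches it from greedy to reluctant.
        System.out.println("[" + firstMatch("\\s+", input) + "]");  // greedy: all three spaces
        System.out.println("[" + firstMatch("\\s+?", input) + "]"); // reluctant: one space

        // ? directly after an element: that element is optional ("0 or 1").
        System.out.println("color".matches("colou?r"));
        System.out.println("colour".matches("colou?r"));
        System.out.println("colouur".matches("colou?r"));
    }
}
```

The last three lines should print `true`, `true`, `false`: the `u` may appear once or not at all, but not twice.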
And just in case it comes up: regexes are powerful, but not omnipotent; and one particular thing they are NOT suited for is parsing tagged input like HTML/XML. If you ever find yourself needing to do it, use a proper SAX or DOM-based parser.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Matthew Brown wrote: They can give different results if they're part of a larger pattern. Consider the difference between "a\sb" and "a\s+?b" - the former allows exactly one whitespace character between the "a" and "b", while the latter allows one or more.
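Matthew's two patterns can be checked directly. This is only a sketch assuming Java's `java.util.regex`; the class name `LargerPatternDemo` and the helper `fullMatch` are made up for the example.

```java
import java.util.regex.Pattern;

class LargerPatternDemo {
    // True if the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = { "ab", "a b", "a   b" };
        for (String input : inputs) {
            // The trailing "b" forces \s+? to keep expanding until "b" can match.
            System.out.println("[" + input + "]"
                    + "  a\\sb: " + fullMatch("a\\sb", input)
                    + "  a\\s+?b: " + fullMatch("a\\s+?b", input));
        }
    }
}
```

Both patterns reject `"ab"` and accept `"a b"`; only `a\s+?b` accepts `"a   b"`.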
Henry Wong wrote:As an example probably works best...
I'm not sure it does; otherwise how would "\\s+?" and "\\s+" be any different? (which, I think, is what Martin is trying to work out). Or indeed, why wouldn't you just use "\\s" instead?
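One way to see where "\\s+?" and "\\s+" actually part company is to look at how they divide the input between capture groups. The sketch below assumes Java's `java.util.regex`; the class name `GroupSplitDemo`, the helper `groupLengths` and the two-group patterns are invented purely for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSplitDemo {
    // Returns the lengths of capture groups 1 and 2, or null if the input does not match.
    static int[] groupLengths(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new int[] { m.group(1).length(), m.group(2).length() };
    }

    public static void main(String[] args) {
        String input = "a   b"; // three spaces

        // Followed directly by "b", greedy and reluctant accept the same input.
        System.out.println(Pattern.matches("a\\s+b", input));
        System.out.println(Pattern.matches("a\\s+?b", input));

        // With a second group competing for the whitespace, the split differs.
        System.out.println(Arrays.toString(groupLengths("a(\\s+)(\\s*)b", input)));  // greedy first group
        System.out.println(Arrays.toString(groupLengths("a(\\s+?)(\\s*)b", input))); // reluctant first group
    }
}
```

This should print `true`, `true`, `[3, 0]`, `[1, 2]`. Whether a pattern matches is often the same either way; what changes is which part of the pattern ends up owning the whitespace.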
@Martin: In answer to your previous question: off the top of my head, I can't; but Campbell's answer is how I understand it. You have to understand that regexes are a pattern-matching language, and are also constrained by the characters normally found on most keyboards (a bit of history for you). It's likely therefore that you may find some anomalies; but, in general, they're pretty good.
My advice: Learn the basics; leave the esoteric stuff to the anoraks (or a proper parser). Personally, I hate having to write docs for a regex that I just spent a couple of hours working out myself. Might as well make it a class.
I would challenge you to a battle of wits, but I see you are unarmed - shakespear.