[regex] what's the difference between "\s" and "\s+?"

 
Martin Garrido
Ranch Hand
Posts: 31
I don't know.
 
Henry Wong
author
Posts: 23906

[regex] what's the difference between "\s" and "\s+?"




The first regex will match one whitespace character. The second regex will reluctantly match one or more whitespace characters. For most purposes, these two regexes are very similar, except in the second case, the regex can match more of the string, if it prevents the regex match from failing.
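A minimal sketch of that behaviour, with the class and helper names invented for the illustration. Used on their own with find(), both patterns report one-character matches; the reluctant form only grows when something forces it to, such as matches() requiring the whole input to be consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleVsReluctant {
    // Collect every substring that find() reports for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "a \t b"; // space, tab, space between the a and the b

        // On its own, the reluctant form stops after one character,
        // so both patterns report three one-character matches.
        System.out.println(findAll("\\s", input).size());   // 3
        System.out.println(findAll("\\s+?", input).size()); // 3

        // matches() must consume the whole input, so the reluctant
        // quantifier is forced to expand, while plain \s cannot.
        System.out.println(Pattern.matches("\\s", "   "));   // false
        System.out.println(Pattern.matches("\\s+?", "   ")); // true
    }
}
```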

Henry
 
Jeff Verdegan
Bartender
Posts: 6109

Henry Wong wrote:

[regex] what's the difference between "\s" and "\s+?"




The first regex will match one whitespace character. The second regex will reluctantly match one or more whitespace characters.



<nitpick>

From http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html:


</nitpick>



For further reading:
http://www.regular-expressions.info/
http://docs.oracle.com/javase/tutorial/essential/regex/
 
Jeff Verdegan
Bartender
Posts: 6109
Also, be aware that if you're writing that in a String literal in your Java source code, every backslash has to be doubled.

The Java compiler takes a backslash as an escape in a String literal, so you need two of them to tell it you want a literal backslash in the resulting String, so that the regex engine will see \s.
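A small sketch of what that looks like in practice. The class name and the REGEX constant are made up for the example; only the standard java.util.regex API is used.

```java
import java.util.regex.Pattern;

class BackslashEscaping {
    // The source literal has five characters between the quotes,
    // but the resulting String holds only four: \ s + ?
    static final String REGEX = "\\s+?";

    public static void main(String[] args) {
        System.out.println(REGEX);          // prints \s+?
        System.out.println(REGEX.length()); // prints 4

        // The regex engine therefore sees \s, the whitespace class.
        System.out.println(Pattern.compile("\\s").matcher("a b").find()); // true

        // A single backslash ("\s") is not equivalent: older compilers
        // reject it as an illegal escape, and since Java 15 it is an
        // escape for a plain space character.
    }
}
```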
 
Winston Gutkowski
Bartender
Posts: 10780

Henry Wong wrote:The second regex will reluctantly match one or more whitespace characters. For most purposes, these two regexes are very similar, except in the second case, the regex can match more of the string, if it prevents the regex match from failing.


Good old regex, eh? Powerful, beautiful, and totally arcane to all but the few thousand that like it (or, like me, as a Sysadmin, had it thrust on them).

@Martin: That 'reluctant' modifier (the ? after the +) is worth knowing about because, by default, regex quantifiers are "greedy" (that is, they match as much of the input as they can). Unfortunately, ? on its own is also a quantifier meaning "0 or 1", so you need to be careful when you're interpreting it.
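To illustrate the two roles of ?, here is a short sketch (the class and the firstMatch helper are invented for this example). After another quantifier, ? switches that quantifier from greedy to reluctant; directly after a character or class, it is itself the "zero or one" quantifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionMarkRoles {
    // Return the first substring matched by regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "a   b"; // three spaces

        // Greedy: \s+ takes all three spaces in one match.
        System.out.println(firstMatch("\\s+", input).length());  // 3
        // Reluctant: \s+? is satisfied with the first space.
        System.out.println(firstMatch("\\s+?", input).length()); // 1

        // Here ? is the quantifier itself: zero or one whitespace character.
        System.out.println(Pattern.matches("a\\s?b", "ab"));   // true
        System.out.println(Pattern.matches("a\\s?b", "a b"));  // true
        System.out.println(Pattern.matches("a\\s?b", "a  b")); // false
    }
}
```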

And just in case it comes up: regexes are powerful, but not omnipotent; and one particular thing they are NOT suited for is parsing tagged input like HTML/XML. If you ever find yourself needing to do it, use a proper SAX or DOM-based parser.

Winston
 
Campbell Ritchie
Marshal
Posts: 73738

Winston Gutkowski wrote: . . . regexes are powerful, but not omnipotent; and one particular thing they are NOT suited for is parsing tagged input like HTML/XML. . . .

That is because HTML is a context-sensitive grammar, and regexes only work on regular grammars.
 
Winston Gutkowski
Bartender
Posts: 10780

Campbell Ritchie wrote:That is because HTML is a context-sensitive grammar, and regexes only work on regular grammars.


Or patterns, because that's what regexes were designed for: pattern matching - and more specifically, in ASCII text.

Winston
 
Martin Garrido
Ranch Hand
Posts: 31

Could anyone write a paragraph where "\s" and "\s+?" don't match the same?

Thank you very much.

As you know, you can test it in http://regexpal.com/
 
Matthew Brown
Bartender
Posts: 4568
They can give different results if they're part of a larger pattern. Consider the difference between "a\sb" and "a\s+?b" - the former requires exactly one whitespace character between the "a" and "b", while the latter allows one or more.
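A quick way to check this, sketched with a made-up class name and a handful of sample inputs chosen for the illustration:

```java
import java.util.regex.Pattern;

class LargerPattern {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = {"ab", "a b", "a   b"};
        for (String input : inputs) {
            // a\sb needs exactly one whitespace character between a and b;
            // a\s+?b starts with one and expands until the b can match.
            System.out.println("[" + input + "]"
                    + " a\\sb=" + matches("a\\sb", input)
                    + " a\\s+?b=" + matches("a\\s+?b", input));
        }
        // [ab] a\sb=false a\s+?b=false
        // [a b] a\sb=true a\s+?b=true
        // [a   b] a\sb=false a\s+?b=true
    }
}
```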
 
Henry Wong
author
Posts: 23906

Martin Garrido wrote:
Could anyone write a paragraph where "\s" and "\s+?" don't match the same?

Thank you very much.

As you know, you can test it in http://regexpal.com/



Matthew Brown wrote:They can give different results if they're part of a larger pattern. Consider the difference between "a\sb" and "a\s+?b" - the former requires exactly one whitespace character between the "a" and "b", while the latter allows one or more.



An example probably works best...



Henry
 
Winston Gutkowski
Bartender
Posts: 10780

Henry Wong wrote:An example probably works best...


I'm not sure it does; otherwise how would "\\s+?" and "\\s+" be any different? (which, I think, is what Martin is trying to work out). Or indeed, why wouldn't you just use "\\s" instead?
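One place where the three forms can be compared side by side is String.replaceAll. The following is only a sketch, with an invented class name and sample input: when nothing follows the quantifier, the reluctant "\\s+?" stops after one character and so behaves like "\\s", whereas the greedy "\\s+" swallows the whole run.

```java
class ThreeWayComparison {
    // Replace every match of regex in input with an underscore.
    static String mark(String regex, String input) {
        return input.replaceAll(regex, "_");
    }

    public static void main(String[] args) {
        String input = "a   b"; // three spaces

        System.out.println(mark("\\s", input));   // a___b (three matches)
        System.out.println(mark("\\s+?", input)); // a___b (three matches)
        System.out.println(mark("\\s+", input));  // a_b   (one match)
    }
}
```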

@Martin: In answer to your previous question: off the top of my head, I can't; but Campbell's answer is how I understand it. You have to understand that regexes are a pattern-matching language, and are also constrained by the characters normally found on most keyboards (a bit of history for you). It's likely therefore that you may find some anomalies; but, in general, they're pretty good.

My advice: Learn the basics; leave the esoteric stuff to the anoraks (or a proper parser). Personally, I hate having to write docs for a regex that I just spent a couple of hours working out myself. Might as well make it a class.

Winston
 