Another regex problem

 
Michael Morris
Ranch Hand
Posts: 3451
I need to match a string formatted as a number with optional comma separators. The number can be integral or floating point. So any of the following is valid:

This pattern works except that it also matches an empty string:

Any ideas as to how to eliminate the empty match?
Thanks in advance,
Michael Morris
 
Peter den Haan
author
Ranch Hand
Posts: 3252
Assuming JDK 1.4, you can use a negative zero-width lookahead to eliminate the unwanted match:
(?!^$)(?:[0-9]{1,3}...
- Peter
 
Michael Morris
Ranch Hand
Posts: 3451
Thanks Peter. That did it. Could you explain the syntax there? Does it mean don't match beginning and end of string together?
Michael Morris
 
Peter den Haan
author
Ranch Hand
Posts: 3252
The ^$ syntax just matches an empty string (start of string ^ and end of string $ with nothing in between). The lookahead (?!) is "zero width", which means that a match doesn't consume the characters matched; they remain available for the remainder of the pattern to match. So my insertion (?!^$) does not change the way your pattern operates; the only thing it does is prevent an empty string from matching the pattern.
In addition to the negative lookahead (?!), which succeeds if the characters do not match its pattern, there is a positive lookahead (?=) and both positive (?<=) and negative (?<!) look-behinds. All of these are zero-width, and great in cases where you want to match a pattern A except when it looks like B, or a pattern A provided that it is preceded by B, and so on.
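To make that concrete, here is a minimal sketch. The original pattern is not reproduced in full in this thread, so BASE below is a hypothetical stand-in whose parts are all optional (which is why it matches an empty string on its own); the class and method names are made up for illustration as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical stand-in pattern, NOT the one from the original post:
    // every part is optional, so by itself it also matches "".
    static final String BASE = "[0-9]*(?:\\.[0-9]*)?";

    static final Pattern WITHOUT_GUARD = Pattern.compile(BASE);
    // Same pattern with the negative lookahead (?!^$) placed in front.
    static final Pattern WITH_GUARD = Pattern.compile("(?!^$)" + BASE);
    // Positive look-behind: digits only count if a '$' comes right before them.
    static final Pattern AFTER_DOLLAR = Pattern.compile("(?<=\\$)[0-9]+");

    static boolean matchesWithoutGuard(String s) {
        return WITHOUT_GUARD.matcher(s).matches();
    }

    static boolean matchesWithGuard(String s) {
        return WITH_GUARD.matcher(s).matches();
    }

    // Returns the first run of digits preceded by '$', or null if there is none.
    static String firstDollarAmount(String s) {
        Matcher m = AFTER_DOLLAR.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "42", "3.14"}) {
            System.out.println("\"" + s + "\" without guard: " + matchesWithoutGuard(s)
                    + ", with guard: " + matchesWithGuard(s));
        }
        System.out.println(firstDollarAmount("30 EUR or $25"));
    }
}
```

Because the lookahead consumes nothing, "42" and "3.14" are matched exactly as before; only the empty string is now rejected. The look-behind works the same way: the '$' has to be there, but it is not part of the text returned by group().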
- Peter
[ March 26, 2003: Message edited by: Peter den Haan ]
 
Michael Morris
Ranch Hand
Posts: 3451
Yep, just noticed it in the API docs for Pattern. What is the difference between Greedy, Reluctant and Possessive quantifiers? I've always used the Greedy syntax.
Michael Morris
 
Jason Menard
Sheriff
Posts: 6450
Originally posted by Michael Morris:
Yep, just noticed it in the API docs for Pattern. What is the difference between Greedy, Reluctant and Possessive quantifiers? I've always used the Greedy syntax.
Michael Morris

This link explains it far better than I could. When reading that, the important part to key on is the concept of backtracking. Reluctant quantifiers are referred to in that text as lazy quantifiers.
It doesn't talk about possessive quantifiers specifically, but as you read the bit about greediness and backtracking, keep this in mind from the API:
Possessive quantifiers, which greedily match as much as they can and do not back off, even when doing so would allow the overall match to succeed.

HTH
 
Peter den Haan
author
Ranch Hand
Posts: 3252
If your text is "abcba abcba":
The greedy pattern "ab.*ba" will match the substring "abcba abcba" -- the largest substring that fits the pattern
The reluctant pattern "ab.*?ba" will match the substring "abcba" -- the first substring that fits the pattern
The possessive pattern "ab.*+ba" will not match at all, because the possessive .*+ will gobble up all of "cba abcba", including the closing "ba", and never let go of it again.
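If you want to see this for yourself, a small sketch along these lines should do. The class name and the firstMatch helper are made up for illustration; the patterns and the text are the ones above, and Matcher.find() is used because we are looking for a substring rather than matching the whole string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns the first substring of text that the regex finds, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "abcba abcba";
        System.out.println("greedy:     " + firstMatch("ab.*ba", text));   // .*  backs off until "ba" fits
        System.out.println("reluctant:  " + firstMatch("ab.*?ba", text));  // .*? grows only as far as needed
        System.out.println("possessive: " + firstMatch("ab.*+ba", text));  // .*+ never gives anything back
    }
}
```

This prints "abcba abcba" for the greedy pattern, "abcba" for the reluctant one and null for the possessive one. Note that find() also retries the possessive pattern from the second "ab", but there .*+ swallows the trailing "cba" just the same, so it still finds nothing.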
- Peter
 
Michael Morris
Ranch Hand
Posts: 3451
Thanks Jason and Peter. Peter's explanation is perfectly clear. Now I'll start seeing regex patterns from a much better point of view. The local BooksAMillion has O'Reilly's Mastering Regular Expressions on sale. I think I'll pick it up this weekend.
Thanks again,
Michael Morris
 