Another regex problem

Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
I need to match a string formatted as a number with optional comma separators. The number can be integral or floating point. So any of the following is valid:

This pattern works except that it also matches an empty string:

Any ideas as to how to eliminate the empty match?
Thanks in advance,
Michael Morris


Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius - and a lot of courage - to move in the opposite direction. - Ernst F. Schumacher
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
Assuming JDK 1.4, you can use a negative zero-width lookahead to eliminate the unwanted match:
(?!^$)(?:[0-9]{1,3}...
- Peter
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Thanks Peter. That did it. Could you explain the syntax there? Does it mean don't match beginning and end of string together?
Michael Morris
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
The ^$ syntax just matches an empty string (start of string ^ and end of string $ with nothing in between). The lookahead (?!...) is "zero width", which means that a match doesn't consume the characters matched; they remain available for the remainder of the pattern to match. So my insertion (?!^$) does not change the way your pattern operates; the only thing it does is prevent an empty string from matching the pattern.
In addition to the negative lookahead (?!...), which succeeds if the characters that follow do not match its pattern, there is a positive lookahead (?=...) and both positive (?<=...) and negative (?<!...) look-behinds. All of these are zero-width, and great in cases where you want to match a pattern A except when it looks like B, or a pattern A provided that it is preceded by B, etc.
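A minimal sketch of the empty-string guard, for anyone following along. The pattern [0-9]* is only a hypothetical stand-in (it is not the pattern from the original question); it was picked because, like the original, it accepts an empty string. The class and method names are made up for illustration, and a JDK with java.util.regex (1.4 or later) is assumed.

```java
import java.util.regex.Pattern;

class EmptyMatchGuard {
    // Hypothetical stand-in pattern: the * quantifier allows zero digits,
    // so this pattern accepts the empty string.
    static final Pattern UNGUARDED = Pattern.compile("[0-9]*");

    // Same stand-in pattern with the (?!^$) guard in front. The lookahead
    // consumes nothing; it only rejects input that is empty.
    static final Pattern GUARDED = Pattern.compile("(?!^$)[0-9]*");

    // True if the whole input matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(UNGUARDED, ""));    // true
        System.out.println(matches(GUARDED, ""));      // false
        System.out.println(matches(UNGUARDED, "123")); // true
        System.out.println(matches(GUARDED, "123"));   // true
    }
}
```

Non-empty input behaves the same with or without the guard; only the empty string is affected.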
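To make the four look-around forms concrete, here is a rough sketch that reports where "cat" is found under each condition. The sample text, class name and helper method are all invented for the example, and it assumes a JDK with generics (Java 5 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Sample text (made up): "cat" starts at index 0, 4 (in "catalog")
    // and 15 (in "concat").
    static final String TEXT = "cat catalog concat";

    // Returns the start index of every match of regex in input.
    static List<Integer> findStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Positive lookahead: "cat" only when followed by "alog".
        System.out.println(findStarts("cat(?=alog)", TEXT));  // [4]
        // Negative lookahead: "cat" except when followed by "alog".
        System.out.println(findStarts("cat(?!alog)", TEXT));  // [0, 15]
        // Positive look-behind: "cat" only when preceded by "con".
        System.out.println(findStarts("(?<=con)cat", TEXT));  // [15]
        // Negative look-behind: "cat" except when preceded by "con".
        System.out.println(findStarts("(?<!con)cat", TEXT));  // [0, 4]
    }
}
```

In every case the matched text is just "cat"; the look-around part only decides whether a given position qualifies.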
- Peter
[ March 26, 2003: Message edited by: Peter den Haan ]
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Yep, just noticed it in the API docs for Pattern. What is the difference between Greedy, Reluctant and Possessive quantifiers? I've always used the Greedy syntax.
Michael Morris
Jason Menard
Sheriff

Joined: Nov 09, 2000
Posts: 6450
Originally posted by Michael Morris:
Yep, just noticed it in the API docs for Pattern. What is the difference between Greedy, Reluctant and Possessive quantifiers? I've always used the Greedy syntax.
Michael Morris

This link explains it far better than I could. When reading that, the important part to key on is the concept of backtracking. Reluctant quantifiers are referred to in that text as lazy quantifiers.
It doesn't talk about possessive quantifiers specifically, but as you read the bit about greediness and backtracking, keep this in mind from the API:
Possessive quantifiers, which greedily match as much as they can and do not back off, even when doing so would allow the overall match to succeed.

HTH
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
If your text is "abcba abcba":
The greedy pattern "ab.*ba" will match the substring "abcba abcba" -- the largest substring that fits the pattern.
The reluctant pattern "ab.*?ba" will match the substring "abcba" -- the shortest substring that fits the pattern, starting from the first position where a match is possible.
The possessive pattern "ab.*+ba" will not match at all, because the possessive .*+ will gobble up all of "cba abcba", including the closing "ba", and never let go of it again.
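The three cases above can be checked with a short sketch like the following. The class and helper names are made up for illustration; the text and patterns are the ones from this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    static final String TEXT = "abcba abcba";

    // Returns the first substring of input that matches regex,
    // or null if there is no match anywhere in the input.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: .* takes everything, then backs off just far enough
        // for the final "ba" to match.
        System.out.println(firstMatch("ab.*ba", TEXT));   // abcba abcba
        // Reluctant: .*? takes as little as possible before "ba".
        System.out.println(firstMatch("ab.*?ba", TEXT));  // abcba
        // Possessive: .*+ takes everything and never backs off,
        // so the trailing "ba" can never match.
        System.out.println(firstMatch("ab.*+ba", TEXT));  // null
    }
}
```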
- Peter
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Thanks Jason and Peter. Peter's explanation is perfectly clear. Now I'll start seeing regex patterns from a much better point of view. The local BooksAMillion has O'Reilly's Mastering Regular Expressions on sale. I think I'll pick it up this weekend.
Thanks again,
Michael Morris
 
 