Regular expressions: Grouping
|
H Paul
Ranch Hand
Joined: Jul 26, 2011
Posts: 299
|
|
Input: "XY 9999 A-C 24.12x9 blue,red"
Basically, I want to have 4 groups
XY 9999
A-C
24.12x9
blue,red
I wrote
When tested, I got the output as below (5 groups)
GW 1177 A-C 20.25x7 blue,red ====> This is what I do not want
GW 1177
A-C
20.25x7
blue,red
I'm not sure I understand the regex "group" concept correctly. What am I missing in the pattern as coded?
1M Thanks.
|
 |
Stephan van Hulst
Bartender
Joined: Sep 20, 2010
Posts: 3044
|
|
|
Have you read the documentation for groupCount() and group() carefully?
|
 |
H Paul
Ranch Hand
Joined: Jul 26, 2011
Posts: 299
|
|
public String group(int group)
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
groupCount
public int groupCount()
Returns the number of capturing groups in this matcher's pattern.
Group zero denotes the entire pattern by convention. It is not included in this count.
If I read the doc correctly, I should start my index from 1 (not from 0 as coded).
Is this correct?
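The indexing described in the quoted documentation can be sketched as follows. The pattern here is a hypothetical stand-in (the pattern from the first post is not shown in the thread); only the loop bounds matter: starting at 1 and running up to and including groupCount() skips group 0, which is the whole match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Hypothetical pattern for illustration only; not the original poster's pattern.
    static final Pattern LINE = Pattern.compile("(\\w+ \\d+) (\\S+) (\\S+) (\\S+)");

    // Returns the capturing groups 1..groupCount(), leaving out group 0 (the entire match).
    static List<String> captureGroups(String input) {
        Matcher m = LINE.matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints four lines: GW 1177 / A-C / 20.25x7 / blue,red
        for (String g : captureGroups("GW 1177 A-C 20.25x7 blue,red")) {
            System.out.println(g);
        }
    }
}
```

Starting the loop at 0 instead would print the full input line first, which is the unwanted fifth line in the output of the first post.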
|
 |
Wouter Oet
Saloon Keeper
Joined: Oct 25, 2008
Posts: 2700
|
|
|
Did it work when you tried it?
|
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
|
 |
H Paul
Ranch Hand
Joined: Jul 26, 2011
Posts: 299
|
|
Yes, that fixed the index issue.
But as a whole, I still have to see how regular expression groups work in general. For now, thanks.
|
 |
Maarten Bodewes
Greenhorn
Joined: Aug 04, 2011
Posts: 14
|
|
General remarks: using Matcher.matches(), or a regular expression that starts with ^ and ends with $, makes the code a bit less error prone - currently you may match strings that have spurious information.
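A small sketch of the difference, assuming a hypothetical pattern (two upper-case letters, a space, four digits) that is not taken from the thread. Matcher.find() accepts a match anywhere in the input, while Matcher.matches() requires the whole input to match; anchoring with ^ and $ gives find() a similar strictness.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical pattern for illustration: e.g. "GW 1177".
    static final String CODE = "[A-Z]{2} \\d{4}";

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean foundAnywhere(String s) {
        return Pattern.compile(CODE).matcher(s).find();
    }

    // With ^ and $ the pattern has to span the input even when find() is used.
    static boolean foundAnchored(String s) {
        return Pattern.compile("^" + CODE + "$").matcher(s).find();
    }

    // matches() always requires the entire input to match, anchors or not.
    static boolean matchesWhole(String s) {
        return Pattern.compile(CODE).matcher(s).matches();
    }

    public static void main(String[] args) {
        String noisy = "junk GW 1177 junk";
        System.out.println(foundAnywhere(noisy)); // true: spurious text is tolerated
        System.out.println(foundAnchored(noisy)); // false
        System.out.println(matchesWhole(noisy));  // false
    }
}
```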
|
 |
H Paul
Ranch Hand
Joined: Jul 26, 2011
Posts: 299
|
|
(a side note: will look into ^ and $)
Input: "XY 9999 A-C 24.12x9 blue,red"
I want the "entire" input string to match the pattern
pattern = firstGroup + space + secondGroup + space + thirdGroup + space + fourthGroup;
And I got what I wanted
GW 1177
A-C
20.25x7
blue,red
Now if I change
thirdGroup = "((\\d+| \\d+\\.\\d++)x(\\d+| \\d+\\.\\d++))"; // 99.9x12 or 9x10 for example
then I got nothing, since matcher.matches() returns false.
Syntax-wise, what should thirdGroup be, or what is missing, so that I get back the 4 groups?
1M Thanks.
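One way to narrow this down is to test thirdGroup on its own. The sketch below (class and constant names are made up for illustration) suggests that the literal space inside the second alternative is what breaks decimal inputs: the first alternative, \d+, cannot get past the dot, and the second alternative demands a leading space that the input does not have. The possessive ++ appears harmless in this particular pattern.

```java
import java.util.regex.Pattern;

class ThirdGroupDemo {
    // thirdGroup exactly as posted: note the space after each '|'.
    static final String WITH_SPACES = "((\\d+| \\d+\\.\\d++)x(\\d+| \\d+\\.\\d++))";
    // Same pattern with only the spaces removed; the possessive ++ is kept.
    static final String POSSESSIVE_ONLY = "((\\d+|\\d+\\.\\d++)x(\\d+|\\d+\\.\\d++))";

    // True only if the whole string matches the regex.
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(matches(WITH_SPACES, "9x10"));        // true: integers use the first alternative
        System.out.println(matches(WITH_SPACES, "20.25x7"));     // false: decimals would need a leading space
        System.out.println(matches(POSSESSIVE_ONLY, "20.25x7")); // true
    }
}
```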
|
 |
H Paul
Ranch Hand
Joined: Jul 26, 2011
Posts: 299
|
|
1. Syntax-wise, corrected. (no ++ and no space)
thirdGroup = "((\\d+|\\d+\\.\\d+)x(\\d+|\\d+\\.\\d+))";
2. Now I got 6 groups with thirdGroup broken down into 2 extra sub-groups.
GW 1177
A-C
20.25x7 === thirdGroup
20.25 === sub-group
7 === sub-group
blue,red
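The two extra sub-groups come from the inner parentheses, since every plain ( ... ) pair is a capturing group. A minimal sketch, assuming the inner parentheses are only needed for the alternation: writing them as non-capturing groups, (?: ... ), keeps the grouping but leaves thirdGroup as a single capture. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Outer parentheses capture; the inner (?: ... ) groups only group the alternatives.
    static final Pattern SIZE =
            Pattern.compile("((?:\\d+|\\d+\\.\\d+)x(?:\\d+|\\d+\\.\\d+))");

    // Number of capturing groups in the pattern (group 0 is not counted).
    static int groupCount() {
        return SIZE.matcher("").groupCount();
    }

    // Returns the single captured value, or null if the input does not match.
    static String size(String s) {
        Matcher m = SIZE.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount());     // 1
        System.out.println(size("20.25x7"));  // 20.25x7
    }
}
```

Each alternation could probably also be shortened to \d+(?:\.\d+)? (digits with an optional fractional part), which should accept the same inputs.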
|
 |
Maarten Bodewes
Greenhorn
Joined: Aug 04, 2011
Posts: 14
|
|
Without looking into the regexp, I think you are on your way now. So I will give you some very important hints regarding regular expressions:
0) make sure there isn't already something that parses your input
1) don't make them too complex, you're better off creating a hierarchy and mixing parsing techniques - e.g. first split things with String.split() if possible
2) describe them well, it may take a very long time to read a regexp, describe what you are trying to accomplish
3) create at least a couple of junit tests around them, with (at least) some corner cases, the expected good and possibly some bad scenarios
4) don't use them as technique to e.g. test ranges of numbers, dates etc., there are better tools for that, just test string input
5) remember that groups are *not* repetitive: a repeated group only keeps its last match, so use Matcher.find() instead, or use other ways of repeating things from within the language (this trap caught me a few times)
6) learn to use non-capturing groups (you already got this one) and reluctant quantifiers
7) use findbugs to make sure your regexps are at least valid at build time (findbugs can also check formatted strings)
8) use plugins for your favourite IDE that enable you to test regexps and their input in real time (extra points if they auto escape backslashes)
9) don't try to learn Pattern.html by heart, just the general techniques, that's what bookmarks and Google are for
Finally, never forget that the Java regexp is brilliantly strong, and works on actual Unicode strings - don't get disappointed if other languages don't give you that same robustness or flexibility.
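Hint 5 is easy to trip over, so here is a minimal sketch of it, using a made-up comma-separated input. A capturing group inside a repetition does not collect every repetition; after the match it holds only the last one. Looping with Matcher.find() over a simpler pattern is one way to get all the values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The group sits inside a repeated non-capturing group, so only its last value survives.
    static String lastRepetition(String csv) {
        Matcher m = Pattern.compile("(?:(\\d+),?)+").matcher(csv);
        return m.matches() ? m.group(1) : null;
    }

    // Repeating in Java code instead: one find() call per number.
    static List<String> allNumbers(String csv) {
        Matcher m = Pattern.compile("\\d+").matcher(csv);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastRepetition("1,2,3")); // 3
        System.out.println(allNumbers("1,2,3"));     // [1, 2, 3]
    }
}
```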
|
 |
H Paul
Ranch Hand
Joined: Jul 26, 2011
Posts: 299
|
|
(I need time to digest the above/previous advice.)
Question:
I have an input string such as: XYZ 100 green low bowl
I try to capture into 2 groups as
XYZ 100 -- anything except lower case , that is a rule
green low bowl -- anything except upper case , that is a rule
Syntax-wise: Is there something missing? The code below did not work for the case in question.
String FirstGroup ="([.&&[^a-z]]*)";
String SecondGroup ="([.&&[^A-Z]]*)";
|
 |
Maarten Bodewes
Greenhorn
Joined: Aug 04, 2011
Posts: 14
|
|
|
Yeah, sorry, that was maybe a bit much. Tip 8 though is a pretty useful one, since you can take your regexp one piece at a time. Probably in this case you are expecting the dot to match any character, but because it is in between square brackets, it's just a dot.
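To illustrate the point about the dot: inside square brackets, [.&&[^a-z]] is the intersection of "a literal dot" and "not a lower-case letter", which is just a literal dot. The sketch below checks that, and then tries one possible rewrite using plain negated character classes. The rewritten pattern is an assumption for illustration, not necessarily what the original poster ended up with.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // As posted: inside [...] the dot is literal, so this only matches runs of dots.
    static final String POSTED_FIRST_GROUP = "([.&&[^a-z]]*)";

    // Assumed rewrite: "no lower case", one space, "no upper case".
    static final Pattern TWO_PARTS = Pattern.compile("([^a-z]*) ([^A-Z]*)");

    // Returns {firstPart, secondPart}, or null if the input does not match.
    static String[] split(String input) {
        Matcher m = TWO_PARTS.matcher(input);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches(POSTED_FIRST_GROUP, "..."));     // true
        System.out.println(Pattern.matches(POSTED_FIRST_GROUP, "XYZ 100")); // false
        String[] parts = split("XYZ 100 green low bowl");
        System.out.println(parts[0]); // XYZ 100
        System.out.println(parts[1]); // green low bowl
    }
}
```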
|
 |
H Paul
Ranch Hand
Joined: Jul 26, 2011
Posts: 299
|
|
Above code works.
"Thousands of candles can be lit from a single candle, and the life of the candle will not be shortened. Happiness never decreases by being shared."
Thank-you for the candle # 8. Just downloaded Eclipse RegEx Plugin.
|
 |