I have a String with the pattern KEYWORD ARG1="x" ARG2="test test" that I need to tokenize. I tried using Pattern, but it only gives me the first two groups and throws an exception after that. Any help is appreciated. Thanks.
I get this output:
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 3
Your pattern only has two capturing groups. You can't capture something that isn't defined in your regex pattern.
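The question's code isn't shown, so the sketch below uses a hypothetical two-group pattern (an assumption, not the asker's actual regex). It shows that groupCount() comes from the pattern itself, and that asking for a group the pattern doesn't define throws the same kind of exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical stand-in for the question's regex (the original is not shown):
    // group 1 is the keyword, group 2 is ALL of the NAME="value" pairs together.
    // (?:...) is non-capturing, so it does not add a group number.
    static final Pattern PATTERN =
            Pattern.compile("([A-Z]+)((?:\\s+\\w+=\"[^\"]*\")+)");

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher("KEYWORD ARG1=\"x\" ARG2=\"test test\"");
        if (!m.matches()) {
            throw new IllegalStateException("no match");
        }
        // Two capturing "(" in the pattern, so groupCount() is 2 for any input
        System.out.println("groups: " + m.groupCount());
        System.out.println("group 2: " + m.group(2));
        try {
            m.group(3); // the pattern defines no third group
        } catch (IndexOutOfBoundsException e) {
            System.out.println("group 3: " + e.getMessage());
        }
    }
}
```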
Another way to look at it: a group's number is determined by its position in the pattern, not by the order in which things are matched. There could be a thousand ARGs in your string, and if group 2 sits inside a repeated part of the pattern, it will only contain the last one.
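To illustrate the "last one wins" behaviour, here is a minimal sketch with a hypothetical pattern where the argument group is inside a + quantifier. The group is reused on every repetition, so after matching it only holds the final pair.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // Hypothetical pattern: group 2 is inside "(?:...)+", so each repetition
    // overwrites it and only the last NAME="value" pair survives.
    static final Pattern REPEATED =
            Pattern.compile("([A-Z]+)(?:\\s+(\\w+=\"[^\"]*\"))+");

    static String lastCapture(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Prints ARG2="test test"; ARG1="x" was matched but not kept
        System.out.println(lastCapture("KEYWORD ARG1=\"x\" ARG2=\"test test\""));
    }
}
```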
I wouldn't advise limiting yourself to "elegant" solutions. What you really need is something that works.
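One approach that works is to stop trying to capture every argument in a single match and instead call Matcher.find() in a loop, so the same two groups are reused for each pair. This is a minimal sketch under assumed rules (the keyword is the first run of word characters, values contain no quotes); it does not validate the input and silently skips anything between pairs that doesn't match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTokenizer {
    // Assumption: the keyword is the first run of word characters.
    private static final Pattern KEYWORD = Pattern.compile("^\\s*(\\w+)");
    // Assumption: each argument is NAME="value" with no quote inside the value.
    private static final Pattern ARG = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns [keyword, name1, value1, name2, value2, ...]. */
    static List<String> tokenize(String input) {
        Matcher k = KEYWORD.matcher(input);
        if (!k.find()) {
            throw new IllegalArgumentException("no keyword in: " + input);
        }
        List<String> tokens = new ArrayList<>();
        tokens.add(k.group(1));

        // Each find() continues after the previous match, so the same two
        // groups are reused for every pair; no group 3, 4, ... is needed.
        Matcher a = ARG.matcher(input);
        a.region(k.end(), input.length());
        while (a.find()) {
            tokens.add(a.group(1)); // argument name
            tokens.add(a.group(2)); // value without the quotes
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [KEYWORD, ARG1, x, ARG2, test test]
        System.out.println(tokenize("KEYWORD ARG1=\"x\" ARG2=\"test test\""));
    }
}
```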
Also, make sure you have the correct spec for the strings you're trying to parse. We only have one example of a valid string, which isn't nearly enough to start writing code from.
For example, the regex you tried in your original post restricts the "KEYWORD" part to upper-case Latin letters only. Is that really the spec? Can't you have "TOTAL2014INCOME" as a keyword, for example? Or "TotalIncome"?
The same goes for the other parts. In other formats with attribute/value pairs where the value is delimited by quotes (XML, for example), the value can often contain a quote itself, so there's an escape character (or some other mechanism) to keep that quote from being treated as a delimiter. Does your input have something like that? And does extra whitespace matter? For example, two spaces (or a tab character) between the keyword and the first attribute, or between an attribute name and the "=" that follows it?
Make sure you understand the spec well before you start writing code. If you look at the code for an XML parser, for example, you'll see something that could never be described as "elegant".
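If the answers to those questions turn out to be "yes", a small hand-written scanner is often easier to get right than one big regex. The sketch below assumes a hypothetical spec (none of this is confirmed by the question): names are letters, digits or underscores in any case; any whitespace is allowed around tokens and around "="; and inside a quoted value a backslash escapes the next character, so \" is a literal quote. LenientParser and Command are illustrative names only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LenientParser {
    // Parsed result: the keyword plus its attributes in input order.
    static final class Command {
        final String keyword;
        final Map<String, String> args;
        Command(String keyword, Map<String, String> args) {
            this.keyword = keyword;
            this.args = args;
        }
    }

    private final String s;
    private int pos;

    private LenientParser(String s) { this.s = s; }

    static Command parse(String input) {
        LenientParser p = new LenientParser(input);
        p.skipSpace();
        String keyword = p.word();
        Map<String, String> args = new LinkedHashMap<>();
        while (true) {
            p.skipSpace();
            if (p.pos >= p.s.length()) break;
            String name = p.word();
            p.skipSpace();          // assumed: whitespace allowed before "="
            p.expect('=');
            p.skipSpace();          // assumed: whitespace allowed after "="
            args.put(name, p.quoted());
        }
        return new Command(keyword, args);
    }

    private void skipSpace() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    // Assumed: a name is one or more letters, digits or underscores.
    private String word() {
        int start = pos;
        while (pos < s.length()
                && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) pos++;
        if (start == pos) throw error("expected a name");
        return s.substring(start, pos);
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c) throw error("expected '" + c + "'");
        pos++;
    }

    // Assumed escape rule: a backslash makes the next character literal.
    private String quoted() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (true) {
            if (pos >= s.length()) throw error("unterminated value");
            char c = s.charAt(pos++);
            if (c == '"') return sb.toString();
            if (c == '\\') {
                if (pos >= s.length()) throw error("dangling backslash");
                c = s.charAt(pos++);
            }
            sb.append(c);
        }
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at index " + pos);
    }
}
```

Unlike the find() loop, this rejects malformed input with an error that points at the offending index instead of silently skipping it. It is already noticeably longer than a one-line regex, and it would keep growing as the real spec is pinned down.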