| Author |
Finding multiple strings in a string - regex help
|
Mike Bates
Ranch Hand
Joined: Sep 19, 2009
Posts: 81
|
|
I have a file that I am reading into a single string because it contains multiple line breaks. The file is a basic PHP file, and all I want from it are the name/value pairs.
textFile:
$test1 = "test1"; // this is test 1
$test2 = "test2"; // this is test 2
$test3 = "test 3 with new line
new line
newline"; // test 3 end
When loaded in a string it looks like:
$test1 = "test1"; // this is test 1$test2 = "test2"; // this is test 2 $test3 = "test 3 with new line new line newline"; // test 3 end
My thought was to capture everything between the $ and the semi-colon (;), then split each capture on the equals sign to build an enumeration I can use downstream.
So far I have for testing
Pattern pattern = Pattern.compile("\\$*\\;");
Matcher matcher = pattern.matcher(textFile);
But all I get is the semi-colon. As you can tell, I am new to both Java and regexes. Any thoughts on approach? I may also be going about this the wrong way by putting the file into a string.
Thanks
Mike
|
 |
Mike Bates
Ranch Hand
Joined: Sep 19, 2009
Posts: 81
|
|
As an update, I found that the regex "\\$([^\\;]*)\\;" works, but the match still includes the $ and the semi-colon, which I would like dropped. I'll keep playing with it. Any input you have would still be great.
Mike
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32675
|
|
Welcome to JavaRanch
If you are sure your regex is giving you the correct lines, and you are also sure you only want to lose one character ($) at the start and one character (;) at the end, what about substring()? (The trim method of the String class may help, too.)
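A minimal sketch of the substring() and trim() idea, assuming each match really does start with exactly one $ and end with exactly one ;. The class and method names (StripDelimiters, stripEnds) are made up for illustration and are not from the thread.

```java
public class StripDelimiters {
    // Drops the first and last character of a match, then trims whitespace.
    // Assumption: the match starts with '$' and ends with ';'.
    static String stripEnds(String match) {
        if (match.length() < 2) {
            return match.trim(); // too short to have both delimiters
        }
        return match.substring(1, match.length() - 1).trim();
    }

    public static void main(String[] args) {
        String match = "$test1 = \"test1\";";
        System.out.println(stripEnds(match)); // prints: test1 = "test1"
    }
}
```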
|
 |
Ankit Garg
Saloon Keeper
Joined: Aug 03, 2008
Posts: 9189
|
|
Looking at your original regular expression "\\$*\\;": here you are saying that the match can start with any number of $ characters, because the * applies to the $. If you want it to mean $, then any number of characters, then ;, use "\\$.*;" (I'm not sure why you are using \\ with ;, it doesn't look necessary to me). But .* is a greedy quantifier, so this will match from the first $ to the last ; that it can find. Go here to learn about quantifiers and try to find a solution for this.
Coming to the second regex that you are using: it is similar to the one that I showed, but you are using [^\\;]*, which means any number of occurrences of a character that is not a semicolon. You could have used . in place of the character class, but your version keeps you safe from the greedy problem: it cannot run from the first $ to the last ;, because any semicolon in the middle of the input ends the match.
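To see the difference between the three patterns side by side, here is a small sketch. The helper findAll and the sample input are assumptions made up for this example; the semicolon is written without the backslashes, which matches the same character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Collects every full match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "$a = \"1\"; // one $b = \"2\"; // two";

        // * applies to $ only: zero or more '$' followed directly by ';'
        System.out.println(findAll("\\$*;", input));     // [;, ;]

        // greedy .* runs from the first '$' to the last ';' on the line
        System.out.println(findAll("\\$.*;", input));    // one long match

        // negated class stops at the first ';' after each '$'
        System.out.println(findAll("\\$[^;]*;", input)); // two matches
    }
}
```

Note that . does not match line terminators unless Pattern.DOTALL is set, while [^;] does, which matters if the string still contains line breaks.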
As far as excluding the $ and ; from the match goes, I don't know of any way of doing that, but you can always use the substring method to remove them...
[Argh, Campbell beat me]
|
 |
Mike Bates
Ranch Hand
Joined: Sep 19, 2009
Posts: 81
|
|
Well, I figured out the regex: the double backslash is required to see the semi-colon with the quotes. Also, one key thing I was not doing was looking at matcher.group(1) for the actual value returned. Once that was in a loop, I was able to parse the entire string. Then I realized some of the values had single quotes and others had double quotes, so I used the following regex to dump the values to group(1) and group(2) -->
This works pretty well. Now to learn some new stuff -- Enumerations.
Thanks
Mike
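For anyone finding this thread later, here is one way the group-based loop could look. This is only a sketch: the pattern below is hypothetical and is not the regex from the post above. It assumes each pair has the form $name = "value"; or $name = 'value'; and that values contain no escaped quotes. The class name PhpPairs and the method parse are made up as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhpPairs {
    // Hypothetical pattern: group 1 is the name, group 2 is the opening
    // quote, group 3 is the value up to the next occurrence of that quote.
    // DOTALL lets . cross line breaks so multi-line values still match.
    private static final Pattern PAIR = Pattern.compile(
            "\\$(\\w+)\\s*=\\s*([\"'])(.*?)\\2\\s*;", Pattern.DOTALL);

    // Returns name -> value in the order the pairs appear in text.
    static Map<String, String> parse(String text) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.put(m.group(1), m.group(3));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "$test1 = \"test1\"; // this is test 1\n"
                + "$test2 = 'test2'; // this is test 2\n"
                + "$test3 = \"test 3 with new line\nnew line\nnewline\"; // test 3 end\n";
        parse(text).forEach((name, value) ->
                System.out.println(name + " -> " + value));
    }
}
```

Because the groups sit inside the $ and ; delimiters, neither character ends up in the captured name or value, so no substring() call is needed. A comment that itself contains something like $x = "y"; would also be picked up, so treat this as a starting point only.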
|
 |