Finding multiple strings in a string - regex help

Mike Bates
Ranch Hand

Joined: Sep 19, 2009
Posts: 81
I have a file I am reading into a string since it contains multiple line returns. The file is a basic PHP file and all I want from it are the name/value pairs.

$test1 = "test1"; // this is test 1
$test2 = "test2"; // this is test 2
$test3 = "test 3 with new line
new line
newline"; // test 3 end

When loaded in a string it looks like:

$test1 = "test1"; // this is test 1$test2 = "test2"; // this is test 2 $test3 = "test 3 with new line new line newline"; // test 3 end

My thought was to capture everything between the $ and the semi-colon (;) for an enumeration by splitting on the equal sign so I can use the data downstream.

So far I have for testing
Pattern pattern = Pattern.compile("\\$*\\;");
Matcher matcher = pattern.matcher(textFile);

But all I get is just the semi-colon. As you can tell, I am new to both Java and RegExs. Any thoughts on approach? I may be totally messed up by putting the file into a string as well.

Mike Bates
Ranch Hand

Joined: Sep 19, 2009
Posts: 81
As an update, I found the regex "\\$([^\\;]*)\\;" works, but the match still includes the $ and the semi-colon, which I would like dropped. I'll keep playing with it. Still, any input you have would be great.

Campbell Ritchie

Joined: Oct 13, 2005
Posts: 43970
Welcome to JavaRanch

If you are sure your regex is giving you the correct lines, and you are also sure you only want to lose one character ($) at the start and one character (;) at the end, what about substring()? (The trim method of the String class may help, too.)
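A minimal sketch of the substring() suggestion, assuming the regex from the previous post (written here without the escape on the semicolon, which Java does not need) and a hypothetical helper named stripDelimiters. It is one way to do it, not the only one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstringDemo {

    // Hypothetical helper: finds every "$...;" run and returns it
    // without the leading $ and the trailing ;
    static List<String> stripDelimiters(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\$[^;]*;").matcher(text);
        while (m.find()) {
            String match = m.group();
            // Drop the first and last character, then trim stray whitespace.
            result.add(match.substring(1, match.length() - 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "$test1 = \"test1\"; // this is test 1\n"
                    + "$test2 = \"test2\"; // this is test 2";
        for (String s : stripDelimiters(text)) {
            System.out.println(s);
        }
    }
}
```

This only works because each match is known to start with exactly one $ and end with exactly one ;.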
Ankit Garg

Joined: Aug 03, 2008
Posts: 9404

Looking at your original regular expression "\\$*\\;": here you are saying that the match can start with any number of $ characters, because the * applies to the $. That includes zero of them, which is why you get just the semicolon. If you want it to mean $, then any number of characters, then ;, use "\\$.*;" (I'm not sure why you are using \\ with ;, it doesn't look necessary to me). But .* is a greedy quantifier, so this regex will match from the first $ to the last ; that it can find. Read up on quantifiers and try to find a solution for this.

Coming to the second regex that you are using: it is similar to the one that I showed. You are using [^\\;]*, which means any number of occurrences of a character that is not a semicolon. You could have used . instead, but your version keeps you safe from the greedy problem: it cannot run from the first $ to the last ;, because the first semicolon it meets ends the match.

As far as excluding the $ and ; from the match goes, I don't know of any way of doing that, but you can always use the substring method to remove them...

[Argh, Campbell beat me]
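To make the quantifier discussion concrete, here is a small sketch comparing the patterns mentioned in this thread, plus a reluctant variant (.*?) that nobody has posted yet. The class name, the findAll helper and the sample string are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Hypothetical helper: collects every match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "$a = 1; $b = 2;";

        // * applies to $ only, so each match is just a semicolon: [;, ;]
        System.out.println(findAll("\\$*;", text));

        // Greedy .* runs to the last semicolon: [$a = 1; $b = 2;]
        System.out.println(findAll("\\$.*;", text));

        // Reluctant .*? stops at the first semicolon: [$a = 1;, $b = 2;]
        System.out.println(findAll("\\$.*?;", text));

        // Negated class also stops at the first semicolon: [$a = 1;, $b = 2;]
        System.out.println(findAll("\\$[^;]*;", text));
    }
}
```

One difference worth knowing: by default . does not match line terminators, while [^;] does, so for values that span several lines the negated class works as is and the .*? version would need Pattern.DOTALL.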

Mike Bates
Ranch Hand

Joined: Sep 19, 2009
Posts: 81
Well, I figured out the regex; the double slash is required to see the semi-colon with the quotes. Also, one key thing I was not doing was looking at the actual value returned. Once in a loop, I was able to parse the entire string. Then I realized some of the values had single quotes and others had double quotes, so I used the following regex to dump the values to group(1) and group(2) -->

This works pretty well. Now to learn some new stuff -- Enumerations.
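The regex from this post is not shown, so the following is not it. It is only a sketch of one pattern that copes with both single-quoted and double-quoted values. The class name, the parse helper and the group numbering (name in group 1, double-quoted value in group 2, single-quoted value in group 3) are all assumptions made for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValueDemo {

    // Assumed pattern, not the one from the post:
    //   group(1) = variable name
    //   group(2) = value if it was double-quoted
    //   group(3) = value if it was single-quoted
    private static final Pattern ASSIGNMENT = Pattern.compile(
            "\\$(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')\\s*;");

    // Hypothetical helper: returns name -> value in file order.
    static Map<String, String> parse(String text) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = ASSIGNMENT.matcher(text);
        while (m.find()) {
            // Exactly one of the two value groups took part in the match.
            String value = (m.group(2) != null) ? m.group(2) : m.group(3);
            pairs.put(m.group(1), value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "$test1 = \"test1\"; // this is test 1\n"
                    + "$test2 = 'test2'; // this is test 2\n"
                    + "$test3 = \"test 3 with new line\nnew line\nnewline\"; // test 3 end";
        for (Map.Entry<String, String> e : parse(text).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

This sketch does not handle escaped quotes inside a value, and it skips any assignment whose value is not quoted.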
