
Finding multiple strings in a string - regex help

 
Mike Bates
Ranch Hand
Posts: 81
I am reading a file into a single string because some of its values span multiple lines. The file is a basic PHP file, and all I want from it are the name/value pairs.

textFile:
$test1 = "test1"; // this is test 1
$test2 = "test2"; // this is test 2
$test3 = "test 3 with new line
new line
newline"; // test 3 end

When loaded in a string it looks like:

$test1 = "test1"; // this is test 1$test2 = "test2"; // this is test 2 $test3 = "test 3 with new line new line newline"; // test 3 end

My thought was to capture everything between the $ and the semi-colon (;), then split each capture on the equals sign into an enumeration so I can use the data downstream.

So far I have for testing
Pattern pattern = Pattern.compile("\\$*\\;");
Matcher matcher = pattern.matcher(textFile);

But all I get is just the semi-colon. As you can tell, I am new to both Java and RegExs. Any thoughts on approach? I may be totally messed up by putting the file into a string as well.

Thanks
Mike
 
Mike Bates
Ranch Hand
Posts: 81
As an update, I found that the regex "\\$([^\\;]*)\\;" works, but the match still includes the $ and the semi-colon, which I would like dropped. I'll keep playing with it. Still, any input you have would be great.

Mike
 
Campbell Ritchie
Sheriff
Posts: 48968
Welcome to JavaRanch

If you are sure your regex is giving you the correct lines, and you are also sure you only want to lose one character ($) at the start and one character (;) at the end, what about substring()? (The trim method of the String class may help, too.)
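A minimal sketch of the substring() suggestion, assuming each match is known to start with $ and end with ; (the class and method names here are made up for illustration):

```java
class SubstringDemo {
    // Drops the first and last character of a match, then trims
    // surrounding whitespace. Assumes the input starts with '$' and
    // ends with ';', which is what the regex in this thread matches.
    static String stripDelimiters(String match) {
        return match.substring(1, match.length() - 1).trim();
    }

    public static void main(String[] args) {
        System.out.println(stripDelimiters("$test1 = \"test1\";"));
    }
}
```

This only makes sense when the regex guarantees both delimiters are present; on a shorter or differently shaped string, substring() would cut off the wrong characters or throw.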
 
Ankit Garg
Sheriff
Posts: 9521
Looking at your original regular expression "\\$*\\;": here the * applies to the $, so you are saying the match consists of any number of $ characters (including none) followed by a ;. If you want it to mean $, then any number of characters, then ;, use "\\$.*;" instead. (I'm not sure why you are using \\ with ;, it doesn't look necessary to me.) But .* is a greedy quantifier, so this regex will match from the first $ to the last ; that it can find. Read up on quantifiers and try to find a solution for this.
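To see both behaviours side by side, here is a small sketch. The helper findAll and the shortened one-line input are assumptions made for the demonstration, not code from the original post:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects every full match (group 0) of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Single-line input, like the flattened string in the question.
        String oneLine = "$test1 = \"test1\"; // this is test 1 "
                       + "$test2 = \"test2\"; // this is test 2";

        // Zero or more '$' followed by ';' : only the semicolons match.
        System.out.println(findAll("\\$*\\;", oneLine)); // prints [;, ;]

        // Greedy .* : one match from the first '$' to the last ';'.
        System.out.println(findAll("\\$.*;", oneLine));
    }
}
```

Note that . does not match line terminators by default, so on input that still contains newlines the greedy version would stop at the end of each line instead.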

Coming to the second regex that you are using, it is similar to the one that I showed. You are using [^\\;]*, which means any number of occurrences of a character that is not a semicolon. You could have used . instead, but the negated class keeps you safe from the greedy problem: the regex cannot match from the first $ to the last ;, because the first semicolon it meets ends the match.
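A short sketch of the negated-class version run against the three sample statements from the question. The class and method names are hypothetical, and the \\ before ; is left out because ; has no special meaning in a regex:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // '$', then any run of non-semicolon characters, then ';'.
    // [^;] also matches newlines, so multi-line values are covered.
    private static final Pattern STATEMENT = Pattern.compile("\\$[^;]*;");

    static List<String> statements(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = STATEMENT.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "$test1 = \"test1\"; // this is test 1\n"
                    + "$test2 = \"test2\"; // this is test 2\n"
                    + "$test3 = \"test 3 with new line\nnew line\nnewline\"; // test 3 end";
        // One match per statement; the trailing comments are skipped
        // because they contain no '$'.
        for (String s : statements(text)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The trade-off is that a semicolon inside a quoted value would end the match early, so this relies on the values themselves containing no semicolons.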

As far as excluding the $ and ; from the match goes, I don't know of any way of doing that, but you can always use the substring method to remove them...
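For what it is worth, one option besides substring is a capturing group: Matcher.group() returns the whole match, while Matcher.group(1) returns only what the first pair of parentheses matched. A minimal sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // The parentheses capture everything between '$' and ';'.
    private static final Pattern STATEMENT = Pattern.compile("\\$([^;]*);");

    static List<String> bodies(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = STATEMENT.matcher(input);
        while (m.find()) {
            // group(1) excludes the '$' and ';' delimiters.
            out.add(m.group(1).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bodies("$test1 = \"test1\"; $test2 = \"test2\";"));
    }
}
```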

[Argh, Campbell beat me]
 
Mike Bates
Ranch Hand
Posts: 81
Well, I figured out the regex: the double slash is required to see the semi-colon with the quotes. Also, one key item I was not doing was looking at matcher.group(1) for the actual value returned. Once in a loop, I was able to parse the entire string. Then I realized some of the values had single quotes and others had double quotes, so I used the following regex to dump the values to group(1) and group(2) -->

This works pretty well. Now to learn some new stuff -- Enumerations.
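The regex referred to above is not shown in the post. Purely as an illustration of the idea, here is a sketch of a pattern that accepts either quote style and puts the name in group(1) and the value in group(2). The pattern, the class name, and the use of a map are all assumptions, and the pattern assumes values contain no quote characters of either kind:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameValueDemo {
    // Hypothetical pattern: $name = "value"; or $name = 'value';
    // group(1) = name, group(2) = value (may span several lines).
    private static final Pattern PAIR =
            Pattern.compile("\\$(\\w+)\\s*=\\s*[\"']([^\"']*)[\"']\\s*;");

    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "$test1 = \"test1\"; // this is test 1\n"
                    + "$test2 = 'test2'; // this is test 2\n"
                    + "$test3 = \"test 3 with new line\nnew line\nnewline\"; // test 3 end";
        for (Map.Entry<String, String> e : parse(text).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A LinkedHashMap keeps the pairs in file order, which may or may not matter for whatever consumes them downstream.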

Thanks
Mike
 