
Pattern matching problem

pats shah
Greenhorn

Joined: Oct 28, 2009
Posts: 1
Hi there,

I have the string "<a><b>qwer qwer</b></a><b>zxcv zcv</b>" and I want the following output:

<b>qwer qwer</b>
<b>zxcv zcv</b>


I tried the following, but the output I get is <b>qwer qwer</b></a><b>zxcv zcv</b>

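import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTest {
    public static void main(String[] args) {
        String newLine = System.getProperty("line.separator");
        String input = "<a><b>qwer qwer</b></a><b>zxcv zcv</b>";
        StringBuilder output = new StringBuilder();
        String regex = "<b>.*</b>"; // meant to match each <b>...</b> element
        Pattern p1 = Pattern.compile(regex);
        Matcher m1 = p1.matcher(input);
        // Collect every match, one per line
        while (m1.find())
        {
            output.append(m1.group()).append(newLine);
        }

        System.out.println("output = " + output);
    }
}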


I think the problem is that because I'm using .*, the match keeps consuming the string and doesn't stop at the first </b>. Can anyone suggest how to fix this?

Thanks a lot in advance
Rahul P Kumar
Ranch Hand

Joined: Sep 26, 2009
Posts: 188
pats shah wrote:String regex = "<b>.*</b>";

Try this pattern instead: "<b>([a-z]*|\\s*)*</b>"
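Rahul's pattern works on this input because it only allows lowercase letters and whitespace between <b> and </b>, so a match cannot run across "</b></a><b>". Here is a minimal sketch to try it (the names RestrictedContentDemo and findBold are hypothetical). Text containing capitals, digits or punctuation would not match, and nested quantifiers like ([a-z]*|\s*)* can backtrack heavily when a match fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestrictedContentDemo {
    // Only lowercase letters and whitespace may appear between the tags,
    // so a match can never consume the '<' of "</b></a><b>".
    static final Pattern BOLD = Pattern.compile("<b>([a-z]*|\\s*)*</b>");

    // Returns every non-overlapping match of BOLD, in order.
    static List<String> findBold(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = BOLD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<a><b>qwer qwer</b></a><b>zxcv zcv</b>";
        for (String s : findBold(input)) {
            System.out.println(s);
        }
    }
}
```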
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181

The problem is that the part of your regex that matches '0 or more characters' is too greedy: it grabs everything it can get its hands on without breaking the match, and that includes the intermediate tags. So this part of the regex:

.*

matches all of this text:

qwer qwer</b></a><b>zxcv zcv
Look at the Pattern javadocs to find a way to make it more reluctant to consume characters (i.e., not consume characters that could instead be used by another part of the pattern).


Steve
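To see Steve's point concretely, here is a minimal sketch that wraps .* in a capturing group so we can print exactly what it consumed (the names GreedyDemo and greedyCapture are hypothetical). The greedy .* first runs to the end of the input and then backtracks only until "</b>" can match, which happens at the last "</b>" in the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Returns the text captured by the greedy ".*" in the first match of
    // <b>(.*)</b>, or null if the pattern does not match at all.
    static String greedyCapture(String input) {
        Matcher m = Pattern.compile("<b>(.*)</b>").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>qwer qwer</b></a><b>zxcv zcv</b>";
        // prints: qwer qwer</b></a><b>zxcv zcv
        System.out.println(greedyCapture(input));
    }
}
```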
Siva Masilamani
Ranch Hand

Joined: Sep 19, 2008
Posts: 385
Use the pattern "<b>.*?</b>" instead.


SCJP 6, SCWCD 5, SCBCD 5

Failure is not an option.
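Siva's pattern uses the reluctant quantifier *?, one of the quantifier forms described in the Pattern javadocs Steve mentioned: it consumes as few characters as possible, so each match stops at the first "</b>" after its "<b>". A minimal sketch applying it to the original input (the names ReluctantDemo and findBold are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // ".*?" is reluctant: it expands one character at a time and stops
    // as soon as the following "</b>" can match.
    private static final Pattern BOLD = Pattern.compile("<b>.*?</b>");

    // Returns every non-overlapping <b>...</b> match, in order.
    static List<String> findBold(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = BOLD.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a><b>qwer qwer</b></a><b>zxcv zcv</b>";
        // System.lineSeparator() returns the same value as
        // System.getProperty("line.separator") in the original code.
        System.out.println(String.join(System.lineSeparator(), findBold(input)));
    }
}
```

This prints <b>qwer qwer</b> and <b>zxcv zcv</b> on separate lines. By default . does not match line terminators, so if the text between the tags can span several lines, the pattern would need to be compiled with Pattern.DOTALL.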
Rahul P Kumar
Ranch Hand

Joined: Sep 26, 2009
Posts: 188
Siva Masilamani wrote:use the pattern like this "<b>.*?</b>"


Thanks! That was revealing.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39885
And welcome to JavaRanch, Pats Shah