regular expressions

Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 29233
    
136

I'm trying to match an XML snippet.

I want to get:
<module><java>Client.jar</java></module>
out of something like this:
<module><java>Other.jar</java></module><module><java>Client.jar</java></module><module><java>Other.jar</java></module>

I tried this regular expression:
regexp="(<module(.)*?My_Client.jar(.)*?module>) "

But that gives me this:
<module><java>Other.jar</java></module><module><java>Client.jar</java></module>

So the ending is good in that the reluctant quantifier stops after the first </module> is found. But what do I do at the beginning so it starts with the <module> before the client jar?

Note: This is from Ant regular expressions, but it should be similar enough to Java's.
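The behaviour described above can be reproduced in plain Java. This is a minimal sketch, assuming the simplified input from the post; the class and method names are hypothetical, and the redundant parentheses are dropped. A regex engine tries the leftmost starting position first, so the reluctant quantifier only shortens the end of the match, never the beginning.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostMatchDemo {
    // Simplified sample input taken from the question.
    static final String INPUT =
        "<module><java>Other.jar</java></module>"
        + "<module><java>Client.jar</java></module>"
        + "<module><java>Other.jar</java></module>";

    // Returns the first match of the reluctant pattern, or null if there is none.
    static String firstMatch(String input) {
        Pattern p = Pattern.compile("<module.*?Client\\.jar.*?module>");
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The match starts at the very first <module>, not the one before Client.jar.
        System.out.println(firstMatch(INPUT));
    }
}
```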

[edited to disable smilies]
[ September 15, 2005: Message edited by: Jeanne Boyarsky ]

Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
Is there some reason why you can't just do this:
BTW, never use (.)* in a regex; it's incredibly inefficient and doesn't do anything useful. Either put the asterisk inside the parens or get rid of the parens.
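A small Java sketch of why (.)* is rarely what you want (class and method names are hypothetical): with the asterisk outside the parentheses, the group is re-captured on every iteration, so only the last character survives. With the asterisk inside, the group holds the whole run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupQuantifierDemo {
    // Quantifier outside the group: group 1 is overwritten on each iteration.
    static String quantifierOutside(String s) {
        Matcher m = Pattern.compile("(.)*").matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    // Quantifier inside the group: group 1 holds everything that was matched.
    static String quantifierInside(String s) {
        Matcher m = Pattern.compile("(.*)").matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(quantifierOutside("abc")); // prints "c"
        System.out.println(quantifierInside("abc"));  // prints "abc"
    }
}
```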
Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
"<module>[^(module)]*?Client.jar.*?</module>"

EDIT: Remove the Client.jar to get something more generic that you can use in a Pattern to find each one and not just one with a Client.jar in it.
[ September 15, 2005: Message edited by: Ken Blair ]
Jeanne Boyarsky

Alan,
Thanks for the note about the redundant parens. I can't do something simple like
because that is a simplified version of what I am trying to match. The module tag has an attribute whose value is unknown (to the code running the regular expression.)

Ken,
Thanks for the lead. It didn't work as is in Ant, but the language might be slightly different. The final regular expression that did work is:


It looks like the key is doing a greedy match before the expression.
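The regular expression itself did not survive in this copy of the thread, so the following is only a sketch of the idea, not the actual Ant expression. It assumes a greedy .* placed before a capturing group; the class and method names are hypothetical. The greedy prefix consumes as much as it can and then backtracks, so the group ends up starting at the last <module that is still followed by Client.jar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyPrefixDemo {
    // Returns the <module> element containing Client.jar, or null if none is found.
    // "<module" has no closing ">" so an unknown attribute is allowed.
    static String clientModule(String input) {
        Pattern p = Pattern.compile(".*(<module.*?Client\\.jar.*?</module>)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input =
            "<module id=\"a\"><java>Other.jar</java></module>"
            + "<module id=\"b\"><java>Client.jar</java></module>"
            + "<module id=\"c\"><java>Other.jar</java></module>";
        System.out.println(clientModule(input));
    }
}
```

One caveat with this approach: if several modules contain Client.jar, the greedy prefix selects the last one. Also, . does not match line terminators by default in Java, so multi-line input would need Pattern.DOTALL.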
Jeanne Boyarsky

Originally posted by Ken Blair:
EDIT: Remove the Client.jar to get something more generic that you can use in a Pattern to find each one and not just one with a Client.jar in it.

Definitely! Luckily, I know how to do that part. I just didn't want to complicate the question with it. (That and it involves Ant variables and wouldn't belong in JiG.)
Alan Moore
Here's a generic way to match a single XML element:

Just replace "module" with the name of the tag you want to find (in Ant, you should be able to replace it with a variable). If the element can be nested within itself, this will only find the innermost one.
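The original snippet is not present in this copy of the thread. As an illustration only, here is one pattern that fits the description; it is an assumption, not necessarily the one posted. It uses a negative lookahead so that the element body can never contain another opening tag of the same name, which is why only the innermost element is found. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElementMatcherDemo {
    // Builds a pattern for one element with the given tag name.
    static Pattern elementPattern(String tag) {
        String t = Pattern.quote(tag);
        return Pattern.compile(
            "<" + t + "\\b[^>]*>"          // opening tag, optional attributes
            + "(?:(?!<" + t + "\\b).)*?"   // body: any char not starting another opening tag
            + "</" + t + ">",              // closing tag
            Pattern.DOTALL);               // let '.' cross line breaks
    }

    // Collects every non-overlapping match in the input.
    static List<String> findAll(String tag, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = elementPattern(tag).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String nested = "<module a=\"1\"><module>inner</module></module>";
        System.out.println(findAll("module", nested)); // prints [<module>inner</module>]
    }
}
```

This sketch does not handle self-closing tags such as <module/>, and a regex is not a substitute for an XML parser when the input is not tightly controlled.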

Ken, I'm sorry to say your suggestion won't work in any regex flavor. The square brackets define a character class, which only matches one character. The initial caret means "complement of", so [^(module)] matches any single character that is not one of '(', 'm', 'o', 'd', 'u', 'l', 'e', or ')'.
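A quick way to check the character-class point in Java (class and method names are hypothetical): the class matches exactly one character, and rejects each of the listed characters individually rather than the word "module".

```java
class CharClassDemo {
    // True only if s is a single character outside the set ( m o d u l e )
    static boolean matchesClass(String s) {
        return s.matches("[^(module)]");
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("x"));      // true: one character, not in the set
        System.out.println(matchesClass("m"));      // false: 'm' is in the set
        System.out.println(matchesClass("("));      // false: '(' is a literal inside a class
        System.out.println(matchesClass("xy"));     // false: the class matches one character only
        System.out.println(matchesClass("module")); // false
    }
}
```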
Jeanne Boyarsky

Thanks Alan. I'll compare my working one with yours to increase my understanding.