
regular expressions

 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34422
347
Eclipse IDE Java VI Editor
I'm trying to match an XML snippet.

I want to get:
<module><java>Client.jar</java></module>
out of something like this:
<module><java>Other.jar</java></module><module><java>Client.jar</java></module><module><java>Other.jar</java></module>

I tried this regular expression:
regexp="(<module(.)*?Client.jar(.)*?module>)"

But that gives me this:
<module><java>Other.jar</java></module><module><java>Client.jar</java></module>

So the ending is good in that the reluctant quantifier stops after the first </module> is found. But what do I do at the beginning so it starts with the <module> before the client jar?

Note: This is from Ant regular expressions, but it should be similar enough to Java's.
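The behaviour described above can be reproduced in plain Java. This is a minimal sketch, assuming java.util.regex behaves closely enough to Ant's regexp engine for this case; the class and method names are made up for illustration, and the redundant capture groups are dropped. The regex engine tries the leftmost starting position first, so the reluctant quantifier only shortens the end of the match, never the beginning.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ReluctantStartDemo {
    // Sample input from the question: Other.jar, Client.jar, Other.jar.
    static final String INPUT =
        "<module><java>Other.jar</java></module>"
      + "<module><java>Client.jar</java></module>"
      + "<module><java>Other.jar</java></module>";

    // Returns the first substring matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The match begins at the FIRST "<module" because that is the
        // leftmost position from which the whole pattern can succeed.
        System.out.println(firstMatch("<module.*?Client\\.jar.*?module>", INPUT));
    }
}
```

Running it prints the first two modules together, which is the unwanted result shown in the post.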

[edited to disable smilies]
[ September 15, 2005: Message edited by: Jeanne Boyarsky ]
 
Alan Moore
Ranch Hand
Posts: 262
Is there some reason why you can't just do this:
BTW, never use (.)* in a regex; it's incredibly inefficient and doesn't do anything useful. Either put the asterisk inside the parens or get rid of the parens.
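A small Java sketch of the difference between the two placements, assuming java.util.regex semantics (the class name is hypothetical). It only shows what the group captures and makes no claim about timing: with the asterisk outside the parentheses the group is overwritten on every repetition and ends up holding just the last character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GroupPlacementDemo {
    // Returns the content of capture group 1 if regex matches the whole input.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Asterisk outside the parens: the group captures one character
        // per repetition, so only the last one is kept.
        System.out.println(group1("(.)*", "abc")); // prints "c"

        // Asterisk inside the parens: the group captures the whole run.
        System.out.println(group1("(.*)", "abc")); // prints "abc"
    }
}
```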
 
Ken Blair
Ranch Hand
Posts: 1078
"<module>[^(module)]*?Client.jar.*?</module>"

EDIT: Remove the Client.jar to get something more generic that you can use in a Pattern to find each one and not just one with a Client.jar in it.
[ September 15, 2005: Message edited by: Ken Blair ]
 
Jeanne Boyarsky
Alan,
Thanks for the note about the redundant parens. I can't do something simple like
because that is a simplified version of what I am trying to match. The module tag has an attribute whose value is unknown (to the code running the regular expression.)

Ken,
Thanks for the lead. It didn't work as is in Ant, but the language might be slightly different. The final regular expression that did work is:


It looks like the key is doing a greedy match before the expression.
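The regular expression from this post is not preserved here, so the following is only a sketch of the idea of a greedy match before the expression, written for Java rather than Ant. The pattern and class name are assumptions, not the one that was actually used. The leading greedy `.*` first swallows the whole input and then backtracks, so the capture group starts at the last `<module` that still has Client.jar after it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern below is an assumed reconstruction
// of the "greedy prefix" idea, not the regex from the original post.
class GreedyPrefixDemo {
    // Greedy ".*" first, then the module of interest in capture group 1.
    // "<module" has no closing ">" so an attribute of unknown value is allowed.
    static final Pattern CLIENT_MODULE =
        Pattern.compile(".*(<module.*?Client\\.jar.*?</module>)");

    // Returns the module element containing Client.jar, or null if none.
    static String extract(String input) {
        Matcher m = CLIENT_MODULE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input =
            "<module><java>Other.jar</java></module>"
          + "<module><java>Client.jar</java></module>"
          + "<module><java>Other.jar</java></module>";
        System.out.println(extract(input));
    }
}
```

Two caveats with this sketch: `.` does not match line breaks unless Pattern.DOTALL is set, and if several modules contain Client.jar, the greedy prefix selects the last one.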
 
Jeanne Boyarsky
Originally posted by Ken Blair:
EDIT: Remove the Client.jar to get something more generic that you can use in a Pattern to find each one and not just one with a Client.jar in it.

Definitely! Luckily, I know how to do that part. I just didn't want to complicate the question with it. (That and it involves Ant variables and wouldn't belong in JiG.)
 
Alan Moore
Here's a generic way to match a single XML element:

Just replace "module" with the name of the tag you want to find (in Ant, you should be able to replace it with a variable). If the element can be nested within itself, this will only find the innermost one.
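One way such a generic pattern could look in Java is sketched below. It is an assumption, not necessarily the pattern posted in this thread: a negative lookahead stops the match from running across another opening tag of the same name, which is what restricts it to the innermost element. The class and method names are hypothetical, and self-closing tags are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; an assumed sketch, not the pattern from the post.
class XmlElementFinder {
    // Returns every innermost <tag ...>...</tag> element found in xml.
    static List<String> findElements(String tag, String xml) {
        String q = Pattern.quote(tag);
        Pattern p = Pattern.compile(
            "<" + q + "\\b[^>]*>"            // opening tag, optional attributes
          + "(?:(?!<" + q + "\\b).)*?"       // any char that does not start another opening tag
          + "</" + q + ">",                  // first closing tag after that
            Pattern.DOTALL);                 // let '.' cross line breaks
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(xml);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String xml =
            "<module><java>Other.jar</java></module>"
          + "<module><java>Client.jar</java></module>"
          + "<module><java>Other.jar</java></module>";
        // Print only the element that mentions Client.jar.
        for (String element : findElements("module", xml)) {
            if (element.contains("Client.jar")) {
                System.out.println(element);
            }
        }
    }
}
```

Matching each element separately and then filtering on its content avoids the problem of a single regex starting at the wrong opening tag.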

Ken, I'm sorry to say your suggestion won't work in any regex flavor. The square brackets define a character class, which only matches one character. The initial caret means "complement of", so [^(module)] matches any single character that is not one of '(', 'm', 'o', 'd', 'u', 'l', 'e', or ')'.
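The character-class behaviour is easy to check in Java. A minimal sketch, with a hypothetical class name: the class matches exactly one character, and it rejects any of the listed letters or parentheses regardless of the order in which they appear.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class NegatedClassDemo {
    // One character that is none of ( m o d u l e )
    static final Pattern NOT_IN_SET = Pattern.compile("[^(module)]");

    // True only if s is exactly one character outside the set.
    static boolean matchesSingle(String s) {
        return NOT_IN_SET.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesSingle("x"));  // true: 'x' is not in the set
        System.out.println(matchesSingle("m"));  // false: 'm' is in the set
        System.out.println(matchesSingle("("));  // false: '(' is a literal member
        System.out.println(matchesSingle("xy")); // false: a class matches one character
    }
}
```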
 
Jeanne Boyarsky
Thanks Alan. I'll compare my working one with yours to increase my understanding.
 