How to write a regex to include a text but exclude another text in the one regex?

pkinuk Buler
Ranch Hand

Joined: May 22, 2009
Posts: 63
Hi all,

I've gone through a lot of examples on JavaRanch and other websites, but I still can't write a single regex that includes one text but excludes another.

I have an example: haha.hello.common.exceptions.SchemaValidationException: Validated XML message - message invalid. [Error Code cegst01_350]\r\n\n\tat haha.hello.common.util.XMLUtility.validateMessage(XMLUtility.java:257)\n\tat haha.hello.mama.papa.release2.socketxml.readers.InputReaderBaseImpl.a(InputReaderBaseImpl.java:7)\n\tat haha.hello.mama.papa.release2.socketxml.readers.InputReaderBaseImpl.<init>(InputReaderBaseImpl.java:1)\n\tat haha.hello.mama.papa.release2.socketxml.readers.InputReaderWithSSODataImpl.<init>(InputReaderWithSSODataImpl.java:11)\n\tat haha.hello.mama.papa.release2.socketxml.readers.GetRolesReaderImpl.<init>(GetRolesReaderImpl.java:4)\n\tat haha.hello.mama.papa.release2.socketxml.CoreInputReaderFactoryImpl.newInputReader(CoreInputReaderFactoryImpl.java:85)\n\tat haha.hello.mama.papa.release2.socketxml.CoreInputReaderFactoryImpl.newInputReader(CoreInputReaderFactoryImpl.java:14)\n\tat haha.hello.mama.papa.release2.socketxml.CoreInputReaderFactoryImpl.newInputReader(CoreInputReaderFactoryImpl.java:52)\n\tat haha.hello.mama.papa.socket.protocols.NewlineDelimitedAPIBridge.process(NewlineDelimitedAPIBridge.java:108)\n\tat haha.hello.mama.papa.socket.protocols.NewlineDelimitedAPIBridge.process(NewlineDelimitedAPIBridge.java:20)\n\tat haha.hello.mama.papa.socket.SocketConnection.d(SocketConnection.java:29)\n\tat haha.hello.mama.papa.socket.SocketConnection.run(SocketConnection.java:172)\n\tat haha.hello.mama.papa.socket.SocketThreadManager$SocketThread.run(SocketThreadManager.java:7)\nCaused by: org.xml.sax.SAXParseException: cvc-minLength-valid: Value '' with length = '0' is not facet-valid with respect to minLength '1' for type 'NonEmptyString'.\n\tat com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)\n\tat com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.error(ErrorHandlerWrapper.java:131)\n\tat 
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:384)\n\tat com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:318)\n\tat com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator$XSIErrorReporter.reportError(XMLSchemaValidator.java:410)\n\tat com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.reportSchemaError(XMLSchemaValidator.java:3165)\n\tat com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.elementLocallyValidType(XMLSchemaValidator.java:3068)\n\tat com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.processElementContent(XMLSchemaValidator.java:2978)\n\tat com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleEndElement(XMLSchemaValidator.java:2121)\n\tat com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.endElement(XMLSchemaValidator.java:791)\n\tat com.sun.org.apache.xerces.internal.jaxp.validation.DOMValidatorHelper.finishNode(DOMValidatorHelper.java:338)\n\tat com.sun.org.apache.xerces.internal.jaxp.validation.DOMValidatorHelper.validate(DOMValidatorHelper.java:243)\n\tat com.sun.org.apache.xerces.internal.jaxp.validation.DOMValidatorHelper.validate(DOMValidatorHelper.java:186)\n\tat com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:100)\n\tat javax.xml.validation.Validator.validate(Validator.java:127)\n\tat haha.hello.common.util.XMLUtility.validateMessage(XMLUtility.java:8)

The string is rather long because it is a log message. What I plan to do in one regex is:
1. Check if the text contains
2. Make sure the text doesn't contain

Matcher.find() should return true only if the text fulfils both conditions above.

Can anyone help me to write the regex?

Thank you in advance
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19060

The easiest way to do this is to attach a negative look-ahead (generally at the beginning of the regex) to the regex that searches for the item you want. With the negative look-ahead in place, the regex will always fail if it finds the component you don't want.

Using look-aheads is pretty advanced, so I suggest that you start there, but don't be surprised if you run into trouble and have to backtrack to learn other parts of regex first.
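As a rough illustration of that idea, here is a minimal sketch. The literals "A" and "B" and the class name are placeholders chosen for this example, not taken from the original log; the pattern assumes the whole input is held in one String and that '.' should be allowed to cross line breaks.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // (?s)     : DOTALL, so '.' also matches line terminators in a multi-line log
    // \A       : anchor at the start of the input, so the look-ahead is tested only once
    // (?!.*B)  : negative look-ahead, fails if "B" occurs anywhere in the input
    // .*A      : then require "A" somewhere in the input
    static final Pattern A_BUT_NOT_B = Pattern.compile("(?s)\\A(?!.*B).*A");

    public static void main(String[] args) {
        System.out.println(A_BUT_NOT_B.matcher("xx A yy").find());    // true: A present, B absent
        System.out.println(A_BUT_NOT_B.matcher("xx A\nyy B").find()); // false: B present
        System.out.println(A_BUT_NOT_B.matcher("xx yy").find());      // false: A absent
    }
}
```

The anchor matters: without it, find() is free to retry the look-ahead at later start positions, which can let a match slip through.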

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
pkinuk Buler
Ranch Hand

Joined: May 22, 2009
Posts: 63
Hi Henry, thank you for your quick reply. I did apply the negative look-ahead, but it did not work: Matcher.find() still returned true. The following is my code:

The regex I used is
SchemaValidationException(?(?!\QCaused by: org.xml.sax.SAXParseException: cvc-minLength-valid: Value '' with length = '0' is not facet-valid with respect to minLength '1' for type 'NonEmptyString'.\E).)*(?:\s)*)*


However, Matcher.find() returned true, and Matcher.group() is
SchemaValidationException: Validated XML message - message invalid. [Error Code cegst01_350]\r\n\n\tat com.qxlva.common.util.XMLUtility.validateMessage(XMLUtility.java:257)\n\tat com.qxlva.nhs.api.release2.socketxml.readers.InputReaderBaseImpl.a(InputReaderBaseImpl.java:7)\n\tat com.qxlva.nhs.api.release2.socketxml.readers.InputReaderBaseImpl.<init>(InputReaderBaseImpl.java:1)\n\tat com.qxlva.nhs.api.release2.socketxml.readers.InputReaderWithSSODataImpl.<init>(InputReaderWithSSODataImpl.java:11)\n\tat com.qxlva.nhs.api.release2.socketxml.readers.GetRolesReaderImpl.<init>(GetRolesReaderImpl.java:4)\n\tat com.qxlva.nhs.api.release2.socketxml.CoreInputReaderFactoryImpl.newInputReader(CoreInputReaderFactoryImpl.java:85)\n\tat com.qxlva.nhs.api.release2.socketxml.CoreInputReaderFactoryImpl.newInputReader(CoreInputReaderFactoryImpl.java:14)\n\tat com.qxlva.nhs.api.release2.socketxml.CoreInputReaderFactoryImpl.newInputReader(CoreInputReaderFactoryImpl.java:52)\n\tat com.qxlva.nhs.api.socket.protocols.NewlineDelimitedAPIBridge.process(NewlineDelimitedAPIBridge.java:108)\n\tat com.qxlva.nhs.api.socket.protocols.NewlineDelimitedAPIBridge.process(NewlineDelimitedAPIBridge.java:20)\n\tat com.qxlva.nhs.api.socket.SocketConnection.d(SocketConnection.java:29)\n\tat com.qxlva.nhs.api.socket.SocketConnection.run(SocketConnection.java:172)\n\tat com.qxlva.nhs.api.socket.SocketThreadManager$SocketThread.run(SocketThreadManager.java:7)\n


The java code is:


Could anybody tell me how to fix it?

Thank you in advance
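One plausible explanation for the result above: when the negative look-ahead sits inside a repetition, it only tells the repetition where to stop. The match still succeeds; it simply ends just before the unwanted text, which is consistent with the group() output shown. The sketch below contrasts that form with a look-ahead tested once at the start of the input. The log text and the excluded string are shortened stand-ins, and the class and field names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedVsAnchored {
    static final String A = "SchemaValidationException";
    static final String B = "Caused by: org.xml.sax.SAXParseException"; // shortened for the example

    // Look-ahead inside the repetition: match A, then consume characters up to (not including) B.
    static final Pattern TEMPERED = Pattern.compile(
            Pattern.quote(A) + "(?:(?!" + Pattern.quote(B) + ").)*", Pattern.DOTALL);

    // Look-ahead tested once at the start of input: any occurrence of B rejects the whole input.
    static final Pattern ANCHORED = Pattern.compile(
            "\\A(?!.*" + Pattern.quote(B) + ").*" + Pattern.quote(A), Pattern.DOTALL);

    public static void main(String[] args) {
        String log = "x.SchemaValidationException: invalid\n\tat x.Y.z(Y.java:1)\n"
                   + "Caused by: org.xml.sax.SAXParseException: cvc-minLength-valid\n";

        Matcher m = TEMPERED.matcher(log);
        System.out.println(m.find());                           // true: the match stops before B
        System.out.println(m.group().endsWith("(Y.java:1)\n")); // true: B is not part of the match
        System.out.println(ANCHORED.matcher(log).find());       // false: B occurs in the input
    }
}
```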
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8404

pkinuk Buler wrote:Can anyone help me to write the regex?

Yes. DON'T.

Regexes were originally designed for string patterns contained in a single line and, although they've been expanded to include multi-line matches, I'd suggest that whatever regex you come up with is likely to be unwieldy.

Seems to me that you are searching for two patterns in a multi-line block, so my pseudo-code would look something like:

but I'm quite sure there are other solutions.

Winston

[Edit] @pkinuk: Above is not quite right. You shouldn't read a new line in the outer loop if a match on the 1st string was already found; I leave it to you to correct.
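The pseudo-code itself did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of a line-by-line check in the same spirit. It is not the original pseudo-code: it uses a single loop, the method and class names are invented for illustration, and it assumes each search string fits on one line of the text.

```java
class LineScan {
    // Returns true only if some line contains 'first' and no line contains 'second'.
    static boolean containsFirstButNotSecond(String text, String first, String second) {
        boolean sawFirst = false;
        for (String line : text.split("\\R")) {   // \R matches any line break (Java 8+)
            if (line.contains(second)) {
                return false;                     // the unwanted text rules the whole block out
            }
            if (line.contains(first)) {
                sawFirst = true;                  // remember the hit, keep scanning for 'second'
            }
        }
        return sawFirst;
    }

    public static void main(String[] args) {
        String ok  = "x.SchemaValidationException: invalid\n\tat x.Y.z(Y.java:1)";
        String bad = ok + "\nCaused by: org.xml.sax.SAXParseException: ...";
        System.out.println(containsFirstButNotSecond(ok,  "SchemaValidationException", "Caused by:")); // true
        System.out.println(containsFirstButNotSecond(bad, "SchemaValidationException", "Caused by:")); // false
    }
}
```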


Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
Victor M. Pereira
Ranch Hand

Joined: Mar 02, 2012
Posts: 50
I'm not sure if I fully understand what you want, but it seems that your regex is missing a couple of things.

First, the regex you wrote is telling me that the String must begin with "SchemaValidationException" or it must not have Caused By ...

I believe that you are seeking something like: ([a-Z]|[0-9])*(SchemaValidationException|!(Caused by: org.xml.sax.SAXParseException: cvc-minLength-valid: Value '' with length = '0' is not facet-valid with respect to minLength '1' for type 'NonEmptyString'.))+([a-Z]|[0-9])*

I recommend trimming the String, since spaces complicate the regex and one missing space might ruin your method.

That would be the regex approach; however, from your test case it seems that indexOf would be easier to apply.
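For comparison, the indexOf route might look like the sketch below (class and method names are invented for the example). One caution about the regex suggested above: in java.util.regex, `!` outside a look-around is matched as a literal character rather than acting as a negation, and `[a-Z]` is rejected as an illegal character range, so that pattern would likely need reworking before it could be used.

```java
class ContainsCheck {
    // True only if 'wanted' occurs in the text and 'unwanted' does not.
    static boolean hasWantedWithoutUnwanted(String text, String wanted, String unwanted) {
        return text.indexOf(wanted) >= 0 && text.indexOf(unwanted) < 0;
    }

    public static void main(String[] args) {
        String log = "x.SchemaValidationException: invalid\nCaused by: org.xml.sax.SAXParseException: ...";
        System.out.println(hasWantedWithoutUnwanted(log, "SchemaValidationException", "Caused by:"));  // false
        System.out.println(hasWantedWithoutUnwanted(log, "SchemaValidationException", "OutOfMemory")); // true
    }
}
```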



regards,
Victor M. Pereira
pkinuk Buler
Ranch Hand

Joined: May 22, 2009
Posts: 63
Winston, thank you for your reply. I wish I could use your pseudo-code to check the string, but one of the requirements from our client is that we have to use one regex to check the following conditions:

1. Check if the string contains 'A'
2. If the above condition is true, then check that the string doesn't contain 'B'

pkinuk Buler
Ranch Hand

Joined: May 22, 2009
Posts: 63
Victor, thank you for your reply. As I said, I have to use one regex to perform the following checks:

1. Check if the string contains A
2. If the string contains A, then check that the string doesn't contain B

I've simplified the example. Hopefully someone can fix the problem.

Thank you in advance

Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109

It took me about a minute on Google to find exactly what you seem to be looking for. It was in the first link that was returned.



The above will return true if the string contains X but not Y.

Is that what you were asking about, or have I misunderstood something?
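The snippet referred to above is missing from this copy of the thread. A common way to express "contains X but not Y" in a single regex, which may or may not be what was originally posted, is a pair of look-aheads anchored at the start of the input. In the sketch below the helper name is invented and Pattern.quote is used on the assumption that X and Y are literal texts rather than patterns.

```java
import java.util.regex.Pattern;

class IncludeExclude {
    // Builds a pattern for which find() is true only when the input contains x and not y.
    static Pattern containsButNot(String x, String y) {
        return Pattern.compile(
                "\\A(?=.*" + Pattern.quote(x) + ")"   // positive look-ahead: x occurs somewhere
              + "(?!.*" + Pattern.quote(y) + ")",     // negative look-ahead: y occurs nowhere
                Pattern.DOTALL);                       // let '.' cross line breaks
    }

    public static void main(String[] args) {
        Pattern p = containsButNot("X", "Y");
        System.out.println(p.matcher("a X b").find());     // true
        System.out.println(p.matcher("a X b\nY").find());  // false
        System.out.println(p.matcher("a b").find());       // false
    }
}
```

Because both conditions are look-aheads, a successful match is empty: group() returns "" and only the boolean result of find() is meaningful.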
 