Regex utilization

 
Adrien Ruffie
Ranch Hand
Posts: 93
Hello all,

I have no idea how to implement the following:

I have a subject template such as "[Ref ticket: $$token$$]".
I would like to check whether each of several given strings matches that template.
Then, for each string that matches, I would like to extract the $$token$$ part, which can be a sequence of digits and letters such as:

aazAZD2117az1D
ad1eQXN
4785ZA12

knowing I can choose/switch different pattern, example instead "[Ref ticket: $$token$$]", "[Reply to Ref: $$token$$ for ticket]" and also $$token$$ can be change ...

I have racked my brain trying to write the code, but I am stuck. Maybe you have an idea ...


Many thanks and best regards.

Adrien
 
Richard Tookey
Bartender
Posts: 1166
17
Netbeans IDE Java Linux
I think I understand your requirement up to the point where you say "and also $$token$$ can be change". Do you mean that "$$token$$" may also be something like "%%token%%" or something like "%$token$%" where the "token" can be of a similar form to the 3 items given in your list?
 
Steve Luke
Bartender
Posts: 4179
22
IntelliJ IDE Python Java

Adrien Ruffie wrote:knowing I can choose/switch different pattern, example instead "[Ref ticket: $$token$$]", "[Reply to Ref: $$token$$ for ticket]" and also $$token$$ can be change ...


This is one of the problems I see with RegEx (and it has nothing to do with RegEx). People want one pattern to match many different situations. I would suggest not doing that. You will find it a lot easier if you come up with a set of limited RegEx patterns (one for [Ref ticket: ...] another for [Reply to Ref: ... for ticket], etc...). Then use the patterns to see which pattern is matched, and then extract the pertinent part.

So don't look for the god-regex, instead look for a simple regex for each individual pattern / situation.
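
The "one simple regex per situation" advice could be sketched as follows. This is a minimal illustration, not code from the thread: the class name `TicketPatterns`, the method `extractToken`, and the token shape `[A-Za-z0-9]+` are assumptions based on the examples the OP gave.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TicketPatterns {
    // One small pattern per supported subject style; group 1 is the token.
    // The token shape [A-Za-z0-9]+ is an assumption taken from the examples.
    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("\\[Ref ticket: ([A-Za-z0-9]+)\\]"),
        Pattern.compile("\\[Reply to Ref: ([A-Za-z0-9]+) for ticket\\]"));

    /** Returns the token from the first pattern found in the subject, if any. */
    static Optional<String> extractToken(String subject) {
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(subject);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractToken("[Ref ticket: KFR12C]"));
        System.out.println(extractToken("[Reply to Ref: 7784azD for ticket]"));
        System.out.println(extractToken("[Referential ticket: KFR12C]"));
    }
}
```

Supporting a new subject style then means adding one more entry to the list rather than editing a single large expression.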
 
Richard Tookey
Bartender
Posts: 1166
17
Netbeans IDE Java Linux

Steve Luke wrote:
So don't look for the god-regex, instead look for a simple regex for each individual pattern / situation.



Though to my mind there is some uncertainty about what the OP means by "and also $$token$$ can be change", the probable 'or' nature of the parts means that a single fairly straightforward regex is likely to be possible. Of course I could be way out in my understanding, so ...

The main problem I see with people specifying a requirement in this manner is that they don't actually specify anything. They give a couple of examples and expect one to deduce the general requirement from the examples. Often I can get most of the requirement from the examples (I think I can here) but I have had some spectacular failures. We will see.
 
Adrien Ruffie
Ranch Hand
Posts: 93
OK, let's go. Steve Luke, you say:

People want one pattern to match many different situations. I would suggest not doing that. You will find it a lot easier if you come up with a set of limited RegEx patterns (one for [Ref ticket: ...] another for [Reply to Ref: ... for ticket], etc...)



It's not my decision, it's just the project specification ... and no, I can't limit the regex patterns, sorry.

Richard, the token part is a variable within the first sentence, which the client can specify to match their needs (if the client wants to change the part that must be extracted, it should look like a ticket number, for example):

[Ref ticket: KFR12C] or [Reply to Ref: 7784azD for ticket]

The problem is that the client can use another style for this part: lowercase, uppercase, mixed, digits, letters, variable length ... I want to provide a mechanism that recovers this part whatever its structure, and the extraction must be done only if the first sentence is found, for example in a mail subject:

If the client wants to keep only mail received with the following subject:

[Ref ticket: KFR12C]

then this mail subject must be rejected: [Referential ticket: KFR12C]

but these must be accepted: [Ref ticket: 478ABC], [Ref ticket: AB14CA], [Ref ticket: 74AER4]



But the client can later change the subject to [Referential ticket: KFR12C], and in that case [Ref ticket: 478ABC], [Ref ticket: AB14CA], [Ref ticket: 74AER4] should not match ...
The extracted part can also be specified by the client, for example: AAbbCC, 777ABC, a7b8P9, azdaz54891azazx22

with a pattern like [0-9]{4}[a-zA-Z]{3}[0-3]{1} ...

I know it is not very simple ...

If you have an idea, I'll take it.
 
Richard Tookey
Bartender
Posts: 1166
17
Netbeans IDE Java Linux
Sorry but you have lost me. My original view of the requirement was obviously very wrong and this is one of the occasions when I cannot get even close to the actual requirements from a number of examples. You seem to be saying that the pattern to match can change at the whim of the client so in my opinion it's going to be impossible to write a single regex.

If you could specify, and I mean specify in detail not just a few examples, the general format and what has to be extracted then it may be possible to parameterize the regex in some way but as it stands I can't help. BNF would be the preferred form to present the specification but railroad diagrams might be easier to start with if you are not familiar with BNF.
 
Winston Gutkowski
Bartender
Posts: 10777
71
Hibernate Eclipse IDE Ubuntu

Adrien Ruffie wrote:It's not my decision, it's just the project specification ... and no, I can't limit the regex patterns, sorry.


Then unless it's for a classroom project specifically designed to increase your understanding of regexes, it's a BAD specification.

The reason? - Because it's telling you how to accomplish the task, rather than what needs to be done.

And even if it's so, there is still a lot you can take from Steve's advice:
  • break down the string into simple patterns to perform the individual pieces of the process.
  • concatenate them together - maybe surrounded by brackets to create groups - to get your full regex.

Winston
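
The two steps in the list above (simple pieces, then concatenation with a group around the part to extract) might look like the sketch below. The names `ComposedRegex` and `compose` are hypothetical, and the sketch assumes the fixed parts of the subject should be matched literally, which is why they go through `Pattern.quote`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComposedRegex {
    /**
     * Builds prefix + (token) + suffix. The literal parts are quoted so that
     * characters such as '[' and ']' are not treated as regex syntax.
     */
    static Pattern compose(String prefix, String tokenRegex, String suffix) {
        return Pattern.compile(
            Pattern.quote(prefix) + "(" + tokenRegex + ")" + Pattern.quote(suffix));
    }

    public static void main(String[] args) {
        // The token expression is supplied separately, so it can change
        // without touching the fixed parts of the subject.
        Pattern p = compose("[Ref ticket: ", "[A-Za-z0-9]+", "]");
        Matcher m = p.matcher("Re: [Ref ticket: 478ABC] printer broken");
        if (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
```

Because the token expression is a parameter, a client-specific shape such as `[0-9]{4}[a-zA-Z]{3}` can be passed in place of the generic one.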
     
    Adrien Ruffie
    Ranch Hand
    Posts: 93
    Yes, good idea. I had not understood correctly, thank you. But another problem can appear ... for example:



    It doesn't work because the regular expression takes into account that the last part is the closing bracket ...
     
    Richard Tookey
    Bartender
    Posts: 1166
    17
    Netbeans IDE Java Linux

    Adrien Ruffie wrote:
    It doesn't work because the regular expression takes into account that the last part is the closing bracket ...



    It also doesn't work because you are calling matcher.group() before matcher.find() or matcher.matches()!
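
To illustrate the ordering problem: `Matcher.group()` throws `IllegalStateException` when no match has been attempted yet, so a match method must run first. The snippet below is a minimal sketch; the class name `GroupAfterFind`, the method `tokenOrNull`, and the pattern are assumptions for illustration, not the code originally posted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupAfterFind {
    private static final Pattern REF =
        Pattern.compile("\\[Ref ticket: ([A-Za-z0-9]+)\\]");

    /** Returns the token, or null when the subject does not contain the pattern. */
    static String tokenOrNull(String subject) {
        Matcher m = REF.matcher(subject);
        // Calling m.group(1) here would throw IllegalStateException,
        // because no match has been attempted yet.
        if (m.find()) {
            return m.group(1);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(tokenOrNull("Re: [Ref ticket: AB14CA]"));
        System.out.println(tokenOrNull("no ticket here"));
    }
}
```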

    P.S. I'm still very lost as to your requirement.
     
    Adrien Ruffie
    Ranch Hand
    Posts: 93
    Yes, I forgot that line ...

    For example, I have the following code:


    And finally, I need to extract "AX3Q1XV3" from the first string, but only if the subject contains the following two parts: "[ref demande : " and "]"
     
    Richard Tookey
    Bartender
    Posts: 1166
    17
    Netbeans IDE Java Linux
    If my understanding is correct then the following totally unmaintainable regex solution should meet your requirement -


    Given some more time I can probably make this a little more maintainable BUT BUT BUT never really maintainable and even though I am a great fan of regex I would probably not use regex for this.
     
    Adrien Ruffie
    Ranch Hand
    Posts: 93
    Many thanks, it works correctly :-)

    But I am interested in your advice:

    And even if it's so, there is still a lot you can take from Steve's advice:
    break down the string into simple patterns to perform the individual pieces of the process.
    concatenate them together - maybe surrounded by brackets to create groups - to get your full regex.

    But isn't that the solution you gave me in the last code snippet?

    Because I think it meets my needs correctly:
    1] check whether the subject looks like the one provided
    2] if it does, extract the token
     
    Richard Tookey
    Bartender
    Posts: 1166
    17
    Netbeans IDE Java Linux

    Adrien Ruffie wrote:Many thanks, it works correctly :-)

    But I am interested in your advice:

    And even if it's so, there is still a lot you can take from Steve's advice:
    break down the string into simple patterns to perform the individual pieces of the process.
    concatenate them together - maybe surrounded by brackets to create groups - to get your full regex.



    That is exactly what the replaceFirst() does, though not explicitly!


    But isn't that the solution you gave me in the last code snippet?

    Because I think it meets my needs correctly:
    1] check whether the subject looks like the one provided
    2] if it does, extract the token



    Let us assume that someone makes a small change to the requirement that requires a change to that code. Could you do it? If not then the solution is unmaintainable by you. Are you going to come back to this forum every time a small change is needed? Also, no checks are done to make sure the subject has the right syntax. It is easy to add the checks but could you do it?
     
    Winston Gutkowski
    Bartender
    Posts: 10777
    71
    Hibernate Eclipse IDE Ubuntu

    Adrien Ruffie wrote:For example, I have the following code:


    Adrien,

    Please DontWriteLongLines. It makes your thread very hard to read. I've broken yours up this time, but for future reference, please remember:
    80 characters max.
    (the SSCCE page actually recommends 62)
    And that includes string literals AND comments.

    Thanks.

    Winston
     
    Steve Luke
    Bartender
    Posts: 4179
    22
    IntelliJ IDE Python Java
    Here is an example that appears to work. It uses real simple RegEx and other non-regex steps to get at the result.


    The key is to provide a strong set of tests to ensure you cover all your needs. In my case, you would provide more strings in findIn to provide snippets of email that might break the algorithm, and searchs to provide patterns users might search for. My philosophy is to keep the code as readable as possible, so I tend to try to make the RegEx small and simple because I have a hard time reading them.
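
As a rough illustration of that approach (plain string steps plus one very small regex, exercised against a table of inputs), here is a hedged sketch. It is not the class posted in the thread: `TemplateExtractor`, `extract`, and the token shape are assumptions, and only the array names `findIn` and `searchs` and the `%%token%%` placeholder are borrowed from the discussion.

```java
import java.util.regex.Pattern;

class TemplateExtractor {
    static final String PLACEHOLDER = "%%token%%";
    // The only regex involved: the assumed shape of a token.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9]+");

    /** Returns the token if the text contains the template, otherwise null. */
    static String extract(String template, String text) {
        int at = template.indexOf(PLACEHOLDER);
        if (at < 0) {
            throw new IllegalArgumentException("template has no " + PLACEHOLDER);
        }
        String prefix = template.substring(0, at);
        String suffix = template.substring(at + PLACEHOLDER.length());

        // Non-regex steps: locate the fixed parts with indexOf.
        // Simplification: only the first occurrence of the prefix is tried.
        int start = text.indexOf(prefix);
        if (start < 0) {
            return null;
        }
        start += prefix.length();
        int end = text.indexOf(suffix, start);
        if (end < 0) {
            return null;
        }
        String candidate = text.substring(start, end);
        return TOKEN.matcher(candidate).matches() ? candidate : null;
    }

    public static void main(String[] args) {
        String[] searchs = {
            "[Ref ticket: %%token%%]",
            "[Reply to Ref: %%token%% for ticket]"
        };
        String[] findIn = {
            "[Ref ticket: KFR12C]",
            "[Referential ticket: KFR12C]",
            "Fwd: [Reply to Ref: 7784azD for ticket]"
        };
        for (String search : searchs) {
            for (String text : findIn) {
                System.out.println(search + " | " + text + " -> " + extract(search, text));
            }
        }
    }
}
```

Adding more entries to `findIn` and `searchs` is the cheap way to probe for subjects that break the algorithm.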
     
    Adrien Ruffie
    Ranch Hand
    Posts: 93
    OK, sorry for the problem.

    A very good example class, it works very well, but just one question: why can't I replace %%token%% with $$Token$$?
    It gives me an array index out of bounds exception. Is the '$' character special in regex?

     
    Steve Luke
    Bartender
    Posts: 4179
    22
    IntelliJ IDE Python Java

    Adrien Ruffie wrote:OK, sorry for the problem.

    A very good example class, it works very well, but just one question: why can't I replace %%token%% with $$Token$$?
    It gives me an array index out of bounds exception. Is the '$' character special in regex?


    See the API for java.util.regex.Pattern for a list of the RegEx syntax, which will include all the special characters, and the workaround if you want to use them. You could do $$Token$$ but you would have to escape the $: \\$\\$Token\\$\\$ (the same way I escaped the brackets in the patterns). Personally I would avoid using the special characters if I could, and if I couldn't, then search for all special characters and replace them with escaped versions.
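
A small sketch of the escaping issue follows. It assumes, as one plausible explanation for the exception, that the template is split on the placeholder and the second element is then read; the class and method names are hypothetical. Unescaped, each `$` is an end-of-input anchor, so `$$Token$$` can never match and `split` returns the whole string as a single element. `Pattern.quote` is the standard way to escape every special character at once.

```java
import java.util.regex.Pattern;

class DollarEscaping {
    /** Splits on the placeholder interpreted as a regex (the failing case). */
    static String[] splitRaw(String template, String placeholder) {
        return template.split(placeholder);
    }

    /** Splits on the placeholder treated as literal text. */
    static String[] splitQuoted(String template, String placeholder) {
        return template.split(Pattern.quote(placeholder));
    }

    public static void main(String[] args) {
        String template = "[Ref ticket: $$Token$$]";

        // One element only: reading index 1 of this array would throw
        // ArrayIndexOutOfBoundsException.
        System.out.println(splitRaw(template, "$$Token$$").length);

        // Two elements: the text before and after the placeholder.
        String[] parts = splitQuoted(template, "$$Token$$");
        System.out.println(parts[0] + " | " + parts[1]);

        // Manual escaping, as described in the post, behaves the same way.
        System.out.println(template.split("\\$\\$Token\\$\\$").length);
    }
}
```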
     
    Adrien Ruffie
    Ranch Hand
    Posts: 93
    OK, this is what I thought.

    It's a good solution for me, many thanks to all :-)
     