Find valid IP program (Regex) from G&S - Please explain Regex

 
Ted North
Ranch Hand
Aloha Java Ranch,

I am reading the interesting and insightful book on passing the OCPJP Programmer 2 exam by G&S and had a question about the syntax of a huge regular expression. This code is from p. 217 of the G&S book.



My question pertains to the regex String reference that contains the mammoth regular expression. Could someone please help me understand this some more? This is a really useful program, but I am pretty confused about what exactly is going on.

Thank-you for reading.

Regards,

Ted
 
Henry Wong
author

Ted North wrote:
My question pertains to the regex String reference that contains the mammoth regular expression. Could someone please help me understand this some more? This is a really useful program, but I am pretty confused about what exactly is going on.



For large regexes, it is generally a good idea to break it down into its components. For example, take this component...

((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)(\\.)){3}

From the {3} portion, you know that you want three of these...

(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)(\\.)

and from the alternation, you know that it is one of three different possibilities, which is then followed by a dot... the three possibilities are...

25[0-5]
2[0-4]\\d
[01]?\\d\\d?

And at this point, the regexes are probably small enough for you to figure out.
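
One way to see the breakdown in code is to rebuild the full pattern from its pieces. The following is only a sketch: the class name `IpRegexParts` and the constant names are made up for illustration, and the full pattern text is the one quoted elsewhere in this thread, not a verified copy of the book listing.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative and not from the book.
final class IpRegexParts {
    // The three alternatives for one number between the dots.
    static final String HIGH = "25[0-5]";       // "250" to "255"
    static final String MID = "2[0-4]\\d";      // "200" to "249"
    static final String LOW = "[01]?\\d\\d?";   // one to three digits, up to "199"

    // One number: exactly one of the three alternatives.
    static final String OCTET = "(" + HIGH + "|" + MID + "|" + LOW + ")";

    // Three "number followed by a dot" groups, then a final number,
    // all between word boundaries.
    static final String IP = "\\b(" + OCTET + "(\\.)){3}" + OCTET + "\\b";

    public static void main(String[] args) {
        String fromThread =
            "\\b((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)(\\.)){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\b";
        // Compare the assembled text with the pattern quoted in the thread.
        System.out.println(IP.equals(fromThread));
        System.out.println(Pattern.matches(IP, "192.168.1.1"));
    }
}
```

Naming the pieces this way makes each one small enough to test on its own before putting them back together.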

Henry
 
Ted North
Ranch Hand
Hi Henry,

Thank-you for the detailed response.

So with the OR symbols (|) is the expression saying find a number like 250-255 OR 20-24 OR 0 or 1?

Regards,

Ted
 
Ted North
Ranch Hand
Feel free to draw on this image I created of the regular expression (regex) to help explain the individual components and then re-upload to imgur to explain

 
Henry Wong
author

Ted North wrote:
So with the OR symbols (|) is the expression saying find a number like 250-255 OR 20-24 OR 0 or 1?



You forgot about all those "\\d" components in the sub-regexes.

Henry
 
Ted North
Ranch Hand
This is what I have so far. Still I am surprisingly confused about what is going on in this regular expression.




 
Henry Wong
author

Ted North wrote:This is what I have so far. Still I am surprisingly confused about what is going on in this regular expression.



You are not seeing how it comes together... for example...

2[0-4] --> means a string from "20" to "24"
\\d --> a single digit string


but... what happens when you put those two together? When you start seeing what happens, it will start to make sense.
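
A small experiment can answer that question. This is a sketch only, with a made-up class name `TwoPartRange` and helper `countMatches`; it counts how many of the plain numbers 0 to 999 each piece matches as a whole string.

```java
import java.util.regex.Pattern;

// Hypothetical experiment; the class and method names are illustrative only.
final class TwoPartRange {
    // Counts how many of the strings "0" .. "999" (written without
    // leading zeros) are matched in full by the given regex.
    static int countMatches(String regex) {
        Pattern p = Pattern.compile(regex);
        int count = 0;
        for (int n = 0; n <= 999; n++) {
            if (p.matcher(Integer.toString(n)).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("2[0-4]"));     // the two-character piece
        System.out.println(countMatches("\\d"));        // the single-digit piece
        System.out.println(countMatches("2[0-4]\\d"));  // both pieces in sequence
    }
}
```

Putting the two pieces side by side does not add their ranges together. It requires a "20" to "24" prefix followed by one more digit, which gives the three-character strings "200" to "249".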

Henry

 
Rico Felix
Ranch Hand

Henry Wong wrote:For large regexes, it is generally a good idea to break it down into its components.



That is exactly what you must do to make sense of long regexes...

We start with "\\b((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)(\\.)){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\b" and move left to right, just like the regex engine would:

Since your last post (the image) shows that you already have a fair amount of regex knowledge, we skip over the obvious parts and decipher the non-obvious ones:

25[0-5] translates to 250 min to 255 max

2[0-4]\\d translates to 200 min to 249 max

[01]?\\d\\d? translates to 0 min to 199 max (remember, ? means zero or one)
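
These three ranges can be checked mechanically. The sketch below assumes nothing beyond the three sub-patterns quoted above; the class name `OctetCoverage` and the method `alternativeFor` are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical checker; class and method names are illustrative only.
final class OctetCoverage {
    static final Pattern HIGH = Pattern.compile("25[0-5]");
    static final Pattern MID = Pattern.compile("2[0-4]\\d");
    static final Pattern LOW = Pattern.compile("[01]?\\d\\d?");

    // Returns the alternative that matches the decimal text of n in full,
    // or "none" if no alternative does.
    static String alternativeFor(int n) {
        String s = Integer.toString(n);
        if (HIGH.matcher(s).matches()) return "25[0-5]";
        if (MID.matcher(s).matches()) return "2[0-4]\\d";
        if (LOW.matcher(s).matches()) return "[01]?\\d\\d?";
        return "none";
    }

    public static void main(String[] args) {
        // Values on each side of every range boundary.
        for (int n : new int[] {0, 199, 200, 249, 250, 255, 256, 300}) {
            System.out.println(n + " -> " + alternativeFor(n));
        }
    }
}
```

Between them the alternatives accept every value from 0 to 255, and none of them accepts 256 or anything larger.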
 
Ted North
Ranch Hand

Rico Felix wrote:

Henry Wong wrote:For large regexes, it is generally a good idea to break it down into its components.



That is exactly what you must do to make sense of long regexes...

We start with "\\b((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)(\\.)){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\b" and move left to right, just like the regex engine would:

Since your last post (the image) shows that you already have a fair amount of regex knowledge, we skip over the obvious parts and decipher the non-obvious ones:

25[0-5] translates to 250 min to 255 max

2[0-4]\\d translates to 200 min to 249 max

[01]?\\d\\d? translates to 0 min to 199 max (remember, ? means zero or one)



Thank you for the explanation of the numbers Rico. This really helped. I see how the \\d's represent any number from zero through nine now.

What is the operator that determines if three numbers appear, two numbers, or just one since an IP could be 192.168.300.300 or 192.168.1.1 etc?

Respectfully,

Ted
 
Rico Felix
Ranch Hand

Ted North wrote:What is the operator that determines if three numbers appear, two numbers, or just one since an IP could be 192.168.300.300 or 192.168.1.1 etc?



[01]?\\d\\d? is used to get one digit or two digits or three digits since two digits are optional...

 
Ted North
Ranch Hand

Rico Felix wrote:

Ted North wrote:What is the operator that determines if three numbers appear, two numbers, or just one since an IP could be 192.168.300.300 or 192.168.1.1 etc?



[01]?\\d\\d? is used to get one digit or two digits or three digits since two digits are optional...



Rico,

Thank-you for explaining this confusing regular expression stuff.

So with the ORs | - the expression can choose only one of these? So in this case it would either be a number starting with 25 or 2 or 0 or 1?

Also, when would the regex engine choose the zero in [01] towards the end of the regex? I have never seen an IP such as 192.168.1.01

Sincerely,

Ted
 
Rico Felix
Ranch Hand

Ted North wrote:So with the ORs | - the expression can choose only one of these? So in this case it would either be a number starting with 25 or 2 or 0 or 1?



That is exactly so...

Ted North wrote:Also, when would the regex engine choose the zero in [01] towards the end of the regex? I have never seen an IP such as 192.168.1.01



You must keep in mind that [01] specifies a character set, meaning that the character at that position can be either 0 or 1, but not both (01). It's a set of characters from which exactly one is chosen.
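
A quick way to convince yourself of this is to test the character set on its own. The sketch below uses a made-up class `CharClassDemo` with a small helper `whole`, which reports whether a regex matches an entire input string.

```java
import java.util.regex.Pattern;

// Hypothetical demo; the class and method names are illustrative only.
final class CharClassDemo {
    // True when the regex matches the entire input, not just part of it.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("[01]", "0"));           // one character from the set
        System.out.println(whole("[01]", "1"));           // the other character from the set
        System.out.println(whole("[01]", "01"));          // two characters: too long for one set
        System.out.println(whole("[01]?\\d\\d?", "01"));  // the optional set plus a digit
    }
}
```

The set by itself consumes exactly one character, so "01" only matches once further digits follow it in the pattern, as in [01]?\\d\\d?.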
 
Bartender

Ted North wrote:Also, when would the regex engine choose the zero in [01] towards the end of the regex? I have never seen an IP such as 192.168.1.01


The regex doesn't choose a zero or one, it just allows it. IP addresses can be 4 groups of up to three digits, ranging from 0 to 255, and with optional leading zeros. So 192.168.1.01 is a valid IP, as is 192.168.001.01.

So the third OR alternative comes down to [01]?\\d\\d?. The middle \\d means the group must have at least one digit, and it can be anything from 0 to 9. The second (and optional) \\d allows the group to contain any combination of two digits 0-9. The optional zero or one at the start of the group says that the group can contain three digits: the last two can be any combination of 0 through 9, as long as the first digit is 0 or 1, or is absent. So this group covers all values from 0 to 199, with or without leading zeros.

The other OR alternatives handle the special cases for numbers from 200 to 255, since only a specific range of values is allowed in the second digit if the first digit is a 2 (values 20n to 25n). And if the first digit is a 2 and the second digit is a 5, then only a limited range of values is allowed in the third digit (250 to 255).
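
To try the whole pattern against the addresses mentioned in this thread, something like the following can be used. It is a sketch, not the book's program: the class `IpCheck` and its methods `isIp` and `firstIpIn` are invented here, and the pattern text is the one quoted earlier in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper; the class and method names are illustrative only.
final class IpCheck {
    static final Pattern IP = Pattern.compile(
        "\\b((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)(\\.)){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\b");

    // True when the whole string is accepted by the pattern.
    static boolean isIp(String s) {
        return IP.matcher(s).matches();
    }

    // Returns the first address found inside a longer text, or null if none.
    static String firstIpIn(String text) {
        Matcher m = IP.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isIp("192.168.1.01"));       // leading zero in the last group
        System.out.println(isIp("192.168.001.01"));     // leading zeros in two groups
        System.out.println(isIp("192.168.300.300"));    // 300 is above 255
        System.out.println(firstIpIn("ping 10.0.0.1 now"));
    }
}
```

`matches()` checks the entire string, while `find()` searches inside a longer text, where the \\b word boundaries keep the pattern from matching only part of a longer run of digits.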
 
Henry Wong
author

Rico Felix wrote:

Ted North wrote:What is the operator that determines if three numbers appear, two numbers, or just one since an IP could be 192.168.300.300 or 192.168.1.1 etc?



[01]?\\d\\d? is used to get one digit or two digits or three digits since two digits are optional...



To drill into this in more detail...

--> assuming that both optional components are not used, then it boils down to --> \\d --> which is a string from "0" to "9"

--> assuming that only the first optional component is used, then it boils down to --> [01]\\d --> which is a string from "00" to "19"

--> assuming that only the second optional component is used, then it boils down to --> \\d\\d --> which is a string from "00" to "99"

--> assuming that both the optional components are used, then it boils down to --> [01]\\d\\d --> which is a string from "000" to "199"


So, this sub-regex will match a one-, two-, or three-character string that parses to a number between zero and 199. And it takes care of all combinations of strings, meaning any 1 to 3 character string that yields a value in that range (0 to 199), so you can effectively say that it is any number from 0 to 199.

So the only way that it can fail for a value in that range is when the string is zero padded to more than three characters, such as 00190.
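
The four cases can also be counted by brute force. In this sketch (the class `LowRangeCases` and the method `countOfLength` are made-up names), every digit string of a given length is generated with zero padding and tested against the sub-regex.

```java
import java.util.regex.Pattern;

// Hypothetical experiment; the class and method names are illustrative only.
final class LowRangeCases {
    static final Pattern LOW = Pattern.compile("[01]?\\d\\d?");

    // Counts the digit strings of exactly len characters (zero padded,
    // e.g. "007") that the sub-regex matches in full.
    static int countOfLength(int len) {
        int limit = 1;
        for (int i = 0; i < len; i++) {
            limit *= 10;   // 10^len distinct strings of this length
        }
        int count = 0;
        for (int n = 0; n < limit; n++) {
            String s = String.format("%0" + len + "d", n);
            if (LOW.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int len = 1; len <= 4; len++) {
            System.out.println(len + " characters: " + countOfLength(len));
        }
        System.out.println(LOW.matcher("00190").matches());   // padded past three characters
    }
}
```

Every one- and two-character digit string is accepted, but only the three-character strings that start with 0 or 1 are, and nothing longer than three characters gets through.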

Henry
 
Ted North
Ranch Hand
Thanks Java Ranch for helping me understand regular expressions more! I read everyone's responses and it has helped me understand in much greater detail what is happening in the IP regular expression.

I am understanding how the individual components function together as a whole now. This may not be an incredibly difficult programming syntax to decipher after all.

I see how if there is a \\d meta-symbol after the brackets this symbolizes that there should be any digit (0-9) unless said meta-symbol is followed by a question mark which means that this is optional or can appear at the most a single time. Plus the OR symbols are not as confusing as before. I do not think I understood at first that only one of these options is being chosen at a time.

Thank-you again Java Ranch. I am sure I will have more questions for the board soon.

Sincerely,

Ted
 