GeeCON Prague 2014*

nothing regular about regular expressions

J. Kevin Robbins
Bartender

Joined: Dec 16, 2010
Posts: 983

For the first time ever, I need to use regular expressions in a program. I've never even used them with grep. So I've been looking at several tutorial pages, and it's absolutely mind-boggling.

My needs are simple enough. I need to scan several hundred documents looking for document numbers which will all have the format XXX-YYY-ZZZ-123, that is, three alphas, a dash, three alphas, a dash, three alphas, a dash, and three numbers.

I can't even begin to figure out how to compose a regex to match this pattern. To test I started with a base string of ABC-DEF. I tried using "[A-Z]\-[A-Z]" just to get a feel for things, but the tester told me that matches "C-D". Huh? Why didn't it match the entire string? Same result with "[A-Z]-[A-Z]".

Can anyone point me to a tutorial or book that's simple enough for even me to understand?


"The good news about computers is that they do what you tell them to do. The bad news is that they do what you tell them to do." -- Ted Nelson
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61315

I first learned regular expressions using our own Max Habibi's book. I highly recommend it.

Hint: your regex isn't accounting for the fact that you want three consecutive characters.


Emanuel Kadziela
Ranch Hand

Joined: Mar 24, 2005
Posts: 186
This is a good place to start: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
J. Kevin Robbins
Bartender

Joined: Dec 16, 2010
Posts: 983

Bear Bibeault wrote:I first learned regular expressions using our own Max Habibi's book. I highly recommend it.

Hint: your regex isn't accounting for the fact that you want three consecutive characters.


Thanks for the tip. It's on its way. Incidentally, am I the only one who can never order just one book? Every time I order one I think, "what else is on my wish list that I could bundle with this?". So now I not only have this book on the way, but also The Pragmatic Programmer, and JavaScript the Good Parts.

I'll look at the hint on consecutive characters. I just installed Expresso, a regex generator. Maybe that will at least help me keep moving on this project.
J. Kevin Robbins
Bartender

Joined: Dec 16, 2010
Posts: 983

Emanuel Kadziela wrote:This is a good place to start: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html


Greedy, reluctant, and possessive? <sigh>

Thanks for the link, but I'll be honest. I see lots of folks around here recommend the Oracle docs and tutorials, but I don't find them all that helpful. They often seem to assume a level of knowledge that I don't yet possess, and they rarely if ever contain useful examples. That link is typical. It lists greedy, reluctant, and possessive quantifiers but makes no attempt to explain what they are or how to use them.
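Since the JavaDoc only lists the three kinds of quantifier, a small experiment may show the difference more clearly than a definition. This is a minimal sketch (the class name QuantifierDemo and the helper findAll are hypothetical) that runs the same input through .*foo (greedy), .*?foo (reluctant), and .*+foo (possessive), collecting every match that Matcher.find() reports. Roughly: greedy grabs as much as it can and backs off until the rest fits, reluctant takes as little as it can and grows only as needed, and possessive grabs everything and never gives any back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Collect every substring that Matcher.find() reports for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* first takes the whole input, then backs off until "foo" fits.
        System.out.println("greedy     " + findAll(".*foo", input));   // [xfooxxxxxxfoo]
        // Reluctant: .*? starts with zero characters and grows only as far as needed.
        System.out.println("reluctant  " + findAll(".*?foo", input));  // [xfoo, xxxxxxfoo]
        // Possessive: .*+ takes everything and never backs off, so "foo" never matches.
        System.out.println("possessive " + findAll(".*+foo", input));  // []
    }
}
```

For a pattern like the document-number one in this thread, which has no .* in it, the distinction mostly doesn't matter; it starts to matter once a quantifier could consume text that a later part of the pattern needs.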
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61315

People often ask why books are still relevant when there are so many online tutorials available. This is one good example why. Books can take the time to explain concepts in proper order and with proper exposition and motivation.

Long live books!
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876

Jk Robbins wrote:My needs are simple enough. I need to scan several hundred documents looking for document numbers which will all have the format XXX-YYY-ZZZ-123, that is, three alphas, a dash, three alphas, a dash, three alphas, a dash, and three numbers.

I can't even begin to figure out how to compose a regex to match this pattern. To test I started with a base string of ABC-DEF. I tried using "[A-Z]\-[A-Z]" just to get a feel for things, but the tester told me that matches "C-D". Huh? Why didn't it match the entire string? Same result with "[A-Z]-[A-Z]".


Basically, "[A-Z]" will match a single capital letter. To match three capital letters in a row, you can do this ... "[A-Z][A-Z][A-Z]" ... or even this ... "[A-Z]{3}".
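To see Henry's point in code, here is a minimal sketch (CharClassDemo and firstMatch are hypothetical names) using java.util.regex on the original test string. It also shows why "[A-Z]\-[A-Z]" gave the same result: a dash is only special inside a character class, so escaping it outside one is harmless but changes nothing. Note that the tester reported the first substring that matches (like Matcher.find()), whereas String.matches() requires the whole input to fit the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {

    // Return the first substring matched by find(), or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "ABC-DEF";
        // [A-Z] is exactly one capital letter, so only "C-D" fits around the dash.
        System.out.println(firstMatch("[A-Z]-[A-Z]", input));                     // C-D
        // Escaping the dash outside a character class makes no difference.
        System.out.println(firstMatch("[A-Z]\\-[A-Z]", input));                   // C-D
        // Spelling the class out three times...
        System.out.println(firstMatch("[A-Z][A-Z][A-Z]-[A-Z][A-Z][A-Z]", input)); // ABC-DEF
        // ...or using the {3} quantifier gives the same result.
        System.out.println(firstMatch("[A-Z]{3}-[A-Z]{3}", input));               // ABC-DEF
        // matches() only succeeds if the WHOLE input fits the pattern.
        System.out.println(input.matches("[A-Z]-[A-Z]"));                         // false
        System.out.println(input.matches("[A-Z]{3}-[A-Z]{3}"));                   // true
    }
}
```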

Hope this helps,
Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11356

I am a fan of a tool called Regex Coach. It's probably misnamed, as it doesn't actually teach you anything via a formal lesson...

But it does let you put in various target strings and a regex and you can see how to build things up to match what you want.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
J. Kevin Robbins
Bartender

Joined: Dec 16, 2010
Posts: 983

Thanks, Henry. With the help of Expresso (the app, not the drink) I finally stumbled my way to "[A-Z]{3}-[A-Z]{3}-[A-Z]{3}-\\d{3}". Further testing will determine if that's a home run.
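One way to put that regex to work on a document's text is to compile it once and loop over Matcher.find(). This is a sketch under assumptions: the class DocNumberScanner, the helper findDocNumbers, and the sample text are hypothetical, and reading the actual documents from disk is left out. In Java source, "\\d" is the string literal for the regex token \d (one digit).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocNumberScanner {

    // The pattern from the thread: three letters, dash, three letters, dash,
    // three letters, dash, three digits. Compiled once and reused.
    static final Pattern DOC_NUMBER =
            Pattern.compile("[A-Z]{3}-[A-Z]{3}-[A-Z]{3}-\\d{3}");

    // Hypothetical helper: return every document number found in one document's text.
    static List<String> findDocNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = DOC_NUMBER.matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String sample = "See ABC-DEF-GHI-123 and XYZ-QRS-TUV-456; ignore ABC-DEF-123.";
        System.out.println(findDocNumbers(sample)); // [ABC-DEF-GHI-123, XYZ-QRS-TUV-456]
    }
}
```

One thing worth including in the further testing: find() will also pick a match out of a longer run, so "WABC-DEF-GHI-1234" yields "ABC-DEF-GHI-123". If that could occur in the documents, wrapping the pattern in \b word boundaries ("\\b...\\b" in Java source) is one option to try.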
Tony Docherty
Bartender

Joined: Aug 07, 2007
Posts: 2302
Whilst you are waiting for the book to arrive you can check out this comprehensive tutorial http://www.regular-expressions.info/tutorial.html

And if you want to test out your regex against an input string to see what values it is actually matching, a number of years ago I wrote a basic applet which is available here: http://www.keang.co.uk/regex.html
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19697

Jk Robbins wrote:Thanks, Henry. With the help of Expresso (the app, not the drink) I finally stumbled my way to "[A-Z]{3}-[A-Z]{3}-[A-Z]{3}-\\d{3}". Further testing will determine if that's a home run.

That regex looks good, but you can shorten it: (?:[A-Z]{3}-){3}\\d{3}. Don't be put off by the ?:; it just makes the () a non-capturing group, so the part inside it is grouped for the {3} to repeat but doesn't need to be remembered as a capturing group.
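A quick way to convince yourself the two forms are equivalent, and to see what "capturing" means, is to compare them side by side. This is a minimal sketch (the class NonCapturingDemo and its field names are hypothetical) that checks both patterns against a few inputs and uses Matcher.groupCount() to show that the (?:...) form defines no groups while a plain (...) defines one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {

    // Long form from the thread.
    static final Pattern LONG_FORM = Pattern.compile("[A-Z]{3}-[A-Z]{3}-[A-Z]{3}-\\d{3}");
    // Short form: (?:...) groups "[A-Z]{3}-" so that {3} repeats it, without capturing.
    static final Pattern SHORT_FORM = Pattern.compile("(?:[A-Z]{3}-){3}\\d{3}");
    // Same grouping with an ordinary capturing group, for comparison only.
    static final Pattern CAPTURING = Pattern.compile("([A-Z]{3}-){3}\\d{3}");

    public static void main(String[] args) {
        String[] inputs = {"ABC-DEF-GHI-123", "ABC-DEF-123", "ABC-DEF-GHI-12"};
        for (String s : inputs) {
            // The long and short forms should agree on every input.
            System.out.println(s + " long=" + LONG_FORM.matcher(s).matches()
                    + " short=" + SHORT_FORM.matcher(s).matches());
        }

        // The non-capturing form defines no groups; the capturing form defines one.
        System.out.println(SHORT_FORM.matcher("").groupCount()); // 0
        System.out.println(CAPTURING.matcher("").groupCount());  // 1

        // A repeated capturing group only remembers its last repetition.
        Matcher m = CAPTURING.matcher("ABC-DEF-GHI-123");
        if (m.matches()) {
            System.out.println(m.group(1)); // GHI-
        }
    }
}
```

Since nothing in this thread needs the text of the repeated chunk, the non-capturing form avoids keeping a group whose only stored value would be the last "XXX-" piece.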


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
Bear Bibeault wrote:People often ask why books are still relevant when there are so many online tutorials available. This is one good example why.

To be fair, the link Emanuel posted is not to a tutorial of any kind; it's to the JavaDoc. For a tutorial, the "canonical" place to start now is probably the Java Tutorial on Regular Expressions. I have no idea how good it is; it didn't exist when I learned regex. But that's probably a better starting point for the subject, and a fairer comparison with books.
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61315

While true, I still think that online tutorials are OK for spot knowledge -- but books are still the bomb for explaining things in depth and in a logical fashion. At least the well-written ones.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570

Psychologically, though, when you're working on a problem and you say "I know! I'll use a regular expression!" you don't really feel like spending three days with a book to get an in-depth grounding in the subject. You really feel like whipping through an online tutorial and learning just enough to get a working regex.

I know, you're going to tell me that this attitude explains the quality of much of the code which is out there in the real world. And I wouldn't disagree. I'm just saying, it's hard to avoid having that attitude.

I don't mean to point the finger at the OP, either. I know for sure that I have done the same sort of thing numerous times in my career.
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61315

Right, but I think there is a difference between hitting a tutorial as a refresher, or for examples of concepts that aren't brand spanking new, and trying to learn a subject from scratch with one.

For example, I've written a lot of online articles about JSP issues, but no one who isn't already familiar with JSP is going to learn JSP from them.
Tony Docherty
Bartender

Joined: Aug 07, 2007
Posts: 2302
For learning about a subject, give me a good book any day. As an aide-mémoire, or to find a detailed bit of information on a specific part of the subject, an online tutorial is often better.
J. Kevin Robbins
Bartender

Joined: Dec 16, 2010
Posts: 983

Paul Clapham wrote:Psychologically, though, when you're working on a problem and you say "I know! I'll use a regular expression!" you don't really feel like spending three days with a book to get an in-depth grounding in the subject. You really feel like whipping through an online tutorial and learning just enough to get a working regex.

I know, you're going to tell me that this attitude explains the quality of much of the code which is out there in the real world. And I wouldn't disagree. I'm just saying, it's hard to avoid having that attitude.

I don't mean to point the finger at the OP, either. I know for sure that I have done the same sort of thing numerous times in my career.


You're exactly right; at this point I need to get it working and proceed with the project, so I want a quick solution. However, I try to take it to the next step. I realize that this is an area where I'm weak and I need to improve. So I'll work through the book when I don't have the pressure of a project hanging over me, and the next time I need to use regex, I won't be banging my head against the wall. I'll only need a quick refresher like the links provided above.

I agree there is a lot of what I call "copy and paste programming" out there, where someone gets some code from SO or other sources and they fiddle with it until it works, but in the end they don't really understand why it works. I refuse to get trapped into that mentality. I'm a "take-it-apart-and-see-how-it-works" kind of guy, whether it's software or a car engine. I have to understand the how and why and I'm not satisfied until I do.

Thanks to everyone for the information and links.
Comal Rajagopalaratnam Muthukumar
Ranch Hand

Joined: Mar 18, 2012
Posts: 89
Jk Robbins wrote:For the first time ever, I need to use regular expressions in a program. I've never even used them with grep. So I've been looking at several tutorial pages, and it's absolutely mind-boggling.

My needs are simple enough. I need to scan several hundred documents looking for document numbers which will all have the format XXX-YYY-ZZZ-123, that is, three alphas, a dash, three alphas, a dash, three alphas, a dash, and three numbers.

I can't even begin to figure out how to compose a regex to match this pattern. To test I started with a base string of ABC-DEF. I tried using "[A-Z]\-[A-Z]" just to get a feel for things, but the tester told me that matches "C-D". Huh? Why didn't it match the entire string? Same result with "[A-Z]-[A-Z]".

Can anyone point me to a tutorial or book that's simple enough for even me to understand?


Hello,
Have you referred to the book "Java Regular Expressions: Taming the java.util.regex Engine" by Mehran Habibi? If not, please do so, or try www.apress.com to fetch the source code online.
Please reply if it is helpful to you.
Thanks,
CRMK
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
Yes, that's the book Bear recommended in the second post above. And JK responded immediately after, saying the book has been ordered.
J. Kevin Robbins
Bartender

Joined: Dec 16, 2010
Posts: 983
    
  13

The book arrived and I'm about halfway through chapter 2. It's an excellent resource.

I can highly recommend this book for anyone else who is trying to get their head wrapped around regex.