regex date capture - greed, reluctance, and precedence problem

Chris Treglio
Ranch Hand

Joined: Jun 18, 2001
Posts: 64

I've got a long list of strings containing dates in the format dd-mmm-yyyy, which I'm trying to capture. I'd like to be able to handle a missing leading zero in the day part (i.e. properly capture both 01-Jan-2011 and 1-Jan-2011).

My current code doesn't handle leading zeroes.



I thought that by changing the day part to ".*((?:[12][0-9]|3[01]|0?[1-9])-", I would make the leading zero optional but "greedy", so it would be captured whenever it's there. It is not. Furthermore, it turns dates like "12-Mar-2011" into "2-Mar-2011". Obviously, I'd want days in the teens, twenties, and thirties to be captured in full too.

What am I doing wrong?
Wouter Oet
Saloon Keeper

Joined: Oct 25, 2008
Posts: 2700

You're trying to reinvent the wheel. Why use regex to parse dates when you can use DateFormat/SimpleDateFormat?


"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
Chris Treglio
Ranch Hand

Joined: Jun 18, 2001
Posts: 64
I'm not really trying to turn a String into a Date object, I'm trying to pull out String dates from a longer String filled with other stuff. My String sources are like "bla bla blah blah 02-Apr-2011 blah bla blah".

Can you do that with DateFormat/SimpleDateFormat?
Wouter Oet
Saloon Keeper

Joined: Oct 25, 2008
Posts: 2700

Aha. I'm not sure if that is possible.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
I'm guessing the issue is that the initial .* is greedy, and that wins out over your other intentions here. It grabs as many characters as it can and only gives back as few as needed for the rest of the expression to match. So it will probably swallow a leading 1, 2, or 3 as well, not just a leading 0, as long as there's at least one digit left over to match the day part.
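
To see the effect, here is a minimal sketch. The original code isn't shown in this thread, so the full pattern is an assumption: it takes the day part quoted above and adds a guessed month/year tail (`-[A-Za-z]{3}-\d{4}`) and a trailing `.*`. The class and method names (`GreedyDemo`, `capture`) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Assumed pattern: the quoted day part plus a guessed month/year tail.
    // The leading .* is greedy, so it backtracks from the END of the input
    // and stops at the first point where the rest of the pattern can match.
    static final Pattern GREEDY = Pattern.compile(
            ".*((?:[12][0-9]|3[01]|0?[1-9])-[A-Za-z]{3}-\\d{4}).*");

    // Returns what group 1 captured, or null if the whole input does not match.
    static String capture(String input) {
        Matcher m = GREEDY.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The greedy prefix swallows the first digit of the day.
        System.out.println(capture("bla bla 12-Mar-2011 blah")); // 2-Mar-2011
        System.out.println(capture("bla bla 02-Apr-2011 blah")); // 2-Apr-2011
    }
}
```

In both cases the shortest possible day ("2") is enough for the rest of the expression to match, so the greedy .* keeps everything before it, including the first digit.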

I suggest either:

(a) replace .* with .*?, which is reluctant

or

(b) drop the .* entirely, and replace matches() with find().
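
Both options could look roughly like the sketch below. It is not the original poster's code: the month/year tail of the pattern (`-[A-Za-z]{3}-\d{4}`) is an assumption, and the names `DateCapture`, `firstWithMatches`, and `allWithFind` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateCapture {
    // Assumed date pattern: day part from the question plus a guessed month/year tail.
    static final String DATE = "(?:[12][0-9]|3[01]|0?[1-9])-[A-Za-z]{3}-\\d{4}";

    // (a) Reluctant prefix: .*? consumes as little as possible before the date.
    static final Pattern RELUCTANT = Pattern.compile(".*?(" + DATE + ").*");

    // (b) No prefix at all: meant to be used with find() instead of matches().
    static final Pattern BARE = Pattern.compile(DATE);

    // Option (a): matches() against the whole string, first date in group 1.
    static String firstWithMatches(String input) {
        Matcher m = RELUCTANT.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Option (b): find() scans left to right and can return every date.
    static List<String> allWithFind(String input) {
        List<String> dates = new ArrayList<>();
        Matcher m = BARE.matcher(input);
        while (m.find()) {
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(firstWithMatches("bla bla 12-Mar-2011 blah")); // 12-Mar-2011
        System.out.println(firstWithMatches("bla bla 02-Apr-2011 blah")); // 02-Apr-2011
        System.out.println(allWithFind("a 1-Jan-2011 b 12-Mar-2011 c"));  // [1-Jan-2011, 12-Mar-2011]
    }
}
```

Option (b) is probably the simpler of the two when a string may contain more than one date, since find() can be called repeatedly on the same Matcher.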
 
 