That's the end result, but what really happens is a little more complex. The first dot-star (".*") initially consumes all the (non-line-separator) characters in the target. Then the regex engine starts backing off, one character at a time, trying to match the literal sequence "date". Once it finds that, the second dot-star gobbles up the remaining characters.
It's important to understand backtracking because, when no match is possible, the regex engine has to try every possibility before giving up. If there are a lot of dot-stars or other indeterminate components in the regex, that can easily add up to millions or billions of possibilities--and an effectively hung application.
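To see the backing-off in action, here is a minimal sketch that wraps each dot-star in a capturing group so you can inspect what it ended up with. The class name, helper method, and sample input are made up for illustration; only the java.util.regex calls are real API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Hypothetical helper: returns what group 1 captured,
    // or null when the regex does not describe the whole input.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "date and update";

        // Greedy: the first dot-star grabs everything, then backs off only
        // as far as the LAST "date", so it keeps "date and up".
        System.out.println("[" + firstGroup("^(.*)date(.*)$", input) + "]");

        // Reluctant: ".*?" starts empty and grows only as needed, so it
        // stops at the FIRST "date" and captures nothing.
        System.out.println("[" + firstGroup("^(.*?)date(.*)$", input) + "]");
    }
}
```

Either way the overall match succeeds; the only difference is which "date" the engine settles on and how much work it does getting there.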
You know what they say: with great power comes great screwupability. Or something like that.
Joined: Nov 16, 2004
I'm hardly a master of regexps (I just learned the usefulness of, and the difference between, lazy and greedy matching). But yeah, ^ means the beginning of the line and $ is the end of a line, and .* is any character any number of times.
The regular expression ^.*date.*$ equates to "go to the beginning of the string, match any character zero or more times until you reach the string 'date', then match any character zero or more times until the end of the string is reached."
If there are a lot of dot-stars or other indeterminate components in the regex, that can easily add up to millions or billions of possibilities--and an effectively hung application
I've never witnessed that using a regexp, but I don't doubt the possibility. Is there a way to specify "find the first match and then quit"? Something I've always had issues with in doing regexps is that I normally want to search an entire file, and the line breaks screw it up. I've read through the Sun API docs, which mention setting multi-line matching, but I must be missing something because it still doesn't work as expected. It would be convenient if ^ matched the beginning of the file, $ matched EOF, and the typical \r\n or \n matched line breaks, etc. To get around this I've taken to reading a file line by line and appending the lines to a StringBuffer, but I can't help wondering what is going to happen if I run it against a document with 1000 or more pages. Has anyone done anything similar to this before, and do you have any tips? Java out-of-heap-space errors suck.
Joined: May 06, 2004
Normally, the '.' metacharacter matches any character except a line separator, but you can make it match those as well by compiling the regex with the DOTALL flag set. The MULTILINE flag causes the '^' and '$' anchors to match at the beginning and end of logical lines as well as the beginning and end of the whole input.
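Here is a small sketch of what those two flags do, assuming the whole file has already been read into one String. The class name, method names, and sample text are invented for the example; Pattern.DOTALL and Pattern.MULTILINE are the real flag constants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagDemo {
    // Stand-in for the contents of a file (an assumption for this sketch).
    static final String TEXT = "first line\nthe date is here\nlast line";

    // Without DOTALL the dot stops at '\n', so the dot-stars cannot reach
    // across lines; with DOTALL they can span the whole text.
    static boolean wholeTextMatches(int flags) {
        return Pattern.compile(".*date.*", flags).matcher(TEXT).matches();
    }

    // With MULTILINE, '^' and '$' also match at line boundaries, so find()
    // returns just the first line that contains "date".
    static String lineContainingDate() {
        Matcher m = Pattern.compile("^.*date.*$", Pattern.MULTILINE).matcher(TEXT);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeTextMatches(0));              // false
        System.out.println(wholeTextMatches(Pattern.DOTALL)); // true
        System.out.println(lineContainingDate());             // the date is here
    }
}
```

Note that the two flags are independent: DOTALL changes what '.' matches, MULTILINE changes where the anchors match, and you can combine them with the | operator if you need both.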
Originally posted by Stefan Wagner: should be enough.
I don't think this one works, because .* would consume the whole line, and it will not get an opportunity to match the word "date" in the input. Try it; it will only return the whole text.
Instead I would suggest that you zoom to the first d, and then see if it's a "date" or something else.
That simply means: consume all the characters before you encounter the d and then see if it is a date; if not, move on to the next d, and so on...
One more thing: the .* at the end of the regex is unnecessary. I don't see any reason why we need something more than date... if it matches date, then it will also match dateblahblahblah, so it makes no sense when all you need is "date".
"It's not enough that we do our best; sometimes we have to do what's required." -- Sir Winston Churchill
Joined: May 06, 2004
Akshay, Stefan's suggestion does work, and both the dot-stars (".*") are necessary. This is because we're using the matches() method to perform the test. Most regex tools define "a match" to mean that the regex describes any substring of the target string, but matches() returns true only if the regex describes the whole string. So, although we're only interested in the substring "date", we have to use the dot-stars to gobble up all the text before and after it. (There's also a find() method that looks for "a match" in the traditional sense, but you'll only find it in the class java.util.regex.Matcher.)
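A quick sketch of the difference, in case it helps. The class name, helper methods, and sample string are just for illustration; matches() and find() are the real Matcher methods.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // True only if the regex describes the ENTIRE input.
    static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex describes ANY substring of the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "the date is set";

        System.out.println(wholeStringMatches("date", input));     // false
        System.out.println(wholeStringMatches(".*date.*", input)); // true
        System.out.println(containsMatch("date", input));          // true

        // String.matches() behaves like Matcher.matches(): whole string only.
        System.out.println(input.matches(".*date.*"));             // true
    }
}
```

So with find() the bare "date" is enough, and it stops at the first hit; the surrounding dot-stars are only needed when you go through matches().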
As for why the regex works despite the greediness of the first dot-star, see my first reply above.
Tad, could you be more specific about your problems matching within files? What exactly are you trying to do?