Piet Verdriet

Ranch Hand
since Feb 25, 2006
Cows and Likes
Cows: 0 received in total, 0 in the last 30 days; 0 given in total
Likes: 0 received in total, 0 in the last 30 days; 0 given in total, 0 in the last 30 days

Recent posts by Piet Verdriet

Jane Dodo wrote:

Piet Verdriet wrote:
... @OP: I'd advise against your solution; have a look at my earlier suggestion (the one with the \G in it).



That does not work for me, as I only need to replace the leading zeroes and the trailing zeroes after the decimal point, i.e. 00012003.40 should become 12003.4.
BTW, if there is a formatter that does that (i.e. in the process of converting a double to a String), that would be even cooler.



Sorry Jane, I forgot you were the OP.

I don't follow you exactly: you wanted to replace "000400003.300" with "&&&400003.3&&", where the '&'s are spaces, right? If so, then my earlier suggestion does exactly that:
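The code of that earlier suggestion didn't make it onto this page. Here is a minimal sketch of the \G approach that produces the output described above. It is a reconstruction, so not necessarily the exact original, and the class and method names are made up. It assumes the input always contains a decimal point, as in your examples:

```java
public class PadZeros {

    // Replaces each leading zero and each trailing zero with a space,
    // so the result keeps the width of the input.
    //   \G0       a zero exactly where the previous match ended (or at the
    //             start of the input), so it only walks through leading zeros
    //   0(?=0*$)  a zero followed by nothing but zeros up to the end
    // Assumption: the input contains a decimal point. For "400" the trailing
    // zeros of the integer part would be blanked too.
    static String blankZeros(String number) {
        return number.replaceAll("\\G0|0(?=0*$)", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + blankZeros("000400003.300") + "]"); // [   400003.3  ]
        System.out.println("[" + blankZeros("00012003.40") + "]");   // [   12003.4 ]
    }
}
```

If you want 12003.4 without any padding, just call trim() on the result.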



AFAIK, there is no Formatter in Java that does exactly that.
14 years ago

Jane Dodo wrote:
An interesting article. Any idea WHY Java does it that way? I mean, why not just count bytes as it goes and throw an exception in an unlikely event when the matched string (or whatever) is longer than Integer.MAX_VALUE characters? But then, I am a regex newbie.



Good point. I was wondering the same. It seems this is an accepted bug*: when a * or + is used inside a look-behind, an exception should be thrown! @OP: I'd advise against your solution; have a look at my earlier suggestion (the one with the \G in it).

http://bugs.sun.com/view_bug.do?bug_id=6695369
14 years ago

Henry Wong wrote:

Piet Verdriet wrote:For an explanation of the * and + being sometimes valid and sometimes invalid inside a look-behind, see: http://stackoverflow.com/questions/1536915/regex-look-behind-without-obvious-maximum-length-in-java



That's actually a very enlightening article, Piet. Thanks...

Henry



I thought you would.
You're welcome of course!
14 years ago
For an explanation of the * and + being sometimes valid and sometimes invalid inside a look-behind, see: http://stackoverflow.com/questions/1536915/regex-look-behind-without-obvious-maximum-length-in-java
14 years ago
By default, the DOT does not match line break characters. So, if your option tags span multiple lines, nothing is found. Either enable DOTALL matching, or do something like this:
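The original snippet is missing here. A minimal sketch of both options, assuming plain HTML <option> tags without attributes and option texts that contain no '<' (class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionTags {

    // Option 1: DOTALL (same as the inline flag (?s)) lets '.' also match line breaks.
    static final Pattern DOT_ALL =
            Pattern.compile("<option>(.*?)</option>", Pattern.DOTALL);

    // Option 2: leave the DOT alone and use a negated character class instead;
    // [^<] matches line breaks as well. Assumes option texts contain no '<'.
    static final Pattern NO_LT =
            Pattern.compile("<option>([^<]*)</option>");

    // Collects group 1 of every match.
    static List<String> values(Pattern p, String html) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<option>one</option>\n<option>two\nlines</option>";
        System.out.println(values(DOT_ALL, html).size()); // 2
        System.out.println(values(NO_LT, html).size());   // 2
        // Without DOTALL the second option is not found at all:
        System.out.println(values(Pattern.compile("<option>(.*?)</option>"), html)); // [one]
    }
}
```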

14 years ago

Henry Wong wrote:

Piet Verdriet wrote:
Interesting, and thank you for following up: I'm going to see if I can find out if perhaps some things have changed lately.



I don't know if it "changed lately", or was always like this... but I thought the infinite repetition restriction in look-aheads and look-behinds only applied to the split() method.... meaning... I always remember being able to use * and + in look-aheads and look-behinds, since regex was introduced in Java 1.4 (as long as they are not used in the split() method).



It doesn't matter which method you use: matches(), replaceAll() and split() all produce the same output. But it gets a bit strange (in my opinion). See the test below:



When you run this test, you'll see that 1, 2 and 3 run without a hitch, yet 4, 5 and 6 produce exceptions...
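The numbered test itself did not survive on this page. If you want to probe look-behind patterns on your own JDK, a hypothetical harness like the one below will do (this is not the original test, and the patterns are only examples). I won't claim an outcome for the unbounded * and + cases: which of those compile may depend on your JDK version (see the bug report mentioned earlier). The bounded ones are accepted.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookBehindProbe {

    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] regexes = {
            "(?<=a)b",       // fixed length
            "(?<=a?)b",      // bounded: 0 or 1
            "(?<=a{0,3})b",  // bounded: 0 to 3
            "(?<=a*)b",      // unbounded: the cases discussed in this thread
            "(?<=a+)b",
            "(?<=ab*)c"
        };
        for (String regex : regexes) {
            System.out.printf("%-15s %s%n", regex,
                    compiles(regex) ? "compiles" : "PatternSyntaxException");
        }
    }
}
```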

Henry Wong wrote:

Piet Verdriet wrote:And about point 2: I truly thought that the regex engine performed its replacements from left to right and that these replacements influenced the characters to the right of it.



Never thought about this... But it does make sense that it will work though. Strings are immutable. And under the covers, replaceAll() uses the appendReplacement() and appendTail() methods, which use a separate string buffer to create the result string.

Henry



That sounds reasonable.
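A small sketch of what Henry describes, with made-up class and method names: a find/appendReplacement/appendTail loop, which is roughly what replaceAll() does internally. Because the matcher keeps scanning the original input, the look-behind at index 2 still sees the original zero at index 1, not the '&' that replaced it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {

    // The matcher reads the ORIGINAL input; the result is built up
    // in a separate buffer, so earlier replacements can't affect later matches.
    static String replaceAllByHand(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replacement); // copy text before the match + replacement
        }
        m.appendTail(sb);                         // copy the rest after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // (?<=0)0 : a zero preceded by a zero
        System.out.println(replaceAllByHand("000hello", "(?<=0)0", "&")); // 0&&hello
        System.out.println("000hello".replaceAll("(?<=0)0", "&"));         // 0&&hello
    }
}
```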
14 years ago

Henry Wong wrote:

Piet Verdriet wrote:
Interesting, and thank you for following up: I'm going to see if I can find out if perhaps some things have changed lately.



I don't know if it "changed lately", or was always like this... but I thought the infinite repetition restriction in look-aheads and look-behinds only applied to the split() method.... meaning... I always remember being able to use * and + in look-aheads and look-behinds, since regex was introduced in Java 1.4 (as long as they are not used in the split() method).

Piet Verdriet wrote:And about point 2: I truly thought that the regex engine performed its replacements from left to right and that these replacements influenced the characters to the right of it.



Never thought about this... But it does make sense that it will work though. Strings are immutable. And under the covers, replaceAll() uses the appendReplacement() and appendTail() methods, which use a separate string buffer to create the result string.

Henry



Not sure about split(...); I'll look into that.
About look-aheads: AFAIK, that has always worked with both + and *; it was only the look-behinds that were restricted in Java (and in many other languages, for that matter).
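For instance, both of these look-aheads with + and * work fine (a small sketch, names made up). A look-ahead scans forward from the current position, so an unbounded quantifier inside it is no problem:

```java
import java.util.regex.Pattern;

public class LookaheadDemo {

    // '+' inside a look-ahead: replace a run of letters only if digits follow it.
    static String lettersBeforeDigits(String s) {
        return s.replaceAll("[a-z]+(?=\\d+)", "X");
    }

    // '*' inside a look-ahead: true if the input contains a digit anywhere.
    static boolean containsDigit(String s) {
        return Pattern.compile("^(?=.*\\d).*$").matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(lettersBeforeDigits("abc123")); // X123
        System.out.println(containsDigit("abc1"));          // true
        System.out.println(containsDigit("abc"));           // false
    }
}
```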
14 years ago

Jane Dodo wrote:


That does not work for two reasons:
1 - Java does not support look-behinds with infinite repetition (so no '*' and '+' inside look-behinds!);
2 - If infinite look-behinds DID work, the first 0 in your string would have been replaced, but the second zero would not be replaced, because it would no longer have a 0 to its left (because you just replaced it!).



Don't know, seems to work fine for me!



Well I'll be damned, it does!

Here's what regular-expressions.info says: "Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths.".
-- http://www.regular-expressions.info/lookaround.html
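To illustrate that finite-repetition rule, a sketch (made-up names) that uses a bounded look-behind to blank only the trailing zeros after the decimal point. The upper bound of 20 decimals is an arbitrary assumption:

```java
public class FiniteLookBehind {

    //   (?<=\.\d{0,20})  finite repetition inside a look-behind (at most 21 chars),
    //                    i.e. "somewhere after the decimal point"
    //   0(?=0*$)         a zero followed by nothing but zeros up to the end
    static String blankTrailingZeros(String number) {
        return number.replaceAll("(?<=\\.\\d{0,20})0(?=0*$)", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + blankTrailingZeros("00043.30") + "]"); // [00043.3 ]
        System.out.println("[" + blankTrailingZeros("400") + "]");      // [400]  (no decimal point)
    }
}
```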

And about point 2: I truly thought that the regex engine performed its replacements from left to right and that these replacements influenced the characters to the right of it.

Interesting, and thank you for following up: I'm going to see if I can find out if perhaps some things have changed lately.

Regards,

Piet.
14 years ago

Jane Dodo wrote:You are right. My mistake was using a lookahead instead of a lookbehind.

Here is the code that does what I need:



That does not work for two reasons:
1 - Java does not support look-behinds with infinite repetition (so no '*' and '+' inside look-behinds!);
2 - If infinite look-behinds DID work, the first 0 in your string would have been replaced, but the second zero would not be replaced, because it would no longer have a 0 to its left (because you just replaced it!).
14 years ago

Jane Dodo wrote:What should this expression evaluate to?

"000hello".replaceAll("^(?=0)0", "&");

Java 1.6 evaluates it to "&00hello", which I think is incorrect. Just need a second pair of eyes!



Not surprisingly, Java is correct. The ^ metacharacter means "the start of the string", and there is only one zero at the start of the string, so only one zero is replaced.
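To see the difference between ^ and \G side by side (method names are made up):

```java
public class AnchorDemo {

    // ^ only matches at index 0, so at most ONE zero can be replaced.
    // (The (?=0) look-ahead is redundant: the 0 right after it checks the same thing.)
    static String withCaret(String s) {
        return s.replaceAll("^(?=0)0", "&");
    }

    // \G matches where the previous match ended (index 0 before the first match),
    // so consecutive matches walk through ALL leading zeros and stop at the first non-zero.
    static String withG(String s) {
        return s.replaceAll("\\G0", "&");
    }

    public static void main(String[] args) {
        System.out.println(withCaret("000hello")); // &00hello
        System.out.println(withG("000hello"));     // &&&hello
        System.out.println(withG("0a0"));          // &a0
    }
}
```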

Jane Dodo wrote:While I am at it... given a String "00043.30", what's the easiest way to format it to "&&&43.3& "? (It's really going to be spaces, but this forum filters them away). Using any means available in Java, such as regex, DecimalFormat, etc. The most straightforward way (with character tweaking) does not look very elegant.



So you want to replace all leading and trailing zeros? Here's a regex solution:



As you see, the middle zeros are not replaced. But that will most probably look like voodoo to you, so you'd better do it manually.
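For the manual route, a minimal sketch without any regex (made-up names). It assumes a plain decimal string like "00043.30" and only blanks trailing zeros when there is a decimal point:

```java
public class ManualPad {

    // Replaces leading zeros, and trailing zeros after the decimal point, with spaces.
    static String blankZeros(String number) {
        char[] chars = number.toCharArray();
        // Leading zeros: walk forward until the first non-zero character.
        for (int i = 0; i < chars.length && chars[i] == '0'; i++) {
            chars[i] = ' ';
        }
        // Trailing zeros: walk backward, but only if there is a decimal point,
        // so that "400" keeps its zeros.
        if (number.indexOf('.') >= 0) {
            for (int i = chars.length - 1; i >= 0 && chars[i] == '0'; i--) {
                chars[i] = ' ';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("[" + blankZeros("00043.30") + "]"); // [   43.3 ]
        System.out.println("[" + blankZeros("400") + "]");      // [400]
    }
}
```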
14 years ago

Mahesh Mak wrote:Hi,

I have a multi-line status XML out of which I need to extract the status value.
<status>ok</status>
<status>Failed</status>

I cannot parse it as XML ...



That seems like a contradiction: you have XML, but you can't parse it as XML...
Perhaps you meant: "I don't want to parse it as XML". Out of curiosity: why not?
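If the problem is only that the sample is not a single well-formed document (it has two top-level <status> elements), wrapping it in a root element is enough for a standard DOM parser. A sketch, assuming the input is exactly such a sequence of elements without an XML declaration; the <root> wrapper and the names are made up:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class StatusParser {

    // Wraps the fragment in a made-up <root> element so that it becomes a
    // well-formed document, then collects the text of every <status> element.
    static List<String> statuses(String fragment) {
        String xml = "<root>" + fragment + "</root>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("status");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // ParserConfigurationException, SAXException or IOException
            throw new IllegalArgumentException("Not parseable as XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(statuses("<status>ok</status>\n<status>Failed</status>")); // [ok, Failed]
    }
}
```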
14 years ago
Try:

14 years ago

Henry Wong wrote:Hmmm.... How about this for checking odd number of backslashes?



Unfortunately, this can only check for an odd number of backslashes up to 2001 backslashes, then it breaks...


To Joe Carco, don't you wish you wrote your own "boring" regex instead, now? ...

Henry





You could always replace 1000 with Integer.MAX_VALUE
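Henry's pattern didn't survive on this page. A sketch of the bounded look-behind idea he describes, assuming the goal is to split on | unless it is escaped by an odd number of backslashes (names are made up, and this is not necessarily his exact regex). The {0,1000} is what keeps the look-behind finite, and also what caps the number of backslashes it can look back over:

```java
public class EscapedPipeSplit {

    // Regex: (?<=(?<!\\)(?:\\\\){0,1000})\|
    // The look-behind asserts that the '|' is preceded by an EVEN number of
    // backslashes (0, 2, 4, ...): a run of backslash pairs with no further
    // backslash in front of it (the nested negative look-behind).
    static final String REGEX = "(?<=(?<!\\\\)(?:\\\\\\\\){0,1000})\\|";

    static String[] split(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        // Input a|b\|c\\|d : the first and the last '|' are real separators
        for (String part : split("a|b\\|c\\\\|d")) {
            System.out.println(part); // a, then b\|c\\, then d
        }
    }
}
```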
14 years ago

Rob Prime wrote:Which I already mentioned:

Rob Prime wrote:In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.



Indeed, I missed your response!
14 years ago