# Split using regex doubt

megha joshi

Ranch Hand

Posts: 206

posted 9 years ago

Hi, the output of the following code fragments has puzzled me. Can anyone please shed some light?

1) String str = " apples";

String s[] = str.split("\\w*");

for (String i:s)

System.out.println("Token" + i + "Token");

Output is :

TokenToken

Token Token

2) String str = "apples";

String s[] = str.split("\\w*");

for (String i:s)

System.out.println("Token" + i + "Token");

No Output

3) String str = "apples ";

String s[] = str.split("\\w*");

for (String i:s)

System.out.println("Token" + i + "Token");

Output is :

TokenToken

TokenToken

Token Token

I have surrounded the output with the word Token so as to distinguish between a space and an empty string. But I don't get the logic behind this. Could whoever knows how this works please point me to a good tutorial on it, or else just tell me that I don't need to worry about this for the exam?

posted 9 years ago

Okay, basically, you have three things going on here...

1. The regex as written is greedy, so it will always match the whole "apples" when it encounters it.

2. The split always goes from left to right. This means that it can't match "apples" until the search position is at the "a". Furthermore, the way this regex is written, it is capable of matching nothing (a zero-length match).

3. The default split, which doesn't limit the number of matches, always deletes any trailing zero-length values (empty strings).
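Rules 1 and 2 can be checked directly by asking a Matcher where `\w*` matches in each of the three strings. The sketch below is only an illustration: the class name `MatchLister` and the helper `matches` are made up for this example, and the comma-separated string at the end is just a simpler input chosen to show rule 3 on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchLister {
    // Hypothetical helper: returns every match of regex in input as "[start,end)",
    // in the order Matcher.find() reports them (left to right).
    static List<String> matches(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        // Rules 1 and 2: greedy matches of "apples" plus zero-length matches.
        System.out.println(matches("\\w*", " apples")); // [[0,0), [1,7), [7,7)]
        System.out.println(matches("\\w*", "apples"));  // [[0,6), [6,6)]
        System.out.println(matches("\\w*", "apples ")); // [[0,6), [6,6), [7,7)]

        // Rule 3: the default split drops trailing empty strings,
        // while a negative limit keeps them.
        System.out.println(Arrays.toString("a,b,,".split(",")));     // [a, b]
        System.out.println(Arrays.toString("a,b,,".split(",", -1))); // [a, b, , ]
    }
}
```

A span such as `[7,7)` has the same start and end, which is what a zero-length match looks like.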

So...

For the first case:

The first split match is a zero-length match at index zero. The second split match is "apples". And the third split match is a zero-length match at the end of "apples". This creates a first value of zero length, a second value of a single space, a third value of zero length, and a fourth value of zero length. However, applying rule #3, the third and fourth values are deleted.

For the second case:

The first split match is "apples". And the second split match is a zero-length match right after "apples". This creates a first value of zero length, a second value of zero length, and a third value of zero length. However, applying rule #3, all three values are deleted.

For the third case:

The first split match is "apples". The second split match is the zero-length match right after "apples". And the third split match is the zero-length match right after the space. This creates a first value of zero length, a second value of zero length, a third value of a single space, and a fourth value of zero length. However, applying rule #3, the fourth value is deleted.
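The three walkthroughs above can be reproduced with a small emulation of the split loop: cut the input at every match, keep the text between cuts, add the remainder, then drop trailing empty values. This is a simplified sketch, not the library's actual source; `SplitTrace`, `rawPieces`, `dropTrailingEmpty` and `quoted` are names invented here. It deliberately follows the behavior described in this thread. From Java 8 onward, `String.split` no longer produces a leading empty string for a zero-width match at index zero, so on those versions the real `split` returns only the single-space value for the first case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitTrace {
    // Cuts input at every match and keeps the text between matches,
    // followed by whatever remains after the last match.
    // Simplification: the real split has a special case for "no match at all",
    // which never arises with \w* because it can always match.
    static List<String> rawPieces(String regex, String input) {
        List<String> pieces = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0;
        while (m.find()) {
            pieces.add(input.substring(index, m.start()));
            index = m.end();
        }
        pieces.add(input.substring(index));
        return pieces;
    }

    // Rule 3: remove empty strings from the end of the list only.
    static List<String> dropTrailingEmpty(List<String> pieces) {
        int size = pieces.size();
        while (size > 0 && pieces.get(size - 1).isEmpty()) {
            size--;
        }
        return new ArrayList<>(pieces.subList(0, size));
    }

    // Wraps each piece in quotes so empty strings and spaces are visible.
    static String quoted(List<String> pieces) {
        List<String> out = new ArrayList<>();
        for (String p : pieces) {
            out.add("\"" + p + "\"");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String input : new String[] {" apples", "apples", "apples "}) {
            List<String> raw = rawPieces("\\w*", input);
            System.out.println(quoted(raw) + " -> " + quoted(dropTrailingEmpty(raw)));
        }
        // ["", " ", "", ""] -> ["", " "]
        // ["", "", ""] -> []
        // ["", "", " ", ""] -> ["", "", " "]
    }
}
```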

[EDIT: Corrected First and Second Case -- sorry]

Henry

[ March 25, 2007: Message edited by: Henry Wong ]

swarna dasa

Ranch Hand

Posts: 108

posted 9 years ago

Man!!! This did confuse me as well...

http://java.sun.com/docs/books/tutorial/essential/regex/quant.html

(read "Differences Among Greedy, Reluctant, and Possessive Quantifiers")

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from.

You can read the whole tutorial at http://java.sun.com/docs/books/tutorial/essential/regex/index.html
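As a rough illustration of the quantifier differences that section describes, the sketch below runs greedy, reluctant and possessive versions of the same pattern over one input string. The class name `QuantifierDemo` and the helper `findAll` are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: collects the text of every match found, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off until "foo" fits.
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: .*? takes as little as possible before each "foo".
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: .*+ takes everything and never backs off, so "foo" cannot match.
        System.out.println(findAll(".*+foo", input)); // []

        // The pattern from this thread: one greedy match, then a zero-length match.
        System.out.println(findAll("\\w*", "apples").size()); // 2
    }
}
```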

megha joshi

Ranch Hand

Posts: 206

posted 9 years ago

Thanks for the reply and the tutorial.

I am sorry, but in the explanation quoted below I don't understand how a zero-length value comes in front of apples in the second and third cases but not in the first case. It's a bit confusing for me.

Can you please explain.

-------------------------------------------------------------------------

For the first case:

The first split match is a zero-length match at index zero. The second split match is "apples". And the third split match is a zero-length match at the end of "apples". This creates a first value of zero length, a second value of a single space, a third value of zero length, and a fourth value of zero length. However, applying rule #3, the third and fourth values are deleted.

For the second case:

The first split match is apples. And the second split match is zero length right after apples. This creates a first value of zero length, a second value of zero length, and a third value of zero length. However, applying rule #3, all three values are deleted.

For the third case:

The first split match is apples. The second split match is the zero length right after apples. And the third split match is the zero length right after the space. This creates a first value of zero length, a second value of zero length, a third value of a single space, and a fourth value of zero length. However, applying rule #3, the fourth value is deleted.

---------------------------------------------------------------------- :roll:
