
Split using regex doubt

megha joshi
Ranch Hand

Joined: Feb 20, 2007
Posts: 206
Hi, the output of the following code fragments has puzzled me. Can anyone please shed some light?

1) String str = " apples";
   String[] s = str.split("\\w*");
   for (String i : s)
       System.out.println("Token" + i + "Token");

Output is:
TokenToken
Token Token

2) String str = "apples";
   String[] s = str.split("\\w*");
   for (String i : s)
       System.out.println("Token" + i + "Token");
No output

3) String str = "apples ";
   String[] s = str.split("\\w*");
   for (String i : s)
       System.out.println("Token" + i + "Token");
Output is:
TokenToken
TokenToken
Token Token

I have surrounded the output with the word Token to distinguish between a space and an empty string, but I don't get the logic behind this. Also, could whoever knows how this works please point me to a good tutorial on it, or just tell me that I don't need to worry about this for the exam?
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19066
    

Okay, basically, you have three things going on here...

1. The regex, as written, is greedy, so it will always match the whole "apples" when it encounters it.
2. The split always goes from left to right. This means that it can't match "apples" until the search position is at the "a". Furthermore, the way this regex is written, it is capable of matching nothing (a zero-length match).
3. The default split, which doesn't limit the number of matches, always deletes any trailing zero-length values from the result.
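
Rule 3 can be checked with a rough sketch like the one below. It relies on the two-argument String.split(regex, limit), where a negative limit keeps trailing empty strings. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Negative limit: trailing empty strings are kept in the result.
    static String[] splitKeepingTrailing(String input) {
        return input.split("\\w*", -1);
    }

    // Default split (limit 0): trailing empty strings are dropped.
    static String[] splitDefault(String input) {
        return input.split("\\w*");
    }

    public static void main(String[] args) {
        // "apples" yields three empty strings before trimming.
        System.out.println(Arrays.toString(splitKeepingTrailing("apples"))); // prints [, , ]
        // With the default limit all three are removed.
        System.out.println(Arrays.toString(splitDefault("apples")));         // prints []
    }
}
```

Comparing the two arrays shows which values existed before the trailing ones were dropped.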

So...

For the first case:

The first split match is a zero-length match at index zero. The second split match is "apples". And the third split match is a zero-length match at the end of apples. This creates a first value of zero length, a second value of a single space, a third value of zero length, and a fourth value of zero length. However, applying rule #3, the third and fourth values are deleted.

For the second case:

The first split match is apples. And the second split match is zero length right after apples. This creates a first value of zero length, a second value of zero length, and a third value of zero length. However, applying rule #3, all three values are deleted.

For the third case:

The first split match is apples. The second split match is the zero length right after apples. And the third split match is the zero length right after the space. This creates a first value of zero length, a second value of zero length, a third value of a single space, and a fourth value of zero length. However, applying rule #3, the fourth value is deleted.
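
The matches described in the three cases can be listed with a small sketch that runs the same regex through Matcher.find() and records where each delimiter match starts and ends. The class and method names are hypothetical, and this only traces the matches; it is not how split itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitMatchTrace {
    // Returns every match of \w* in the input as "start-end" (end is exclusive).
    static List<String> delimiterSpans(String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile("\\w*").matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(delimiterSpans(" apples")); // prints [0-0, 1-7, 7-7]
        System.out.println(delimiterSpans("apples"));  // prints [0-6, 6-6]
        System.out.println(delimiterSpans("apples ")); // prints [0-6, 6-6, 7-7]
    }
}
```

A span such as 0-0 or 6-6 is a zero-length match. One caveat: from Java 8 onward, split no longer produces a leading empty string for a zero-length match at index zero, so on those versions the first case prints only the single-space token. The second and third cases behave the same on old and new versions.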

[EDIT: Corrected First and Second Case -- sorry]

Henry
[ March 25, 2007: Message edited by: Henry Wong ]

Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
swarna dasa
Ranch Hand

Joined: Mar 15, 2007
Posts: 108
Man!!! This did confuse me as well...

http://java.sun.com/docs/books/tutorial/essential/regex/quant.html
(read "Differences Among Greedy, Reluctant, and Possessive Quantifiers")

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from.


You can read the whole tutorial at http://java.sun.com/docs/books/tutorial/essential/regex/index.html
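
As a quick illustration of the greedy behaviour, here is a minimal sketch comparing \w* with its reluctant form \w*? on the same input. The class and helper names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Returns the text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: takes as many word characters as it can.
        System.out.println("[" + firstMatch("\\w*", "apples") + "]");  // prints [apples]
        // Reluctant: is satisfied with zero characters.
        System.out.println("[" + firstMatch("\\w*?", "apples") + "]"); // prints []
    }
}
```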
megha joshi
Ranch Hand

Joined: Feb 20, 2007
Posts: 206
Thanks for the reply and the tutorial.
I am sorry, but I don't understand how the zero length comes in front, before apples, in the second and third cases and not before apples in the first case, in the logic quoted below. It's a bit confusing for me.
Can you please explain?
-------------------------------------------------------------------------
For the first case:

The first split match is a zero-length match at index zero. The second split match is "apples". And the third split match is a zero-length match at the end of apples. This creates a first value of zero length, a second value of a single space, a third value of zero length, and a fourth value of zero length. However, applying rule #3, the third and fourth values are deleted.

For the second case:

The first split match is apples. And the second split match is zero length right after apples. This creates a first value of zero length, a second value of zero length, and a third value of zero length. However, applying rule #3, all three values are deleted.

For the third case:

The first split match is apples. The second split match is the zero length right after apples. And the third split match is the zero length right after the space. This creates a first value of zero length, a second value of zero length, a third value of a single space, and a fourth value of zero length. However, applying rule #3, the fourth value is deleted.
---------------------------------------------------------------------- :roll:
Andrew Ebling
Greenhorn

Joined: May 18, 2006
Posts: 23
Remember that the regex you supply to String.split() is for matching delimiters, not tokens. This confused me for a while too, and I don't think the Javadocs make it clear until you get to the examples.
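
To make the delimiter/token distinction concrete, here is a small sketch with an example sentence of my own. With split the regex describes what sits between the tokens, while with Matcher.find() the regex describes the tokens themselves. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimitersVsTokens {
    // split: the regex matches the separators (runs of non-word characters).
    static String[] byDelimiter(String input) {
        return input.split("\\W+");
    }

    // Matcher.find: the regex matches the tokens (runs of word characters).
    static List<String> byToken(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "apples and pears";
        System.out.println(Arrays.toString(byDelimiter(input))); // prints [apples, and, pears]
        System.out.println(byToken(input));                      // prints [apples, and, pears]
    }
}
```

Passing \w* to split, as in the original question, treats the words themselves as the separators, which is why only spaces and empty strings come back.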