Split using regex doubt
megha joshi
Ranch Hand
Joined: Feb 20, 2007
Posts: 206
Hi, the output of the following code fragments has puzzled me. Can anyone please shed some light?

1)
String str = " apples";
String s[] = str.split("\\w*");
for (String i : s)
    System.out.println("Token" + i + "Token");

Output is:
TokenToken
Token Token

2)
String str = "apples";
String s[] = str.split("\\w*");
for (String i : s)
    System.out.println("Token" + i + "Token");

No output.

3)
String str = "apples ";
String s[] = str.split("\\w*");
for (String i : s)
    System.out.println("Token" + i + "Token");

Output is:
TokenToken
TokenToken
Token Token

I have surrounded each piece of output with the word Token so I can distinguish between a space and an empty string. But I don't get the logic behind this. Also, whoever knows how this works, can you please point me to a good tutorial on it, or just tell me that I don't need to worry about this for the exam?
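The three fragments above can be collected into one runnable program. This is only a sketch: the class name SplitDoubt and the helpers splitOnWordChars and show are made up for illustration. One caveat: since Java 8, String.split no longer produces a leading empty string when the match at index 0 is zero-width, so on a current JDK case 1 should print only the "Token Token" line. The outputs quoted above appear to come from an older JDK; cases 2 and 3 are unaffected.

```java
// Runs the three split() experiments from the question in one program.
// Class and method names are illustrative, not from the original post.
class SplitDoubt {

    // Same call as in the question: split on \w* (zero or more word characters).
    static String[] splitOnWordChars(String str) {
        return str.split("\\w*");
    }

    // Wraps each piece in "Token" markers so spaces and empty strings are visible.
    static void show(String str) {
        System.out.println("Input: \"" + str + "\"");
        for (String i : splitOnWordChars(str)) {
            System.out.println("Token" + i + "Token");
        }
    }

    public static void main(String[] args) {
        show(" apples"); // case 1
        show("apples");  // case 2
        show("apples "); // case 3
    }
}
```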
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16811
Okay, basically, you have three things going on here:

1. The regex as written is greedy, so it will always match the whole "apples" when it encounters it.

2. The split always works from left to right, moving the starting point along the string. This means that it can't match "apples" until the starting point is at the "a". Furthermore, the way this regex is written, it is capable of matching nothing (a zero-length match).

3. The default split, which doesn't limit the number of matches, always deletes any trailing zero-length values (empty strings) from the result.

So...

For the first case: The first split match is a zero-length match at index zero. The second split match is "apples". And the third split match is a zero-length match at the end of "apples". This creates a first value of zero length, a second value of a single space, a third value of zero length, and a fourth value of zero length. However, applying rule #3, the third and fourth values are deleted.

For the second case: The first split match is "apples". And the second split match is the zero-length match right after "apples". This creates a first value of zero length, a second value of zero length, and a third value of zero length. However, applying rule #3, all three values are deleted.

For the third case: The first split match is "apples". The second split match is the zero-length match right after "apples". And the third split match is the zero-length match right after the space. This creates a first value of zero length, a second value of zero length, a third value of a single space, and a fourth value of zero length. However, applying rule #3, the fourth value is deleted.

[EDIT: Corrected first and second case, sorry]

Henry

[ March 25, 2007: Message edited by: Henry Wong ]
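Rule #3 can be checked by passing an explicit limit: str.split(regex) behaves like str.split(regex, 0), which removes trailing empty strings, while a negative limit keeps them all. A minimal sketch follows; the class and method names are illustrative only. On a current JDK the limit -1 results should be three empty strings for "apples" and four values ("", "", " ", "") for "apples ", matching the value counts above. For " apples" you should get three values rather than four, because Java 8 and later drop the leading empty string produced by a zero-width match at index 0.

```java
import java.util.Arrays;

// Compares the default split (limit 0) with a negative limit, which keeps trailing empty strings.
class TrailingEmpties {

    // Limit 0 is what str.split(regex) uses: trailing empty strings are removed.
    static String[] defaultSplit(String str) {
        return str.split("\\w*", 0);
    }

    // A negative limit applies the pattern as often as possible and keeps trailing empty strings.
    static String[] keepAll(String str) {
        return str.split("\\w*", -1);
    }

    public static void main(String[] args) {
        for (String str : new String[] {" apples", "apples", "apples "}) {
            // Arrays.toString shows empty strings as gaps between the commas.
            System.out.println("\"" + str + "\" limit 0:  " + Arrays.toString(defaultSplit(str)));
            System.out.println("\"" + str + "\" limit -1: " + Arrays.toString(keepAll(str)));
        }
    }
}
```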
Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
swarna dasa
Ranch Hand
Joined: Mar 15, 2007
Posts: 108
Man!!! This did confuse me as well...

http://java.sun.com/docs/books/tutorial/essential/regex/quant.html (read "Differences Among Greedy, Reluctant, and Possessive Quantifiers"):

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from.

You can read the whole tutorial at http://java.sun.com/docs/books/tutorial/essential/regex/index.html
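To see the difference that tutorial section describes, the sketch below runs the greedy (.*foo), reluctant (.*?foo) and possessive (.*+foo) forms of the same pattern against one sample input and collects every Matcher.find() result. The class name, method name and input string are my own choices for illustration. The greedy form should find the whole string once, the reluctant form two shorter matches ("xfoo" and "xxxxxxfoo"), and the possessive form nothing, because .*+ eats the final "foo" and never gives it back. The greedy \w* in the split above behaves like the first case, taking all of "apples" in one match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares greedy, reluctant and possessive versions of the same quantifier.
class QuantifierDemo {

    // Returns every substring that Matcher.find() reports for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy     .*foo  -> " + findAll(".*foo", input));
        System.out.println("reluctant  .*?foo -> " + findAll(".*?foo", input));
        System.out.println("possessive .*+foo -> " + findAll(".*+foo", input));
    }
}
```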
megha joshi
Ranch Hand
Joined: Feb 20, 2007
Posts: 206
Thanks for the reply and the tutorial. I'm sorry, but in the logic quoted below I don't understand why a zero-length value comes in front of apples in the second and third cases but not in front of apples in the first case. It's a bit confusing for me. Can you please explain?

-------------------------------------------------------------------------

For the first case: The first split match is a zero length match at index zero. The second split match is "apples". And the third split match is a zero length match at the end of apples. This create a first value of zero length, a second value of a single space, a third value of zero length, and a fourth value of zero length. However, applying rule #3, the third and fourth value are deleted.

For the second case: The first split match is apples. And the second split match is zero length right after apples. This creates a first value of zero length, a second value of zero length, and a third value of zero length. However, applying rule #3, all three values are deleted.

For the third case: The first split match is apples. The second split match is the zero length right after apples. And the third split match is the zero length right after the space. This creates a first value of zero length, a second value of zero length, a third value of a single space, and a fourth value of zero length. However, applying rule #3, the fourth value is deleted.

----------------------------------------------------------------------
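One way to see where each zero-length value comes from is to replay the matches yourself. Every value is just the text between the end of the previous match and the start of the next one, so a match that begins at index 0 (the zero-width match in case 1, "apples" in cases 2 and 3) always leaves an empty value in front of it. The sketch below is a simplified re-creation of that loop, not the JDK source: it handles only the default limit of 0, skips other edge cases, and follows the older behaviour described in this thread, so it keeps the leading empty value for case 1 (Java 8 and later skip it). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified re-creation of how split() cuts values between matches (default limit 0 only).
// Follows the pre-Java-8 behaviour discussed in this thread: a zero-width match at
// index 0 still produces a leading empty value here.
class SplitTrace {

    static List<String> traceSplit(String input, String regex) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0; // end of the previous match
        while (m.find()) {
            // The value is whatever lies between the previous match and this one.
            String value = input.substring(index, m.start());
            System.out.printf("match [%d,%d) \"%s\" -> value \"%s\"%n",
                    m.start(), m.end(), m.group(), value);
            values.add(value);
            index = m.end();
        }
        values.add(input.substring(index)); // text after the last match
        // Rule #3: drop trailing empty values.
        int size = values.size();
        while (size > 0 && values.get(size - 1).isEmpty()) {
            size--;
        }
        return new ArrayList<>(values.subList(0, size));
    }

    public static void main(String[] args) {
        for (String str : new String[] {" apples", "apples", "apples "}) {
            System.out.println("Input \"" + str + "\": " + traceSplit(str, "\\w*"));
        }
    }
}
```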
Andrew Ebling
Greenhorn
Joined: May 18, 2006
Posts: 23
Remember that the regex you supply to String.split() matches the delimiters, not the tokens. This confused me for a while too, and I don't think the JavaDocs make it clear until you get to the examples.
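Andrew's point can be shown side by side: with split() the regex describes the separators, while with Matcher.find() it describes the pieces you want. In this sketch (hypothetical class and method names, sample input chosen for illustration), splitting "apples, pears" on \W+ should give the two words, splitting on \w+ should give the text around the words (an empty leading string and ", "), and find() with \w+ should give the words again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Contrasts split() (regex describes the delimiters) with Matcher.find() (regex describes the tokens).
class DelimitersVsTokens {

    // split(): the regex matches what separates the pieces.
    static String[] splitOn(String input, String delimiterRegex) {
        return input.split(delimiterRegex);
    }

    // find(): the regex matches the pieces themselves.
    static List<String> findTokens(String input, String tokenRegex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(tokenRegex).matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "apples, pears";
        // Delimiters are runs of non-word characters, so the words come back.
        System.out.println(Arrays.toString(splitOn(input, "\\W+")));
        // Using the word pattern as a delimiter returns the text around the words instead.
        System.out.println(Arrays.toString(splitOn(input, "\\w+")));
        // The same word pattern used with find() returns the words.
        System.out.println(findTokens(input, "\\w+"));
    }
}
```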