I could not understand the split method of String properly. I have looked through this forum and found this example, but I could not understand the explanation.
1)
String str = " apples";
String s[] = str.split("\\w*");
for (String i : s) System.out.println("Token" + i + "Token");
Output is:
TokenToken
Token Token
2)
String str = "apples";
String s[] = str.split("\\w*");
for (String i : s) System.out.println("Token" + i + "Token");
No output.
3)
String str = "apples ";
String s[] = str.split("\\w*");
for (String i : s) System.out.println("Token" + i + "Token");
Output is:
TokenToken
TokenToken
Token Token
What I have understood is this: in the first case the string is space a p p l e s, so the result of split should contain space, space, but the output is not like that.
Please explain this to me.
gianni ipez
Ranch Hand
Joined: Jan 02, 2007
Posts: 65
posted
0
I have no idea. I thought I understood the split method, but apparently I didn't either. Ciao, Gianni
Chandra Bhatt
Ranch Hand
Joined: Feb 28, 2007
Posts: 1707
posted
0
Hi Anil,
Let us see one by one:
Output:
>< //beginning blank string
> < //space after that
Use of second argument of split() method:
Output:
>< //beginning blank string
> < //space after that
>< //blank string after space
Let us modify the code to understand it much better:
Output:
>< //beginning blank string
> < //space after that
>< //blank string after space
>< //blank string after "s" in apples
Your next doubt:
Output: Nothing!
To get the concept quickly, read the second point below:
1- Remember that by default the second argument of the split method is 0.
2- "*" is greedy: 0 or more.
OK, I got that, but see here:
3)
String str = "apples ";
String s[] = str.split("\\w*");
for (String i : s) System.out.println("Token" + i + "Token");
Output is:
TokenToken
TokenToken
Token Token
Here the output should be:
TokenToken //blank after apples
Token Token //space
TokenToken //blank
But the output is not like that. Why?
Thanks, Anil Kumar
Matt Russell
Ranch Hand
Joined: Aug 15, 2006
Posts: 165
posted
0
It's really worth having a browse of the source code of java.util.regex.Pattern to get a clear understanding of what's going on here (String.split() calls Pattern.split()). I'll try and explain what's going on in words, but looking at the code is probably more helpful at this point -- it's attached at the end of this post (limit = 0 for the default String.split() call).
In the case of " apples".split("\\w*"), the regular expression matches three times ("" at the start of the string, "apples", and "" at the end of the string -- Chandra Bhatt's analysis above isn't quite correct); so you get three fragments added to the output: "", " ", and "". The split() algorithm then adds an extra fragment to account for the remainder from the end of the last match to the end of the string: in our case, that's just the empty string again. Finally, the algorithm prunes those two empty strings ("") from the end of the array of results -- the pruning removes all the empty strings from the end of the results up to the first non-empty string. (Note: this pruning doesn't happen if you pass in a non-zero limit parameter to the split method).
In the case of "apples".split("\\w*"), the regular expression matches twice ("apples" and "" at the end) to give fragments "" and "". Another empty string is added to account for the remainder, but all three ""'s are pruned at the end, resulting in an empty array as output.
Finally, "apples ".split("\\w*"): the regular expression matches three times, "apples", "" and "", to give fragments "", "" and " ". The empty string is again added to the end of the outputs for the remainder, but is pruned off at the final stage (and that's the only one that's pruned).
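The three walkthroughs above can be checked with a small hand-rolled version of the steps they describe. This is only a sketch written from that description, not the JDK source: the class and method names are made up for illustration, and it ignores positive limits and the "no match" special case. It collects the text before each match, adds the remainder, and prunes trailing empty strings when the limit is 0. As far as I know, Java 8 and later no longer produce a leading empty string for a zero-width match at the very start of the input, so calling " apples".split("\\w*") directly on a current JDK may print something different from the 2007 output quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitSketch {
    // Hypothetical helper: follows the steps described above (limit <= 0 only).
    static List<String> splitAsDescribed(String input, String regex, int limit) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0; // end of the previous match
        while (m.find()) {
            out.add(input.substring(index, m.start())); // text before this match
            index = m.end();
        }
        out.add(input.substring(index)); // remainder after the last match
        if (limit == 0) {
            // Prune empty strings from the end, up to the first non-empty one.
            while (!out.isEmpty() && out.get(out.size() - 1).isEmpty()) {
                out.remove(out.size() - 1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : new String[] {" apples", "apples", "apples "}) {
            System.out.println("\"" + s + "\" -> " + splitAsDescribed(s, "\\w*", 0));
        }
    }
}
```

Passing -1 instead of 0 as the last argument skips the pruning step, so all four (or three) fragments stay visible.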
The pattern says: 0 or more word characters (letter, digit or "_" underscore). The metacharacter "*" is known as greedy (it says "I WANT MORE, COME ON").
CharSequence "apple ": can you guess how many blank strings there are in it?
BLANK STRING FINDER CODE Try the following code:
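The original code is not preserved in this thread. A minimal sketch of such a finder, assuming it simply searches for the empty regex "" with a Matcher (the class and method names here are hypothetical), could look like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankStringFinder {
    // Counts the positions at which the empty pattern "" matches in the input.
    static int countBlankStrings(String input) {
        Matcher m = Pattern.compile("").matcher(input);
        int count = 0;
        while (m.find()) {
            System.out.println("blank string at index " + m.start());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("total: " + countBlankStrings("apple "));
    }
}
```

The empty pattern matches once at every position, including the one after the last character, so the count is always one more than the length of the string.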
In the same way, when the CharSequence is "apple " and the pattern is "\\w*":
1- The first point will be the beginning of "apple ", that is before "a"
2- The second point will be the blank before the space (" ")
3- The third point will be the blank string after the space
Matt,
--------------------------------------------------------------------------
("" at the start of the string, "apples", and "" at the end of the string -- Chandra Bhatt's analysis above isn't quite correct);
--------------------------------------------------------------------------
Can you elaborate on this, please? At the start of " apples" there is a space, so how does this "" come about? I think it has to be " ".
Thanks Anil Kumar
Chandra Bhatt
Ranch Hand
Joined: Feb 28, 2007
Posts: 1707
posted
0
Matt says,
In the case of " apples".split("\\w*"), the regular expression matches three times ("" at the start of the string, "apples", and "" at the end of the string
I find the above lines missing something...
IMHO, for " apples".split("\\w*"), the regular expression matches zero occurrences at the very beginning of " apples", and then comes the space. By default split() skips the last blank string "", as the API says.
The second argument of split() specifies the "limit".
Thanks, cmbhatt
anil kumar
Ranch Hand
Joined: Feb 23, 2007
Posts: 447
posted
0
Hi Chandra,
Trailing empty strings are removed, but here it is not like that. Why?
I am speaking about this case:
3)
String str = "apples ";
String s[] = str.split("\\w*");
for (String i : s) System.out.println("Token" + i + "Token");
Output is:
TokenToken
TokenToken
Token Token
[ April 28, 2007: Message edited by: anil kumar ]
Chandra Bhatt
Ranch Hand
Joined: Feb 28, 2007
Posts: 1707
posted
0
Pattern and Matcher classes will tell the truth!!!
In the case of str = "apple", the output of the above code will be:
>apple<
><
Got any idea???
cmbhatt
Matt Russell
Ranch Hand
Joined: Aug 15, 2006
Posts: 165
posted
0
Anil, the following code shows where the regular expression matches. Remember: split() outputs the bits between the matches (and before and after the first and last matches respectively), but trims empty strings from the end of the output.
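The program itself did not survive in this copy of the thread. A sketch of the same idea, assuming a plain Matcher.find() loop (the class and method names are made up here), would be:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShowMatches {
    // Returns one line per match: the matched text and its start/end indices.
    static List<String> describeMatches(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            lines.add("\"" + m.group() + "\" from " + m.start() + " to " + m.end());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeMatches("\\w*", "apples ")) {
            System.out.println(line);
        }
    }
}
```

For "apples " this lists three matches: "apples" from 0 to 6, an empty match at index 6 (just before the space) and an empty match at index 7 (the end of the string).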
anil kumar
Ranch Hand
Joined: Feb 23, 2007
Posts: 447
posted
0
Hi Matt, I have tried your program and I understood it. But when I tried the same thing, I got a different output (see the space between the two tokens). Why? See line1 below. This is the only thing I could not understand since morning.
3)
String str = "apples ";
String s[] = str.split("\\w*");
for (String i : s) System.out.println("Token" + i + "Token");
Output is:
TokenToken
TokenToken
Token Token ////line1
[ April 28, 2007: Message edited by: anil kumar ]
Matt Russell
Ranch Hand
Joined: Aug 15, 2006
Posts: 165
posted
0
Originally posted by anil kumar: Hi This is the only thing i could not understood since morning
3)
String str = "apples ";
String s[] = str.split("\\w*");
for (String i : s) System.out.println("Token" + i + "Token");
Output is:
TokenToken
TokenToken
Token Token ////line1
OK, step 1: where does the regular expression match? The program I pasted above shows you:
Let's call them matches 1, 2 & 3.
Step 2: What are all bits before, between and after the matches? Well, before match 1 (i.e. "apples"), we have nothing, so output 1 = "". Between match 1 & match 2 we also have nothing, so output 2 = "". Between match 2 & match 3 we have a space, so output 3 = " ". Finally, after match 3 we have nothing, so output 4 = "". OK, so far we have:
Outputs: 1 = "", 2 = "", 3 = " " and 4 = "".
Step 3: Pruning: when called with no limit argument, split() removes all the empty strings at the end of the output, so this becomes:
Outputs: 1 = "", 2 = "" and 3 = " ".
(If you'd used str.split("\\w*", -1) instead, you'd get all of the strings without any pruning.)
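To see the pruning step in isolation, the same string can be split with and without the limit argument. A small sketch (the class name and the show helper are hypothetical; the > and < markers follow the convention used earlier in this thread):

```java
public class LimitDemo {
    // Wraps each token in > and < so empty strings and spaces stay visible.
    static String show(String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append('>').append(t).append('<');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "apples ";
        System.out.println(show(str.split("\\w*")));     // limit 0: trailing "" pruned
        System.out.println(show(str.split("\\w*", -1))); // negative limit: nothing pruned
    }
}
```

The first line should show three tokens and the second line four, the extra one being the empty output 4 from step 2.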
-- Matt
anil kumar
Ranch Hand
Joined: Feb 23, 2007
Posts: 447
posted
0
Now I have understood. Thank you, Matt and Chandra, for your valuable time and responses.
And Chandra, the first week of May starts on Tuesday, so I don't know your exam date.
But thanks, and all the best for your exam.
Meena R. Krishnan
Ranch Hand
Joined: Aug 13, 2006
Posts: 178
posted
0
Results:
Test6's results: Looking for a word char w greedy quantified(1 or more).
>< --> Blank before 'This'
>< --> Blank after 'This'
> < --> Space between 'This' and 'is'
>< --> Blank after 'is'
> < --> Space between 'is' and 'to'
>< --> Blank after 'to'
> < --> Space between 'to' and 'test'
>< --> Blank after 'test'
>< --> ???
Matt Russell
Ranch Hand
Joined: Aug 15, 2006
Posts: 165
posted
0
Originally posted by M Krishnan:
I find it helps to view this in terms of the regular expression matches first:
If you then work out what bits of the string are before, between and after the matches, you get the same output as split(..., -1):
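The listings that belonged here are missing from this copy. As a rough stand-in, the sketch below assumes the test string was "This is to test" and the pattern was "\\w*" (the nine results listed earlier are consistent with that, but both are assumptions), and prints what split(..., -1) returns. The class and method names are made up.

```java
public class TestSentence {
    // Splits on \w* with a negative limit so that no trailing empty strings are pruned.
    static String[] splitKeepingAll(String input) {
        return input.split("\\w*", -1);
    }

    public static void main(String[] args) {
        String[] tokens = splitKeepingAll("This is to test");
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(i + ": >" + tokens[i] + "<");
        }
    }
}
```

The last element is the remainder between the end of the final (empty) match and the end of the string, which is the entry marked ??? in the results.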
[ April 29, 2007: Message edited by: Matt Russell ]
Sasha Ruehmkorf
Ranch Hand
Joined: Mar 29, 2007
Posts: 115
posted
0
Matt, thanks for your explanations, they made things much clearer for me. Still there is one very special case that I do not understand:
String[] tokens = "".split("x*"); for (String s : tokens) System.out.print(">" + s + "<");
gives output: ><
I thought trailing empty strings are discarded, so the output should be nothing...? [ May 07, 2007: Message edited by: Sasha Ruehmkorf ]
sharan vasandani
Ranch Hand
Joined: Feb 22, 2007
Posts: 100
posted
0
In the case of " apples".split("\\w*"), the regular expression matches three times ("" at the start of the string, "apples", and "" at the end of the string -- Chandra Bhatt's analysis above isn't quite correct); so you get three fragments added to the output: "", " ", and "".
I am unable to understand how the bolded part is matching. Please explain.
And in the case of "this is to test", "this" is the first match and "" is the second match; I do not understand how this comes about.
sharan vasandani
Ranch Hand
Joined: Feb 22, 2007
Posts: 100
posted
0
System.out.println("------Test3------");
tokens = s.split("\\S", -1); //Non-white space char
for (String ss : tokens) { System.out.println(">" + ss + "<"); }
System.out.println("------Test4------");
tokens = s.split("\\W", -1); //Non-word char - same as space
for (String ss : tokens) { System.out.println(">" + ss + "<"); }
What are these non-whitespace and non-word characters?
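For reference, in java.util.regex "\\S" matches any character that is not whitespace, and "\\W" matches any character that is not a word character (a letter, a digit or the underscore). A small sketch to try them on single characters (the class and helper names are hypothetical, and "This is to test" is only an assumed test string):

```java
public class CharClassDemo {
    // Hypothetical helper: true if the one-character string c matches the class.
    static boolean matchesClass(String charClass, String c) {
        return c.matches(charClass);
    }

    public static void main(String[] args) {
        String[] samples = {"a", "7", "_", " ", "!"};
        for (String c : samples) {
            System.out.println("[" + c + "] \\S: " + matchesClass("\\S", c)
                    + ", \\W: " + matchesClass("\\W", c));
        }
        // With only letters and spaces, splitting on \W acts like splitting on spaces.
        for (String t : "This is to test".split("\\W", -1)) {
            System.out.println(">" + t + "<");
        }
    }
}
```

In a string made only of letters and spaces, the space is the only non-word character, which is why the Test4 comment calls it "same as space".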
Chandra Bhatt
Ranch Hand
Joined: Feb 28, 2007
Posts: 1707
posted
0
Hi Sharan,
Blank string and space are two different things. See this:
"apple": in this String literal there are six blank strings "".
All of the above discussion concluded that non-matching trailing blank strings are chopped off by the split method unless you pass a limit as the second argument to the split method.
The latest question was regarding "".split("x*"), which returns ><, I mean one blank string.
It is only the non-matching trailing blank string that is chopped off by the split method. What is returned here is just the leading blank string. What the pattern says is: find 0 or more occurrences of x.
I think, I may confirm this by this example:
Example #1:
Output: >< > <
Trailing blank is chopped off.
Example #2:
Output: >< > < ><
This is because of the second argument (Limit) we have passed to the split(...) method.
I thought trailing empty strings are discarded, so the output should be nothing... ? [ May 07, 2007: Message edited by: Sasha Ruehmkorf ]
what about this issue? [ May 08, 2007: Message edited by: sharan vasandani ]
Chandra Bhatt
Ranch Hand
Joined: Feb 28, 2007
Posts: 1707
posted
0
Hi,
I think if you carefully read the post I have just posted above, you will get that. The couple of examples I have given are just for that case.
Thanks,
sharan vasandani
Ranch Hand
Joined: Feb 22, 2007
Posts: 100
posted
0
I am sorry, but it is not clear to me what you want to say by this line:
It is only the non-matching trailing blank string that is chopped off by the split method. What is returned by this is just leading blank string.
In a previous post Matt said all empty strings are pruned until a non-empty string is encountered. In our case there is no non-empty string, so why is it still printing "><"?
Chandra Bhatt
Ranch Hand
Joined: Feb 28, 2007
Posts: 1707
posted
0
We have the pattern "x*", which says 0 or more occurrences of x. Remember, 0 occurrences will do too. Therefore split() has to return the tokens, using the pattern as a sort of delimiting sequence. I can see why the confusion arises: there is only a blank string, but that can't be discarded by split; what is returned by split is, we can say, the leading string (although it is the trailing one too, which is the source of confusion).
If that blank were followed by any other characters making up the string to be split, only in that case would split have chopped the unmatched trailing blanks, as I did in the couple of examples in my previous post.
To get all the unmatched trailing strings, pass the second parameter (limit): negative for all, or positive to limit how many times the pattern is applied.
Thanks,
sharan vasandani
Ranch Hand
Joined: Feb 22, 2007
Posts: 100
posted
0
Still not clear.
According to Matt:
Finally, the algorithm prunes those two empty strings ("") from the end of the array of results -- the pruning removes all the empty strings from the end of the results up to the first non-empty string
all empty strings are removed until a non-empty string is encountered.
Chandra Bhatt
Ranch Hand
Joined: Feb 28, 2007
Posts: 1707
posted
0
Hi Sharan,
No issue to worry about.
Anyways, what do you think about this issue; how is this done? I think you should just manipulate the code, try using several modifications, split with second argument, with some positive values, -1 and all. You tell me how the things are happening there.
This is a far better way, I think.
Keep it up!
Thanks,
sharan vasandani
Ranch Hand
Joined: Feb 22, 2007
Posts: 100
posted
0
I know passing -1 will not prune any empty strings but will print them all.
But I am confused between these two:
In the case of "apples".split("\\w*"), the regular expression matches twice ("apples" and "" at the end) to give fragments "" and "". Another empty string is added to account for the remainder, but all three ""'s are pruned at the end, resulting in an empty array as output.
String[] tokens = "".split("x*");
for (String s : tokens) System.out.print(">" + s + "<");
I thought trailing empty strings are discarded, so the output should be nothing... ?
It looks like you may have found a bug -- or at least, an undocumented exception condition. From the source code, it looks like if there are *no* matches for the delimiter, it will just return the original string as an array of size one.
It doesn't even bother to check the limit parameter, or call the part that removes the trailing blanks.
Oops, I was wrong. This exception condition is documented in the JavaDoc...
If this pattern does not match any subsequence of the input then the resulting array has just one element, namely the input sequence in string form.
It looks like if there are no matches for the split delimiter, then the limit part of split (and any side effects) is not even applied.
Henry
Matt Russell
Ranch Hand
Joined: Aug 15, 2006
Posts: 165
posted
0
Hmm. With regards to "".split("x*"), this may be either a bug or just undefined behaviour -- interestingly, I get different results with Sun's libraries than with GNU Java.
This is probably not something that's tested on SCJP ;-)
With Sun's JDK, I get:
With GNU Java, I get
What I think is happening is as follows: the split() JavaDoc says that, "If this pattern does not match any subsequence of the input then the resulting array has just one element, namely the input sequence in string form." However, this is implemented in Sun's code (pasted a few messages back) by testing if the index variable == 0. That would normally indicate no matches had occurred, however, it's also the case where the string itself is empty and there is a zero-length match.
My suspicion is that this is a Sun bug, in that the spec states that trailing empty strings will be discarded.
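The two observations can be put side by side with a few lines of code. This sketch only looks at the behaviour from the outside (it does not reproduce the library source), and the class and method names are made up:

```java
import java.util.regex.Pattern;

public class EmptyInputSplit {
    // True if the delimiter regex finds at least one match in the input.
    static boolean delimiterMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] tokens = "".split("x*");
        System.out.println("delimiter matched: " + delimiterMatches("x*", ""));
        System.out.println("number of tokens:  " + tokens.length);
        for (String s : tokens) {
            System.out.println(">" + s + "<");
        }
    }
}
```

With Sun's library the thread reports a single empty token even though the delimiter does match once; other class libraries (GNU Java is mentioned above) may give a different token count, so treat that number as implementation-dependent.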
-- Matt [ May 08, 2007: Message edited by: Matt Russell ]
Matt Russell
Ranch Hand
Joined: Aug 15, 2006
Posts: 165
posted
0
Originally posted by Henry Wong: Oops, I was wrong. This exception condition is documented in the JavaDoc... It looks like if there are no matches for the split delimiter, then the limit part of split (and any side effects) is not even applied. Henry
Sure...but in the case of "".split("x*"), there is one match.
Originally posted by Matt Russell: Sure...but in the case of "".split("x*"), there is one match.
Actually, no. I was referring to the matching of the delimiter, not the results. There are no x's to match.
Henry
Matt Russell
Ranch Hand
Joined: Aug 15, 2006
Posts: 165
posted
0
Originally posted by Henry Wong: Actually, no. I was referring to the matching of the delimiter, not the results. There are no x's to match. Henry
I was referring to the matching of the delimiter too -- * matches 0 or more: so x* matches even though there are no x's to match. It's quite possible I'm being dense and missing something, though ;-)
Sasha Ruehmkorf
Ranch Hand
Joined: Mar 29, 2007
Posts: 115
posted
0
Originally posted by Matt Russell: It's quite possible I'm being dense and missing something, though ;-)
Don't think so. My test program gives:
Pattern = x*
Matcher = ""
I found the text "" starting at index 0 and ending at index 0.
So, thank you very much for the fruitful discussion. Finally I feel able to predict the output of the split() method in absolutely every case. :-) Unfortunately I am not able to say the same for all these parse methods around. Lots of work still to be done...
Originally posted by Matt Russell: I was referring to the matching of the delimiter too -- * matches 0 or more: so x* matches even though there are no x's to match. It's quite possible I'm being dense and missing something, though ;-)
Interesting. You are absolutely correct.
From the source code, it does look like a bug. Apparently, it is checking to see if an internal variable (index) is not changed (to determine no matches). This variable starts off as zero, and ends up as the end of the last match -- which in this case is still zero.
Henry