Tokenizing with regex pattern. Little confused!

Keith Nagle
Ranch Hand

Joined: May 06, 2008
Posts: 65
I'm using a regex pattern to tokenize a String.
The code runs fine, but I'm curious about the output.
Here's the code:
My code prints angle brackets around each token so that whitespace is visible.
Here is my command-line invocation, where args[0] is the regex pattern to be used and args[1] is the source String:
java Test2 "\d*" "cY 39r k"
The output was:
Token: ><
Token: >c<
Token: >Y<
Token: > <
Token: ><
Token: >r<
Token: > <
Token: >k<

Am I right in saying that at cell 0 a 'c' resides, which is a delimiter as it is not a digit, so an empty String >< is printed? Cell 1 contains 'Y', which is a delimiter as it is not a digit, so >c< is printed. Then cell 2 holds a whitespace character, which is not a digit, so it counts as a delimiter. But why isn't >cY< printed? Instead it prints > <, which is the delimiter. I would have thought >cY< would be printed.
I read the Java tutorial on searching with regex, and if this were a search I can understand that (off the top of my head) the output would be:
"" @ start 0, end 0
"" @ start 1, end 1
"" @ start 2, end 2
"39" @ start 3, end 5
"" @ start 5, end 5
"" @ start 6, end 6
"" @ start 7, end 7
"" @ start 8, end 8

I just don't understand what's going on when the above regular expression is used as a delimiter when tokenizing.
Please help!
Thank you
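The search results guessed at above can be checked with a small Matcher.find() loop. This is only a sketch: the class name FindDemo and the helper matches() are made up for illustration and are not part of the code from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class FindDemo {
    // Collects one description per match of regex in input,
    // including the zero-length matches that \d* produces.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add("\"" + m.group() + "\" @ start " + m.start() + ", end " + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // Same pattern and source String as in the command line above.
        for (String line : matches("\\d*", "cY 39r k")) {
            System.out.println(line);
        }
    }
}
```

For this input the loop reports eight matches: seven empty ones and "39" at start 3, end 5.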
[ June 24, 2008: Message edited by: Keith Nagle ]

SCJP 5.0
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18916

Your regex pattern for the delimiter is zero or more digits. This means that an empty string (zero digits) is a valid delimiter.
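A quick way to see the point about zero digits is to test the pattern against an empty string, and to compare it with \d+ (one or more digits), which cannot match an empty string. This is a rough sketch; the class and method names are invented for the example, and \d+ is shown only as a contrast, not as what the original code used.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
class QuantifierDemo {
    // True if the whole (empty) input matches the given regex.
    static boolean matchesEmpty(String regex) {
        return Pattern.matches(regex, "");
    }

    // Splits on runs of one or more digits, so an empty string is never a delimiter.
    static String[] splitOnDigitRuns(String input) {
        return input.split("\\d+");
    }

    public static void main(String[] args) {
        System.out.println(matchesEmpty("\\d*")); // zero digits is allowed
        System.out.println(matchesEmpty("\\d+")); // at least one digit is required
        System.out.println(Arrays.toString(splitOnDigitRuns("cY 39r k")));
    }
}
```

With \d+ the only delimiter in "cY 39r k" is "39", so the split yields the two tokens "cY " and "r k".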

Originally posted by Keith Nagle:
Am I right in saying that at cell 0 a 'c' resides, which is a delimiter as it is not a digit, so an empty String >< is printed? Cell 1 contains 'Y', which is a delimiter as it is not a digit, so >c< is printed. Then cell 2 holds a whitespace character, which is not a digit, so it counts as a delimiter. But why isn't >cY< printed? Instead it prints > <, which is the delimiter. I would have thought >cY< would be printed.

Basically, you have an empty string delimiter before the first character, which is why the first value is an empty string. You have an empty string delimiter between the first and second characters, which is why the second value is a "c" -- the value between the first and second delimiters. You have an empty string delimiter between the second and third characters, which is why the third value is a "Y" -- the value between the second and third delimiters.

The values are whatever lies between the delimiters -- they are not independent of each other.
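To make "the values are between the delimiters" concrete, here is a minimal sketch that finds every delimiter match and collects the text between consecutive matches. It is not the code from the original post: the class BetweenDelimiters and the method tokens() are hypothetical, and dropping trailing empty tokens is an assumption borrowed from the default behaviour of String.split. It is done by hand because String.split on Java 8 and later omits the leading empty token when a zero-width match occurs at the start of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class BetweenDelimiters {
    // Returns the pieces of input that lie between successive delimiter matches.
    static List<String> tokens(String delimiterRegex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(delimiterRegex).matcher(input);
        int previousEnd = 0; // where the previous delimiter stopped
        while (m.find()) {
            // Token = everything from the end of the previous delimiter
            // up to the start of this one (possibly nothing at all).
            out.add(input.substring(previousEnd, m.start()));
            previousEnd = m.end();
        }
        out.add(input.substring(previousEnd)); // text after the last delimiter
        // Assumption: discard trailing empty tokens, as String.split does by default.
        while (!out.isEmpty() && out.get(out.size() - 1).isEmpty()) {
            out.remove(out.size() - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokens("\\d*", "cY 39r k")) {
            System.out.println("Token: >" + token + "<");
        }
    }
}
```

For "\d*" and "cY 39r k" this gives the eight tokens from the original output. The empty token after the space appears because the "39" delimiter is followed immediately by another empty delimiter, with nothing between them.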

Henry
[ June 24, 2008: Message edited by: Henry Wong ]

Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Darryl Burke
Bartender

Joined: May 03, 2008
Posts: 4642

Keith, it's rather rude of you not to tell us here that this question was already answered on the Sun Java forum 16 hours ago.

Confused about Tokenizing with Regex


luck, db
There are no new questions, but there may be new answers.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39478
Originally posted by Darryl Burke:
This question has already been answered on the Sun Java forum 16 hours ago.
Read this FAQ, please.
Darryl Burke
Bartender

Joined: May 03, 2008
Posts: 4642

Umm, till I clicked the link I thought that was directed at me :roll:
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39478
Originally posted by Darryl Burke:
Umm, till I clicked the link I thought that was directed at me :roll:
Sorry.
 