Regular expressions and the split() method

 
Sidharth Khattri
Ranch Hand
Posts: 125
Java Linux Scala
I thought I could ask here instead of making a new thread. I'll make a new thread if I get no response :/
I still don't understand the concept.

I wrote this little program:


Here are a few of the outputs from these command-line invocations:
1) With java LogSplitter "a" "\w"
Output:

0

2) With java LogSplitter "a " "\w"
Output:
><> <
2

Now, why does the second invocation return an empty token between -1 and 0, along with the space following the a in "a ",
when the first invocation doesn't return an empty token between -1 and 0?

3) Although with java LogSplitter "a" "\d"
Output is:
>a<
1
Why does it return the token >a< even though there's no digit in the string, when the first invocation returned 0 tokens?

4) With java LogSplitter "" "\w"
Output:
><
1
Why does it return an empty string when there's nothing in the string?

5) With java LogSplitter " a" "\s"
Output:
><><>a<
3
What's up when using "\s"?

WHAT IS THE LOGIC BEHIND SPLIT?
 
Henry Wong
author
Marshal
Pie
Posts: 20883
75
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
Sidharth Khattri wrote:I thought I could ask here instead of making a new thread. I'll make a new thread if I get no response :/
I still don't understand the concept.


Yeah, let's move this to a new topic instead of confusing the other topic. Also, you no longer have to wait for "no response".

Henry
 
Sidharth Khattri
Thank you for moving this to a new thread. I never meant to confuse anyone.
I would love to get a response.
 
Henry Wong
Sidharth Khattri wrote:
1) With java LogSplitter "a" "\w"
Output:

0


A regex of "\w" matches a single word character, so each word character acts as a delimiter when using split(). With a string of "a", the letter "a" is the delimiter, yielding two components that are both zero-length strings: one before the "a" and one after it.

However, the version of split() that takes a single string (the regex) removes all trailing zero-length parts. Here both components are zero length, so both are removed and the resulting array has length 0.
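
The trailing-empty-string rule can be checked with a small sketch. This is not the LogSplitter program from the question (its source is not shown in the thread); the class and method names below are made up for illustration. It compares the one-argument split() with the two-argument overload, where a negative limit keeps trailing empty strings.

```java
// Hypothetical demo class; not the LogSplitter program from the original post.
class TrailingEmptyDemo {
    // One-argument split: trailing zero-length strings are removed.
    static String[] splitDefault(String input, String regex) {
        return input.split(regex);
    }

    // Negative limit: every piece is kept, including trailing zero-length strings.
    static String[] splitKeepAll(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        // "a" is consumed entirely by the delimiter, leaving "" before and "" after.
        System.out.println(splitDefault("a", "\\w").length); // prints 0
        System.out.println(splitKeepAll("a", "\\w").length); // prints 2
    }
}
```

Note that in Java source the backslash must be doubled ("\\w"), whereas on the command line "\w" is typically passed through as is.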

Henry
 
Henry Wong
Sidharth Khattri wrote:
2) With java LogSplitter "a " "\w"
Output:
><> <
2

Now, why does the second invocation return an empty token between -1 and 0, along with the space following the a in "a ",
when the first invocation doesn't return an empty token between -1 and 0?


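A regex of "\w" matches a single word character, so each word character acts as a delimiter when using split(). With a string of "a " (an "a" followed by a space), the letter "a" is the delimiter, yielding two components. The first is a zero-length string, and the second is a single space. Because the last component is not zero length, there are no trailing zero-length parts to remove, and the leading zero-length string is kept.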
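
As a rough illustration, the following sketch reproduces this case. The class name and the bracket() helper are hypothetical; the helper only mimics the >token< output format shown in the question.

```java
// Hypothetical demo class; not the LogSplitter program from the original post.
class LeadingEmptyDemo {
    // Wrap each piece in > and <, mirroring the output format used in the thread.
    static String bracket(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append('>').append(part).append('<');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The delimiter "a" sits at the start, so the first piece is "" and the second is " ".
        String[] parts = "a ".split("\\w");
        System.out.println(bracket(parts)); // prints ><> <
        System.out.println(parts.length);   // prints 2
    }
}
```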

Henry
 
Henry Wong
Sidharth Khattri wrote:
3) Although with java LogSplitter "a" "\d"
Output is:
>a<
1
Why does it return the token >a< even though there's no digit in the string, when the first invocation returned 0 tokens?


A regex of "\d" matches a single digit, so each digit acts as a delimiter when using split(). With a string of "a", there are *no* matches, hence no delimiters and nothing to split. The result is a one-element array containing the original string, without any splitting done.
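
A minimal sketch of the no-match case, using made-up class and method names. It also covers the empty input string from question 4, which falls under the same rule: nothing matches, so the original (empty) string comes back as the only element.

```java
// Hypothetical demo class; not the LogSplitter program from the original post.
class NoMatchDemo {
    // Thin wrapper around the one-argument String.split, for readability.
    static String[] splitOn(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // No digit in "a": the array holds the untouched input.
        String[] noDigit = splitOn("a", "\\d");
        System.out.println(noDigit.length + " >" + noDigit[0] + "<"); // prints 1 >a<

        // Empty input: no match either, so the single element is the empty string.
        String[] empty = splitOn("", "\\w");
        System.out.println(empty.length + " >" + empty[0] + "<");     // prints 1 ><
    }
}
```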

Henry
 
Sidharth Khattri
Thanks Henry, I finally got it
 