
Regular Expressions in String.split()

 
Joe carco
Ranch Hand
Posts: 82
Hi,

I want to split a String using the character | as the split token. However, when this token is escaped with '\', i.e. '\|', the split should not take place at that point.
For example, splitting

bla|bla\|bla|bla

on | will split the string into 4 parts.
But how do I prevent the second "bla" from being split from the third "bla"?

The result should look like this:

[bla, bla\|bla, bla]

I've tried all sorts of regexp combinations. None of them did what I expected.
I've spent hours on this problem and I'm about to give up.
I would really appreciate any help.

Thank you in advance, Carcophan
 
Miklos Szeles
Ranch Hand
Posts: 142
Maybe it's not as efficient as using regexps, but why don't you write your own simple splitter in a few minutes?
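A minimal sketch of such a hand-written splitter, assuming `|` is the delimiter and `\` escapes whatever character follows it. The names `EscapedSplitter` and `split` are made up for illustration, and escape sequences are kept verbatim in the output rather than unescaped:

```java
import java.util.ArrayList;
import java.util.List;

class EscapedSplitter {
    // Splits input on delimiter, except where the delimiter is protected by
    // the escape character. An escape character protects the single
    // character that follows it (including another escape character).
    static List<String> split(String input, char delimiter, char escape) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                // Copy the escape and the protected character unchanged.
                current.append(c).append(input.charAt(++i));
            } else if (c == delimiter) {
                // Unescaped delimiter: close the current segment.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("bla|bla\\|bla|bla", '|', '\\'));
    }
}
```

For the input bla|bla\|bla|bla this should give [bla, bla\|bla, bla]. Unlike String.split(), this sketch keeps trailing empty segments, which may or may not be what you want.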
 
Rob Spoor
Sheriff
Posts: 20510
First of all, I do not believe you if you say that using "|" will work. The vertical bar is a metacharacter in regular expressions. Also, your two lines of code won't even compile, as the \ should be escaped. Post Real Code.
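To illustrate both points, here is a small sketch (the class and method names are hypothetical). A bare "|" is the regex alternation operator with two empty alternatives, so it matches the empty string between characters. The pipe has to be written as "\\|" in Java source, and a literal backslash in the input as "\\":

```java
import java.util.Arrays;

class PipeSplitDemo {
    // A bare "|" matches the empty string, so the input is cut
    // between characters instead of at the pipes.
    static String[] splitOnBarePipe(String s) {
        return s.split("|");
    }

    // "\\|" in Java source is the regex \| which matches a literal pipe.
    static String[] splitOnEscapedPipe(String s) {
        return s.split("\\|");
    }

    public static void main(String[] args) {
        // "bla\|bla" would not compile: \| is not a valid Java escape,
        // so the backslash itself is written as \\.
        String input = "bla|bla\\|bla|bla";
        System.out.println(Arrays.toString(splitOnBarePipe(input)));
        System.out.println(Arrays.toString(splitOnEscapedPipe(input)));
    }
}
```

The second call splits at every pipe, escaped or not, which gives the four parts described in the question.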

In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.
 
Sebastian Janisch
Ranch Hand
Posts: 1183
I came up with this but for some reason instead of returning

[bla, bla\|bla, bla]

it returns

[bl, bla\|bl, bla]

which is close but where are the two a letters?
 
Henry Wong
author
Marshal
Posts: 20989
Sebastian Janisch wrote:
which is close but where are the two a letters?


The two a letters were used as part of the delimiters.

Try...
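The snippets posted at this point are not preserved. As a sketch only: a pattern like `[^\\]\|` (an assumption, chosen because it reproduces the output above) consumes the character in front of the pipe as part of the delimiter, whereas a negative lookbehind only inspects that character. The class and constant names are hypothetical:

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // Assumed reconstruction: "any character except a backslash, then a pipe".
    // The matched character becomes part of the delimiter and is dropped.
    static final String CONSUMING = "[^\\\\]\\|";

    // Negative lookbehind: a pipe that is not preceded by a backslash.
    // The lookbehind is zero-width, so nothing but the pipe is removed.
    static final String LOOKBEHIND = "(?<!\\\\)\\|";

    public static void main(String[] args) {
        String input = "bla|bla\\|bla|bla";
        System.out.println(Arrays.toString(input.split(CONSUMING)));
        System.out.println(Arrays.toString(input.split(LOOKBEHIND)));
    }
}
```

The first pattern should give [bl, bla\|bl, bla] and the second [bla, bla\|bla, bla].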




BTW, to the original poster... I would be hesitant to use any of the solutions posted in this topic. From your question, it looks like you are a beginner with regexes and are unlikely to understand the solutions posted. And it is never a good idea to use something that you don't understand.

Henry
 
Sebastian Janisch
Ranch Hand
Posts: 1183
Henry Wong wrote:
which is close but where are the two a letters?


The two a letters were used as part of the delimiters.

Try...



Henry



@Joe Carco ... This is what you want. ..
 
Joe carco
Ranch Hand
Posts: 82
Rob Prime wrote:First of all, I do not believe you if you say that using "|" will work. The vertical bar is a metacharacter in regular expressions. Also, your two lines of code won't even compile, as the \ should be escaped. Post Real Code.

In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.


OK, I admit it's not real code, but the problem is real. The input I'm parsing does in fact have the pipe "|" as a segment marker that needs to be split on, and a "\" as an escape character. I decided not to copy/paste any code but to post the code from memory.

@Henry, thank you so much for your help. I didn't want to write my own String splitter because I was certain that it could be done with regular expressions.
 
Sebastian Janisch
Ranch Hand
Posts: 1183
Joe carco wrote:
@Henry, thank you so much for your help. I didn't want to write my own String splitter because I was certain that it could be done with regular expressions.


Note though that your custom splitter could be faster than employing the heavy regex engine.
 
Henry Wong
author
Marshal
Posts: 20989
Sebastian Janisch wrote:
@Henry, thank you so much for your help. I didn't want to write my own String splitter because I was certain that it could be done with regular expressions.


Note though that your custom splitter could be faster than employing the heavy regex engine.


Also, it could have been done in 15 minutes. Instead, you "spent hours on this problem", gave up, got the solution here, and now have a solution that you don't understand. Is that really better?

Henry
 
Joe carco
Ranch Hand
Posts: 82
Yes, it is. Writing a custom splitter would have been a boring task. Now I've had the chance to delve into regex a bit. Learnt something new today! Cheers
 
Piet Verdriet
Ranch Hand
Posts: 266
Sebastian Janisch wrote:
Henry Wong wrote:
which is close but where are the two a letters?


The two a letters were used as part of the delimiters.

Try...



Henry



@Joe Carco ... This is what you want. ..


It will not work if a backslash is part of the text and comes just before the pipe character:

Of course, this might never occur in the OP's input... But if it is possible, the OP should devise a different solution (or a bit more tricky split(...) regex).
 
Henry Wong
author
Marshal
Posts: 20989
Piet Verdriet wrote:
It will not work if a backslash is part of the text and comes just before the pipe character:


Not exactly sure what you mean. Isn't this what the OP wanted? To not split when the pipe character is escaped?

Henry
 
Piet Verdriet
Ranch Hand
Posts: 266
Henry Wong wrote:
It will not work if a backslash is part of the text and comes just before the pipe character:


Not exactly sure what you mean. Isn't this what the OP wanted? To not split when the pipe character is escaped?

Henry


Say the OP wants to split on the unescaped pipe:

and

But what if the text can contain a backslash that is not used to escape the pipe symbol? A natural choice would be to escape that backslash like this:

The solution(s) proposed in this thread will not split on the pipe before 'c' either, because it is still directly preceded by a backslash, while splitting there might well be the OP's intention.
But, like I said: this might very well not occur in the OP's input, but if it can occur, I thought I'd just mention it.

In short: the OP might be looking for a way to split on the pipe only if the pipe has an even number of backslashes (including none) before it.
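A small sketch of that parity rule, using a\\|c as an assumed example of an escaped backslash followed by an unescaped pipe (the original examples are not preserved, and the names below are hypothetical). It shows that a simple one-character lookbehind leaves such a string unsplit, and how counting the backslashes in front of the pipe tells the two cases apart:

```java
class BackslashParityDemo {
    // Simple lookbehind: a pipe not directly preceded by a backslash.
    static final String SIMPLE_LOOKBEHIND = "(?<!\\\\)\\|";

    // Counts the run of backslashes immediately before index.
    // An odd count means the character at index is escaped.
    static boolean isEscaped(String s, int index) {
        int count = 0;
        for (int i = index - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            count++;
        }
        return count % 2 == 1;
    }

    public static void main(String[] args) {
        String escapedBackslash = "a\\\\|c"; // the text a\\|c
        // The simple lookbehind sees a backslash and refuses to split.
        System.out.println(escapedBackslash.split(SIMPLE_LOOKBEHIND).length);
        // Two backslashes precede the pipe, so the pipe is not escaped.
        System.out.println(isEscaped(escapedBackslash, 3));
        // One backslash precedes the pipe in a\|b, so it is escaped.
        System.out.println(isEscaped("a\\|b", 2));
    }
}
```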
 
Henry Wong
author
Marshal
Posts: 20989
Piet Verdriet wrote:
In short: the OP might be looking for a way to split on the pipe only if the pipe has an even number of backslashes (including none) before it.


Interesting. I never even saw it. Good catch.

Henry
 
Rob Spoor
Sheriff
Posts: 20510
Which I already mentioned:
Rob Prime wrote:In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.
 
Piet Verdriet
Ranch Hand
Posts: 266
Rob Prime wrote:Which I already mentioned:
Rob Prime wrote:In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.


Indeed, missed your response!
 
Henry Wong
author
Marshal
Posts: 20989
Hmmm.... How about this for checking for an odd number of backslashes?



Unfortunately, this can only check for an odd number of backslashes up to 2001 backslashes; beyond that, it breaks...
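The pattern itself is not preserved here. The following is only a guess at its general shape, consistent with the 2001 limit: Java lookbehinds need a bounded length, so the check for an odd run of backslashes is written as up to 1000 pairs plus one more backslash, with a nested lookbehind to make sure the run is counted from its start. The class and constant names are hypothetical:

```java
import java.util.Arrays;

class OddBackslashLookbehind {
    // Regex: (?<!(?<!\\)(?:\\\\){0,1000}\\)\|
    // Refuse to split when the pipe is preceded by a run of backslashes
    // made of 0..1000 pairs plus one single backslash (an odd count),
    // where the run itself is not preceded by yet another backslash.
    static final String UNESCAPED_PIPE =
            "(?<!(?<!\\\\)(?:\\\\\\\\){0,1000}\\\\)\\|";

    static String[] split(String s) {
        return s.split(UNESCAPED_PIPE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a|b")));          // no backslash
        System.out.println(Arrays.toString(split("a\\|b")));        // one backslash
        System.out.println(Arrays.toString(split("a\\\\|c")));      // two backslashes
        System.out.println(Arrays.toString(split("a\\\\\\|c")));    // three backslashes
    }
}
```

With zero or two backslashes in front of the pipe the string should be split; with one or three it should stay in one piece.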


To Joe Carco, don't you wish you wrote your own "boring" regex instead, now? ...

Henry
 
Piet Verdriet
Ranch Hand
Posts: 266
Henry Wong wrote:Hmmm.... How about this for checking for an odd number of backslashes?



Unfortunately, this can only check for an odd number of backslashes up to 2001 backslashes; beyond that, it breaks...


To Joe Carco, don't you wish you wrote your own "boring" regex instead, now? ...

Henry




You could always replace 1000 with Integer.MAX_VALUE
 
Rob Spoor
Sheriff
Posts: 20510
Make that Long.MAX_VALUE just to be sure.
 
Joe carco
Ranch Hand
Posts: 82



FREAK!
 