
Regular Expressions in String.split()

Joe carco
Ranch Hand

Joined: Apr 14, 2009
Posts: 82
Hi,

I want to split a String using the character | as the split token. However, when this token is escaped with '\', i.e. '\|', the split should not take place at that point.
For example:

will split the string into 4 parts.
But how do I prevent the second "bla" from being split from the third "bla"?

The result should look like this:
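A minimal sketch of the situation, assuming the input string is `bla|bla\|bla|bla` (an assumption inferred from the results quoted further down the thread) and using the hypothetical class name `NaiveSplitDemo`:

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Splits on every pipe. The pipe is escaped because it is a regex metacharacter.
    static String[] naiveSplit(String input) {
        return input.split("\\|");
    }

    public static void main(String[] args) {
        // Assumed sample input: bla|bla\|bla|bla (the backslash is doubled in Java source).
        String input = "bla|bla\\|bla|bla";

        // Prints [bla, bla\, bla, bla]: four parts, the escaped pipe is split too.
        System.out.println(Arrays.toString(naiveSplit(input)));

        // Wanted instead, three parts:  bla   bla\|bla   bla
    }
}
```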


I've tried all sorts of regexp combinations. None of them did what I expected.
I've spent hours on this problem and I'm about to give up.
I would really appreciate any help.

thank you in advance- Carcophan
Miklos Szeles
Ranch Hand

Joined: Oct 21, 2008
Posts: 142
Maybe it's not as efficient as using regexps, but why don't you write your own simple splitter in a few minutes?
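Such a hand-written splitter could look roughly like the sketch below. The class and method names are hypothetical, and it assumes that a backslash escapes whatever character follows it and that the escape sequence is kept in the output:

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplitter {
    // Hypothetical helper: splits on '|' unless the pipe is escaped with '\'.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Copy the escape character and the escaped character verbatim.
                current.append(c).append(input.charAt(++i));
            } else if (c == '|') {
                // Unescaped pipe: close the current part and start a new one.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        // Assumed sample input: bla|bla\|bla|bla
        System.out.println(split("bla|bla\\|bla|bla")); // [bla, bla\|bla, bla]
    }
}
```

Unlike String.split(), this sketch keeps trailing empty parts, and because it consumes escape pairs from left to right it also treats `\\|` as an escaped backslash followed by a real delimiter.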
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19684
    

First of all, I do not believe you if you say that using "|" will work. The vertical bar is a metacharacter in regular expressions. Also, your two lines of code won't even compile, as the \ should be escaped. Post Real Code.

In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.
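As a hedged illustration of the lookbehind idea, the sketch below uses a negative lookbehind so that a pipe only counts as a delimiter when it is not directly preceded by a backslash. The class name and the sample input `bla|bla\|bla|bla` are assumptions:

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // Regex: (?<!\\)\|
    // (?<!\\) is a zero-width negative lookbehind meaning "not preceded by a backslash".
    static String[] split(String input) {
        return input.split("(?<!\\\\)\\|");
    }

    public static void main(String[] args) {
        // Assumed sample input: bla|bla\|bla|bla
        System.out.println(Arrays.toString(split("bla|bla\\|bla|bla"))); // [bla, bla\|bla, bla]
    }
}
```

Because the lookbehind is zero-width, the character before the pipe is inspected but not consumed as part of the delimiter.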


Sebastian Janisch
Ranch Hand

Joined: Feb 23, 2009
Posts: 1183
I came up with this but for some reason instead of returning

[bla, bla\|bla, bla]

it returns

[bl, bla\|bl, bla]

which is close but where are the two a letters?


Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18765
    

which is close but where are the two a letters?


The two a letters were used as part of the delimiters.
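To make this visible, the sketch below lists what a consuming pattern actually matches. It assumes the attempt was the character-class pattern `[^\\]\|` (an assumption, though it does reproduce the output quoted above) and the sample input `bla|bla\|bla|bla`; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedDelimiterDemo {
    // Returns every substring the regex matches, i.e. exactly what split() throws away.
    static List<String> delimiters(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "bla|bla\\|bla|bla"; // assumed sample: bla|bla\|bla|bla
        String consuming = "[^\\\\]\\|";     // assumed attempt: [^\\]\|

        // Each delimiter is "a|": the character class consumes the letter before the pipe.
        System.out.println(delimiters(consuming, input));            // [a|, a|]
        System.out.println(Arrays.toString(input.split(consuming))); // [bl, bla\|bl, bla]
    }
}
```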

Try...




BTW, to the original poster... I would be hesitant to use any of the solutions posted in this topic. From your question, it looks like you are a beginner with regexes, and are unlikely to understand the solution posted. And it is never a good idea to use something that you don't understand.

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Sebastian Janisch
Ranch Hand

Joined: Feb 23, 2009
Posts: 1183
Henry Wong wrote:
which is close but where are the two a letters?


The two a letters were used as part of the delimiters.

Try...



Henry



@Joe Carco ... This is what you want. ..
Joe carco
Ranch Hand

Joined: Apr 14, 2009
Posts: 82
Rob Prime wrote: First of all, I do not believe you if you say that using "|" will work. The vertical bar is a metacharacter in regular expressions. Also, your two lines of code won't even compile, as the \ should be escaped. Post Real Code.

In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.


OK, I admit it's not real code, but the problem is real. The input I'm parsing does in fact have the pipe "|" as a segment marker that needs to be split on, and a "\" as an escape character. I decided not to copy/paste any code but just to post the code from memory.

@Henry, Thank you so much for your help. I didn't want to write my own String splitter because I was certain that it could be done with regular expressions.
Sebastian Janisch
Ranch Hand

Joined: Feb 23, 2009
Posts: 1183
@Henry, Thank you so much for your help. I didn't want to write my own String splitter because I was certain that it could be done with regular expressions.


Note though that your custom splitter could be faster than employing the heavy regex engine.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18765
    

Sebastian Janisch wrote:
@Henry, Thank you so much for your help. I didn't want to write my own String splitter because I was certain that it could be done with regular expressions.


Note though that your custom splitter could be faster than employing the heavy regex engine.


Also, it could have been done in 15 minutes. Instead, you "spent hours on this problem", gave up, got the solution here, and now have a solution that you don't understand. Is that really better?

Henry
Joe carco
Ranch Hand

Joined: Apr 14, 2009
Posts: 82
yes it is. writing a custom splitter would have been a boring task. now i had the chance to delve into regex a bit. learnt something new today! cheers
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Sebastian Janisch wrote:
Henry Wong wrote:
which is close but where are the two a letters?


The two a letters were used as part of the delimiters.

Try...



Henry



@Joe Carco ... This is what you want. ..


It will not work if a backslash is part of the text and comes just before the pipe character:

Of course, this might never occur in the OP's input... But if it is possible, the OP should devise a different solution (or a bit more tricky split(...) regex).
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18765
    

It will not work if a backslash is part of the text and comes just before the pipe character:


Not exactly sure what you mean. Isn't this what the OP wanted? To not split when the pipe character is escaped?

Henry
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Henry Wong wrote:
It will not work if a backslash is part of the text and comes just before the pipe character:


Not exactly sure what you mean. Isn't this what the OP wanted? To not split when the pipe character is escaped?

Henry


Say the OP wants to split on the unescaped pipe:

and

But what if the text can contain a backslash that is not used to escape the pipe symbol? A natural choice would be to escape that backslash like this:

The solution(s) proposed in this thread will not split on the pipe before 'c', while a split there might be what the OP intends.
But, like I said: this might very well not occur in the OP's input, but if it can occur, I thought I'd just mention it.

In short: the OP might be looking for a way to split on the pipe only if the pipe has an even number of backslashes (possibly zero) before it.
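A small sketch of this corner case, assuming the plain negative-lookbehind pattern `(?<!\\)\|` and the hypothetical input `a|b\\|c`, where `\\` is meant as an escaped backslash that belongs to the text:

```java
import java.util.Arrays;

class EscapedBackslashCornerCase {
    // Regex: (?<!\\)\|  splits on a pipe that is not directly preceded by a backslash.
    static String[] simpleSplit(String input) {
        return input.split("(?<!\\\\)\\|");
    }

    public static void main(String[] args) {
        // Hypothetical input: a|b\\|c
        String input = "a|b\\\\|c";

        // Prints [a, b\\|c]: the pipe before 'c' is treated as escaped,
        // although the two backslashes in front of it only escape each other.
        System.out.println(Arrays.toString(simpleSplit(input)));
    }
}
```

Under the escaping convention described above, the intended result would be the three parts a, b\\ and c.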
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18765
    

In short: the OP might be looking for a way to split on the pipe only if the pipe has an even number of backslashes (possibly zero) before it.


Interesting. I never even saw it. Good catch.

Henry
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19684
    

Which I already mentioned:
Rob Prime wrote:In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Rob Prime wrote:Which I already mentioned:
Rob Prime wrote:In the JavaDoc of java.util.regex.Pattern, do a search for lookbehind and you should find a solution. Of course this solution will again break if you do want to split at \\|, then again not at \\\| etc. That's going to be quite a bit harder.


Indeed, missed your response!
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18765
    

Hmmm.... How about this for checking odd number of backslashes?



Unfortunately, this can only check for an odd number of backslashes up to 2001 backslashes, then it breaks...
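One way such a check could be written is sketched below. This is not necessarily the pattern from the original post; it assumes a bounded repetition of 1000 backslash pairs, which would explain the 2001 limit (2 × 1000 + 1), and the class name is hypothetical. Java requires lookbehinds to have a bounded maximum length, hence the explicit upper bound.

```java
import java.util.Arrays;

class OddBackslashSplitDemo {
    // Regex: (?<!(?<!\\)(?:\\\\){0,1000}\\)\|
    // Do not split when the pipe is preceded by an odd run of backslashes:
    //   (?<!\\)            the run is not preceded by yet another backslash
    //   (?:\\\\){0,1000}   zero to 1000 pairs of backslashes
    //   \\                 one more backslash, the one that escapes the pipe
    static final String UNESCAPED_PIPE =
            "(?<!(?<!\\\\)(?:\\\\\\\\){0,1000}\\\\)\\|";

    static String[] split(String input) {
        return input.split(UNESCAPED_PIPE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a|b\\|c")));   // a|b\|c  -> [a, b\|c]
        System.out.println(Arrays.toString(split("a|b\\\\|c"))); // a|b\\|c -> [a, b\\, c]
    }
}
```

The escape characters stay in the resulting parts, so a separate unescaping step would still be needed afterwards.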


To Joe Carco, don't you wish you wrote your own "boring" regex instead, now? ...

Henry
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Henry Wong wrote:Hmmm.... How about this for checking odd number of backslashes?



Unfortunately, this can only check for an odd number of backslashes up to 2001 backslashes, then it breaks...


To Joe Carco, don't you wish you wrote your own "boring" regex instead, now? ...

Henry




You could always replace 1000 with Integer.MAX_VALUE
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19684
    

Make that Long.MAX_VALUE just to be sure.
Joe carco
Ranch Hand

Joined: Apr 14, 2009
Posts: 82



FREAK!
 
 