How to split string but keep all delimiters

Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi All,

I need your regular expression skills to help fine-tune this Java call, String.split("(?=\\b[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?])"), which is not retaining all the delimiters correctly. Below is the type of input string used:



I am looking for a clean and simple solution rather than one built on StringTokenizer or LinkedList. Would fine-tuning the regular expression achieve the objective? Otherwise, please advise on other, possibly better, solutions.

The examples available are either messy or not suitable to this requirement.

Thanks in advance,

Jack
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

It is not obvious to me what the split criteria are; your regex certainly does not provide the rule, since it fails to do what you want. Is the general rule to split before a decimal, then before a decimal again, and then before a $? If not, you need to define the rule.

Also, testing with a single test case does not allow one to have confidence in the resulting regex.

Edit: Regular expressions use "[ just about anything ]" to define a character set, so in "(?=\\b[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?])" you have a character set of "[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?]"! Are you expecting the '[' and ']' to group the content in some way?
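To illustrate the difference between a character set and a group, here is a minimal sketch. The class name, method names and tiny patterns are invented for the illustration and are not taken from the original regex.

```java
import java.util.regex.Pattern;

class BracketDemo {
    // Inside [...] every character is just one member of the set,
    // so '|' is a literal here and the set matches exactly one character.
    static final Pattern CHAR_CLASS = Pattern.compile("[ab|cd]");

    // (?:...) is a non-capturing group; here '|' separates two alternatives.
    static final Pattern GROUP = Pattern.compile("(?:ab|cd)");

    static boolean charClassMatches(String s) {
        return CHAR_CLASS.matcher(s).matches();
    }

    static boolean groupMatches(String s) {
        return GROUP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(charClassMatches("|"));  // true: '|' is a member of the set
        System.out.println(charClassMatches("ab")); // false: the set matches one character only
        System.out.println(groupMatches("ab"));     // true: first alternative
        System.out.println(groupMatches("|"));      // false: '|' is an operator in a group
    }
}
```

So if grouping was the intent, parentheses are the tool; square brackets silently turn quantifiers and alternation into plain characters.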


Retired horse trader.
Greg Brannon
Bartender

Joined: Oct 24, 2010
Posts: 563
Are you making it harder than it has to be?

Your desired output simply breaks the example string into 4 parts between desired spaces:

between the beginning and the second space,
between the second space and the fifth space,
between the fifth space and the eighth space, and
between the eighth space and the end.

That's easy enough to do by determining the location of the separating spaces and breaking the string accordingly.
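A rough sketch of that idea, assuming the break positions (second, fifth and eighth space) are known in advance. The class name, method name and sample string are made up, since the original input is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

class SpaceSplitDemo {
    // Breaks s at the n-th spaces listed in breakAt (1-based, ascending order).
    // The spaces used as break points are dropped; all other spaces are kept.
    static List<String> splitAtSpaces(String s, int... breakAt) {
        List<String> parts = new ArrayList<>();
        int start = 0;       // start index of the current part
        int spaceCount = 0;  // number of spaces seen so far
        int next = 0;        // index into breakAt
        for (int i = 0; i < s.length() && next < breakAt.length; i++) {
            if (s.charAt(i) == ' ' && ++spaceCount == breakAt[next]) {
                parts.add(s.substring(start, i));
                start = i + 1;
                next++;
            }
        }
        parts.add(s.substring(start)); // whatever remains after the last break
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the poster's data.
        String sample = "Lot 7 12 Oak St 3 bed house $350,000 neg";
        System.out.println(splitAtSpaces(sample, 2, 5, 8));
        // [Lot 7, 12 Oak St, 3 bed house, $350,000 neg]
    }
}
```

This only works when every input has the same number of words per part, which is why a rule based on content is usually more robust than one based on position.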

Always learning Java, currently using Eclipse on Fedora.
Linux user#: 501795
Luigi Plinge
Ranch Hand

Joined: Jan 06, 2011
Posts: 441

OP, it's not really clear what exactly your split criteria are. It would help if you tried to describe them in words, in a way that holds for any address.

If your intention is "start a new line when the first character of a word is a number or a $", I'd do something like this:

It's a bit longer, but it's more understandable and maintainable than an unintelligible regex, and it actually works...
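The listing for this word-by-word approach is not preserved in this copy of the thread. What follows is only a sketch of the stated rule, not the original code; the class name, method name and sample string are invented.

```java
import java.util.ArrayList;
import java.util.List;

class WordLoopDemo {
    // Starts a new line whenever a word begins with a digit or a '$'.
    static List<String> splitBeforeNumberOrDollar(String input) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for blank input
            }
            char first = word.charAt(0);
            boolean startsNewLine = Character.isDigit(first) || first == '$';
            if (startsNewLine && current.length() > 0) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the poster's data.
        System.out.println(splitBeforeNumberOrDollar("Sold 12 Oak St 3 bed house $350,000 neg"));
        // [Sold, 12 Oak St, 3 bed house, $350,000 neg]
    }
}
```

Note that runs of whitespace are collapsed to single spaces here, so this version does not keep the delimiters exactly as they appeared.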

Or here I tried regex:
Or how about a recursive function:
Luigi Plinge
Ranch Hand

Joined: Jan 06, 2011
Posts: 441

Here's a more general method that works similarly to String.split, which you can cut out and keep, paste into your class, add to your toolkit...
In your case you could do
(The 1 is because we want the number part of the matcher group to appear at the start of the following string, rather than with the space at the end of the previous one.)
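The method itself is missing from this copy of the thread. As a guess at the general shape only, here is a sketch that cuts the input at the start of a chosen capture group, so nothing is discarded. The name splitAtGroup, its signature and the sample regex are assumptions, not the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSplitDemo {
    // Cuts input at the start of the given capture group of every match.
    // No characters are removed: text before the group stays with the previous part.
    static List<String> splitAtGroup(String input, String regex, int group) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        while (m.find()) {
            int cut = m.start(group); // -1 if the group did not take part in the match
            if (cut > start) {
                parts.add(input.substring(start, cut));
                start = cut;
            }
        }
        parts.add(input.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical input; group 1 is the digit or '$' that follows a space.
        String sample = "Sold 12 Oak St 3 bed house $350,000 neg";
        System.out.println(splitAtGroup(sample, " ([0-9$])", 1));
        // [Sold , 12 Oak St , 3 bed house , $350,000 neg]
    }
}
```

With group 1 as the cut point, each separating space stays at the end of the preceding part and the number or $ opens the next one.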
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

The OP has not yet defined the syntax of the data or the criteria needed to perform the split. The original regex is most definitely badly flawed, and the fact that the Pattern class does not throw an exception is just luck. Until the OP defines his requirement we are only guessing, but I suspect all this proposed extra code is over-elaborate. My best guess is that all the OP needs is
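The one-liner posted at this point is missing from this copy of the thread. Going by the lookahead quoted in the reply that follows, (?=[0-9$]), a split along these lines fits the discussion. The leading \\s+ and the sample string are assumptions, so treat this as a sketch and not the exact expression that was posted.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Splits on whitespace that is immediately followed by a digit or a '$'.
    // The lookahead (?=[0-9$]) only peeks at the next character, so the digit
    // or '$' stays at the front of the following part.
    static String[] splitBeforeNumberOrDollar(String input) {
        return input.split("\\s+(?=[0-9$])");
    }

    public static void main(String[] args) {
        // Hypothetical input, not the poster's data.
        String sample = "Sold 12 Oak St 3 bed house $350,000 neg";
        System.out.println(Arrays.toString(splitBeforeNumberOrDollar(sample)));
        // [Sold, 12 Oak St, 3 bed house, $350,000 neg]
    }
}
```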

Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi James,

You are a champion! That is it. Well guessed. Below is the output I was looking for:



Excellent. Would you mind explaining how (?=[0-9$]) works?

Thank you very much,

Jack
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Jack Bush wrote:
Would you mind explaining how (?=[0-9$]) works?


http://www.regular-expressions.info/lookaround.html
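In short, (?=...) is a zero-width positive lookahead: it requires that what follows matches, but consumes nothing, so split() does not remove it. A small sketch of the difference, using made-up sample strings and method names:

```java
import java.util.Arrays;

class LookaroundDemo {
    // The digit or '$' is part of the match, so split() throws it away.
    static String[] consuming(String s) {
        return s.split(" [0-9$]");
    }

    // Only the space is matched; the digit or '$' is merely looked at.
    static String[] lookahead(String s) {
        return s.split(" (?=[0-9$])");
    }

    // A bare lookahead matches the empty position before EVERY digit or '$'.
    static String[] bareLookahead(String s) {
        return s.split("(?=[0-9$])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(consuming("A 1b $c")));     // [A, b, c]
        System.out.println(Arrays.toString(lookahead("A 1b $c")));     // [A, 1b, $c]
        System.out.println(Arrays.toString(bareLookahead("a12 b")));   // [a, 1, 2 b]
    }
}
```

The last call shows why the lookahead is usually paired with something that anchors it, such as a preceding space: on its own it also splits between the digits of a multi-digit number.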
Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi Greg & Luigi,

Thank you for your detailed suggestions, but they weren't what I was after.

Cheers,

Jack
 
 