Split string on a word not just a character

Theodore David Williams
Ranch Hand

Joined: Dec 21, 2009
Posts: 102
Is there a way to split a string on a word?

i.e.
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
The "[]" in a regular expression denotes a character class: any single character inside the brackets is used as the delimiter.
You might try s.split("the").
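To illustrate the difference, here is a minimal sketch. The class name, method names, and sample string are made up for the example, and "[the]" is only assumed to resemble what the original post tried.

```java
import java.util.Arrays;

class CharClassVsWord {
    // "[the]" is a character class: split on any single 't', 'h' or 'e'.
    static String[] splitOnAnyChar(String s) {
        return s.split("[the]");
    }

    // "the" is a literal sequence: split only where the whole word occurs.
    static String[] splitOnWord(String s) {
        return s.split("the");
    }

    public static void main(String[] args) {
        String s = "one the two";
        System.out.println(Arrays.toString(splitOnAnyChar(s)));
        System.out.println(Arrays.toString(splitOnWord(s))); // prints "[one ,  two]"
    }
}
```

Note that the character-class version also cuts "one" and "two" apart, because they contain 'e' and 't'.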
Theodore David Williams
Ranch Hand

Joined: Dec 21, 2009
Posts: 102
Yeah, that works, thanks. I still have a problem in that I want to split on multiple words and characters, and I also want to ignore case.
I.e., can I split on the words and characters below?
'the', 'The'
'to', 'To', 'TO'
','
'/'
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19682

Put (?i) before the regular expression. This is a flag that makes the regular expression ignore case. To match multiple words, separate them with the | symbol.
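A rough sketch of how the case-insensitive flag and alternation might be combined for the words and characters in the question. The class name, method name, and sample sentence are assumptions for illustration, not code from the thread.

```java
import java.util.Arrays;

class IgnoreCaseSplit {
    // (?i) turns on case-insensitive matching for the rest of the pattern;
    // | separates the alternatives: "the", "to", a comma, or a slash.
    static String[] splitOnDelimiters(String s) {
        return s.split("(?i)the|to|,|/");
    }

    public static void main(String[] args) {
        String s = "walk To school,run/jump THE end";
        // prints "[walk ,  school, run, jump ,  end]"
        System.out.println(Arrays.toString(splitOnDelimiters(s)));
    }
}
```

The surrounding spaces stay in the pieces, so a trim() on each element may be wanted afterwards.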


Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7709

Theodore David Williams wrote: Yeah that works thanks. I still have a problem in that I want to split on multiple words and characters. And I also want to ignore case
I.E. can I split on the words and characters below? ...

One possibility is not to try to do everything at once. Regexes are good, but they're not all-powerful, and trying to incorporate every possible rule into one is likely to make for a very long and complicated pattern (and will probably lead to more mistakes).
What about this:
1. Use String.split("\\s+") to split the string into whitespace-delimited "words".
2. Eliminate "punctuation" with a String.replaceAll() pattern.
3. Use String.equalsIgnoreCase() to find the words you want to eliminate and pull out the words between them.

It will probably be slower, but we're likely talking fractions of seconds, and the resulting code will be a lot easier to change if you need to, and much more self-documenting.
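The three steps above could be sketched roughly as follows. This is only one reading of the suggestion: the class, the helper names, the delimiter list, and the choice of "punctuation" as anything that is not a letter or digit are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StepwiseSplit {
    // Delimiter words taken from the question; the list name is hypothetical.
    private static final List<String> DELIMITER_WORDS = Arrays.asList("the", "to");

    static List<String> splitOnWords(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Step 1: split into whitespace-delimited "words".
        for (String raw : input.trim().split("\\s+")) {
            // Step 2: strip punctuation (assumed: anything that is not a letter or digit).
            String word = raw.replaceAll("[^\\p{L}\\p{N}]", "");
            if (word.isEmpty()) {
                continue;
            }
            // Step 3: compare case-insensitively against the delimiter words.
            if (isDelimiter(word)) {
                flush(current, parts);
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
        }
        flush(current, parts);
        return parts;
    }

    private static boolean isDelimiter(String word) {
        for (String d : DELIMITER_WORDS) {
            if (d.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    // Moves the words collected so far into the result, if there are any.
    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints "[Go, store then walk, park]"
        System.out.println(splitOnWords("Go to the store, then walk To THE park"));
    }
}
```

In this sketch punctuation is simply dropped rather than treated as a split point, so "a/b" would come out as "ab". If ',' and '/' should also separate pieces, they would need to be replaced with spaces before step 1.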

Winston


Matthew Brown
Bartender

Joined: Apr 06, 2010
Posts: 4374

Just to give a further example - the regex you've got so far will also split on the "word" "the" in "other" or "thesaurus". Yes, you can revise the expression further to cope with that, but Winston's advice is sensible.
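One possible way to revise the expression for that case is the \b word-boundary anchor. The sketch below is an assumption about what such a revision could look like, with made-up class and method names, not something posted in the thread.

```java
import java.util.Arrays;

class WordBoundaryDemo {
    // Splits wherever the letters "the" appear, even inside another word.
    static String[] naive(String s) {
        return s.split("(?i)the");
    }

    // \b only matches at a word boundary, so "other" is left alone.
    static String[] bounded(String s) {
        return s.split("(?i)\\bthe\\b");
    }

    public static void main(String[] args) {
        String s = "feed the other dog";
        System.out.println(Arrays.toString(naive(s)));   // prints "[feed ,  o, r dog]"
        System.out.println(Arrays.toString(bounded(s))); // prints "[feed ,  other dog]"
    }
}
```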
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
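You could put whitespace in your regex in order to split on the words. \s means "any whitespace character" (space, tab, newline, etc.). In a Java string literal the backslash has to be doubled, and split returns an array rather than a String:
String[] parts = s.split("\\sthe\\s");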
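A small sketch of how the whitespace-delimited version behaves, under the assumption that the pattern is written with doubled backslashes as a Java string literal. The class name, method name, and sample string are illustrative only.

```java
import java.util.Arrays;

class WhitespaceDelimitedSplit {
    // Requires exactly one whitespace character on each side of "the".
    static String[] splitOnThe(String s) {
        return s.split("\\sthe\\s");
    }

    public static void main(String[] args) {
        // The leading "the" has no whitespace before it, so it is not a split point.
        // prints "[the cat and, dog]"
        System.out.println(Arrays.toString(splitOnThe("the cat and the dog")));
    }
}
```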

Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19682

To also allow "the" at the start and end of the String, make that
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7709

Rob Spoor wrote: To also allow "the" at the start and end of the String, make that

And if you want to allow for more than one whitespace character, you might need:
split("(\\s+|^)the(\\s+|$)")
and you may need to worry about whether you use greedy or reluctant quantifiers (to be honest, I don't know if it makes any difference).
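As a hedged illustration of the pattern above and of the greedy versus reluctant question, here is a small sketch. The class name, method names, and test strings are invented for the example. As far as I can tell, the difference shows up in the trailing group: a reluctant \s+? stops after one whitespace character and leaves the rest in the next piece.

```java
import java.util.Arrays;

class AnchoredSplit {
    // Greedy: \s+ swallows all whitespace around "the".
    static String[] greedy(String s) {
        return s.split("(\\s+|^)the(\\s+|$)");
    }

    // Reluctant: the trailing \s+? consumes only a single whitespace character.
    static String[] reluctant(String s) {
        return s.split("(\\s+?|^)the(\\s+?|$)");
    }

    public static void main(String[] args) {
        // A match at the very start yields an empty leading element.
        // prints "[, cat and, dog]"
        System.out.println(Arrays.toString(greedy("the cat and   the   dog the")));

        System.out.println(Arrays.toString(greedy("a the   b")));    // prints "[a, b]"
        System.out.println(Arrays.toString(reluctant("a the   b"))); // prints "[a,   b]"
    }
}
```

So for this pattern the greedy form is probably the one wanted, and callers may still need to skip the empty first element when the string starts with "the".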

@Theodore: And the above pattern is just for one word. Do you see what I mean now about complexity?

Winston

 