The "[]" indicate a character class in a regular expression, which means: use any character inside the brackets as the delimiter.
You might try s.split("the").
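To make the difference concrete, here is a small sketch contrasting a character class with a literal word as the delimiter. The class name, method names, the `[,;]` character class and the sample strings are made up for illustration; they are not from the original question.

```java
import java.util.Arrays;

class SplitDelimiterDemo {
    // Hypothetical example: [,;] is a character class, so the string is
    // split at every single ',' or ';' character.
    static String[] splitOnAnyOf(String s) {
        return s.split("[,;]");
    }

    // Without brackets, "the" is matched as a literal three-character sequence.
    static String[] splitOnThe(String s) {
        return s.split("the");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnAnyOf("a,b;c")));
        System.out.println(Arrays.toString(splitOnThe("over the hill")));
    }
}
```

Note that splitting on the literal "the" leaves the surrounding spaces attached to the neighbouring pieces.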
Theodore David Williams
Joined: Dec 21, 2009
Yeah, that works, thanks. I still have a problem in that I want to split on multiple words and characters, and I also want to ignore case.
I.e., can I split on the words and characters below?
'to', 'To', 'TO'
Theodore David Williams wrote: Yeah that works thanks. I still have a problem in that I want to split on multiple words and characters. And I also want to ignore case
I.E. can I split on the words and characters below?'...
One possibility is not to try to do everything at once. Regexes are good, but they're not all-powerful, and trying to incorporate every possible rule into one is likely to make for a very long and complicated pattern (and will probably lead to more mistakes).
What about this:
1. Use String.split("\\s+") to split the string into whitespace-delimited "words".
2. Eliminate "punctuation" with a String.replaceAll() pattern.
3. Use String.equalsIgnoreCase() to find the words you want to eliminate and pull out the words between them.
It will probably be slower, but we're likely talking fractions of seconds, and the resulting code will be a lot easier to change if you need to, and much more self-documenting.
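The three steps above could be sketched roughly like this. It is only one possible reading of the suggestion: the class name, the `splitOnWords` helper, the use of `\p{Punct}` as the punctuation pattern and the sample sentence are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StepwiseSplit {
    // Hypothetical helper: returns the runs of words found between the
    // separator words, comparing separators without regard to case.
    static List<String> splitOnWords(String text, List<String> separators) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Step 1: split into whitespace-delimited "words".
        for (String raw : text.trim().split("\\s+")) {
            // Step 2: strip ASCII punctuation (this also removes apostrophes).
            String word = raw.replaceAll("\\p{Punct}", "");
            if (word.isEmpty()) {
                continue;
            }
            // Step 3: check whether the word is one of the separators.
            boolean isSeparator = false;
            for (String sep : separators) {
                if (word.equalsIgnoreCase(sep)) {
                    isSeparator = true;
                    break;
                }
            }
            if (isSeparator) {
                // Close the current chunk, if it holds any words.
                if (current.length() > 0) {
                    chunks.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(splitOnWords(
                "Send the report TO Bob, then go to lunch.",
                Arrays.asList("the", "to")));
    }
}
```

Adding another separator word here is a one-word change to the list rather than an edit to a regex, which is the maintainability point being made.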
Just to give a further example - the regex you've got so far will also split on the "word" "the" in "other" or "thesaurus". Yes, you can revise the expression further to cope with that, but Winston's advice is sensible.
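A quick sketch of that pitfall, in case it helps to see it. The sample sentence and the names are invented, and the `\b` word-boundary version is just one way of revising the expression; it was not proposed in this thread.

```java
import java.util.Arrays;

class SubstringPitfall {
    // Splits wherever the letters "the" occur, even inside longer words.
    static String[] naiveSplit(String s) {
        return s.split("the");
    }

    // One possible revision (an assumption, not from the thread):
    // \b only lets "the" match as a whole word.
    static String[] wordSplit(String s) {
        return s.split("\\bthe\\b");
    }

    public static void main(String[] args) {
        String s = "read the other thesaurus";
        System.out.println(Arrays.toString(naiveSplit(s)));
        System.out.println(Arrays.toString(wordSplit(s)));
    }
}
```

The naive version cuts "other" and "thesaurus" into fragments, while the word-boundary version splits only at the standalone "the".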
Joined: Sep 26, 2010
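You could put whitespace in your regex in order to split on whole words. \s means "any whitespace character" (tab, newline, space, etc.). In a Java string literal the backslash itself has to be escaped, so it is written \\s, and split() returns a String[] rather than a String:
String[] parts = s.split("\\sthe\\s");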
Rob Spoor wrote:To also allow "the" at the start and end of the String, make that
And if you want to allow for more than one whitespace character, you might need:
split("(\\s+|^)the(\\s+|$)"), and you may need to worry about whether you use greedy or reluctant quantifiers (to be honest, I don't know if it makes any difference).
@Theodore: And the above pattern is just for one word. Do you see what I mean now about complexity?
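For reference, here is a small sketch of that pattern in use, with the embedded `(?i)` flag added to cover the ignore-case requirement. The flag, the class and method names and the sample strings are assumptions made for the example, not something posted in the thread.

```java
import java.util.Arrays;

class AnchoredWordSplit {
    // (?i) turns on case-insensitive matching, so "the", "The" and "THE" match.
    // (\\s+|^) and (\\s+|$) require whitespace or the start/end of the
    // string on either side of the word.
    static String[] splitOnThe(String s) {
        return s.split("(?i)(\\s+|^)the(\\s+|$)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnThe("Feed THE cat and   the dog")));
        System.out.println(Arrays.toString(splitOnThe("other thesaurus")));
        System.out.println(Arrays.toString(splitOnThe("The end")));
    }
}
```

One thing to watch for: when the separator is the first word, as in "The end", the result starts with an empty string, because split() keeps the empty piece before a match at the beginning of the input.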