Split string on a word not just a character

Theodore David Williams
Ranch Hand

Joined: Dec 21, 2009
Posts: 102
Is there a way to split a string on a word?

i.e.
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
The "[]" denotes a character class in a regular expression: any single character inside the brackets is treated as a delimiter.
To split on the whole word instead, you might try s.split("the").
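To make the difference concrete, here is a minimal sketch. The sample sentence and the class and method names are made up for illustration; they are not from the original post.

```java
import java.util.Arrays;

class BracketVsWord {
    // Character class: each of 't', 'h' and 'e' is a delimiter on its own.
    static String[] splitOnChars(String s) {
        return s.split("[the]");
    }

    // Literal sequence: only the full run "the" is a delimiter.
    static String[] splitOnWord(String s) {
        return s.split("the");
    }

    public static void main(String[] args) {
        String s = "Over the hill"; // hypothetical sample input
        System.out.println(Arrays.toString(splitOnChars(s)));
        System.out.println(Arrays.toString(splitOnWord(s)));
    }
}
```

With the bracketed version the string is also cut at the "e" in "Over" and the "h" in "hill"; with the plain word it is cut only once, leaving "Over " and " hill".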
Theodore David Williams
Ranch Hand

Joined: Dec 21, 2009
Posts: 102
Yeah, that works, thanks. I still have a problem: I want to split on multiple words and characters, and I also want to ignore case.
I.e., can I split on the words and characters below?
'the', 'The'
'to', 'To', 'TO'
','
'/'
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19552
    

Put (?i) before the regular expression. This is a flag that tells the regex engine to ignore case. To split on multiple words, separate them with the | (alternation) symbol.
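A sketch of what such a pattern might look like for the words and characters listed in the question. The sample input and the names MultiWordSplit and DELIMITERS are assumptions for illustration only.

```java
import java.util.Arrays;

class MultiWordSplit {
    // (?i) switches on case-insensitive matching for the whole pattern;
    // | separates the alternatives "the", "to", "," and "/".
    static final String DELIMITERS = "(?i)the|to|,|/";

    static String[] split(String s) {
        return s.split(DELIMITERS);
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        System.out.println(Arrays.toString(split("walk TO The park,run/jump")));
    }
}
```

Note that the pieces keep their surrounding spaces, and that this pattern matches the letters wherever they occur, not only as whole words.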


Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7081
    

Theodore David Williams wrote: Yeah that works thanks. I still have a problem in that I want to split on multiple words and characters. And I also want to ignore case
I.E. can I split on the words and characters below? ...

One possibility is not to try to do everything at once. Regexes are good, but they're not all-powerful, and trying to incorporate every possible rule into one is likely to make for a very long and complicated pattern (and will probably lead to more mistakes).
What about this:
1. Use String.split("\\s+") to split the string into whitespace-delimited "words".
2. Eliminate "punctuation" with a String.replaceAll() pattern.
3. Use String.equalsIgnoreCase() to find the words you want to eliminate and pull out the words between them.

It will probably be slower, but we're likely talking fractions of a second, and the resulting code will be a lot easier to change if you need to, and much more self-documenting.
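The three steps above could be sketched roughly as follows. This is only one reading of the suggestion: the delimiter list comes from the question, punctuation is assumed to be simply dropped rather than treated as a delimiter, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StepwiseSplit {
    // Delimiter words from the question; compared case-insensitively below.
    private static final List<String> DELIMITERS = Arrays.asList("the", "to");

    static List<String> splitOnWords(String input) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Step 1: break the input into whitespace-delimited tokens.
        for (String token : input.trim().split("\\s+")) {
            // Step 2: strip ASCII punctuation (assumption: it is just dropped).
            String word = token.replaceAll("\\p{Punct}", "");
            if (word.isEmpty()) {
                continue;
            }
            // Step 3: a delimiter word closes the chunk collected so far.
            if (isDelimiter(word)) {
                flush(current, chunks);
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
        }
        flush(current, chunks);
        return chunks;
    }

    private static boolean isDelimiter(String word) {
        for (String d : DELIMITERS) {
            if (d.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    // Adds the pending chunk, if any, and resets the buffer.
    private static void flush(StringBuilder current, List<String> chunks) {
        if (current.length() > 0) {
            chunks.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        System.out.println(splitOnWords("I went TO the shop, then home"));
    }
}
```

Because whole tokens are compared, "then" is not mistaken for "the", and changing the delimiter list does not require touching any regex.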

Winston


Matthew Brown
Bartender

Joined: Apr 06, 2010
Posts: 4244
    

Just to give a further example - the regex you've got so far will also split on the "word" "the" in "other" or "thesaurus". Yes, you can revise the expression further to cope with that, but Winston's advice is sensible.
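A small sketch of that pitfall, together with one possible revision. The \b word-boundary anchor used here is not something proposed in this thread; it is shown as an assumption about how the expression could be tightened, and the class name and sample text are made up.

```java
import java.util.Arrays;

class WordBoundaryDemo {
    // Plain "the": also matches inside "other" and "thesaurus".
    static String[] naive(String s) {
        return s.split("the");
    }

    // \b matches only at a boundary between a word character and a
    // non-word character, so "the" inside a longer word is skipped.
    static String[] bounded(String s) {
        return s.split("\\bthe\\b");
    }

    public static void main(String[] args) {
        String s = "the other thesaurus"; // hypothetical sample input
        System.out.println(Arrays.toString(naive(s)));
        System.out.println(Arrays.toString(bounded(s)));
    }
}
```

The naive version chops "other" into " o" and "r " and leaves "saurus" at the end; the bounded version splits only on the standalone first word.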
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
You could put whitespace in your regex in order to split on whole words. \s means "any whitespace character" (space, tab, newline, etc.). In a Java string literal the backslash has to be doubled, and split() returns an array:
String[] parts = s.split("\\sthe\\s");

Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19552
    

To also allow "the" at the start and end of the String, make that
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7081
    

Rob Spoor wrote:To also allow "the" at the start and end of the String, make that

And if you want to allow for more than one whitespace character, you might need:
split("(\\s+|^)the(\\s+|$)")
and you may need to worry about whether you use greedy or reluctant quantifiers (to be honest, I don't know if it makes any difference).
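For what it's worth, here is a small sketch that tries the pattern above on a few made-up inputs, next to a variant with a reluctant trailing quantifier. The class name, constant names and inputs are assumptions for illustration.

```java
import java.util.Arrays;

class AnchoredSplit {
    // The pattern from the post above: greedy \s+ on both sides.
    static final String GREEDY = "(\\s+|^)the(\\s+|$)";
    // Same pattern, but the trailing whitespace quantifier is reluctant.
    static final String RELUCTANT = "(\\s+|^)the(\\s+?|$)";

    static String[] split(String s, String pattern) {
        return s.split(pattern);
    }

    public static void main(String[] args) {
        // "the" at the start: split() keeps a leading empty string.
        System.out.println(Arrays.toString(split("the cat and the dog", GREEDY)));
        // "the" at the end: trailing empty strings are dropped.
        System.out.println(Arrays.toString(split("feed the", GREEDY)));
        // Several spaces after "the": greedy swallows all of them,
        // reluctant takes only one and leaves the rest in the next piece.
        System.out.println(Arrays.toString(split("a the   b", GREEDY)));
        System.out.println(Arrays.toString(split("a the   b", RELUCTANT)));
    }
}
```

So in this position the choice does matter: the reluctant form consumes a single whitespace character after "the", and any extra whitespace ends up at the front of the following piece.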

@Theodore: And the above pattern is just for one word. Do you see what I mean now about complexity?

Winston

 