String --> split/regex question

Jamie Robertson
Ranch Hand

Joined: Jul 09, 2001
Posts: 1879

Problem: split the String on any whitespace OR comma, such that the resulting String array does not contain any empty String values.
Currently I have

which produces this close, but undesirable result:
1|Hi|
2|there|
3|| //don't want
4|my|
5|name|
6|is|
7|Robertson|
8|| //don't want
9|| //don't want
10|| //don't want
11|Jamie|
The result I'm looking for:
1|Hi|
2|there|
3|my|
4|name|
5|is|
6|Robertson|
7|Jamie|
Note** I'd like to solve this using the regex pattern to return the desired result, not a bolt on method that strips the array of any empty Strings.
Thanks, Jamie
[ June 05, 2003: Message edited by: Jamie Robertson ]
John Lee
Ranch Hand

Joined: Aug 05, 2001
Posts: 2545
hi:
please check out:
Interface com.oroinc.text.regex.Pattern;
Uses of Class java.util.regex.Pattern;
Help needed with Regex Pattern;
Jamie Robertson
Ranch Hand

Joined: Jul 09, 2001
Posts: 1879

Don,
I've read the tutorials and gave it my best shot ( as my post has shown ) and have the right pattern to get the desired result. I'm just looking to see if it can be tweaked to eliminate the empty Strings returned by the split method. I was thinking maybe by matching not just a comma or a whitespace, but any number of whitespaces or a comma followed by zero or any number of whitespaces...hmmm that gives me an idea...
thanx anyways
Jamie
Eric Pascarello
author
Rancher

Joined: Nov 08, 2001
Posts: 15376
    
You should know, I know nothing about Java
s = s.replaceAll("\\s\\s\\s", " ");
s = s.replaceAll("\\s\\s", " ");

see if that works, I have no clue (well I have a slight clue). If it were in JavaScript, I could have it answered in a second. Eric
Jamie Robertson
Ranch Hand

Joined: Jul 09, 2001
Posts: 1879

thanks Eric, that would work if I can't find the pattern I'm looking for.
I thought that something like
String[] split = s.split("[\\s+|,\\s*]");
would do, but obviously I'm not using it properly.
Jamie
John Lee
Ranch Hand

Joined: Aug 05, 2001
Posts: 2545
The string "boo:and:foo", for example, yields the following results with these parameters:
Regex  Limit  Result
:      2      { "boo", "and:foo" }
:      5      { "boo", "and", "foo" }
:      -2     { "boo", "and", "foo" }
o      5      { "b", "", ":and:f", "", "" }
o      -2     { "b", "", ":and:f", "", "" }
o      0      { "b", "", ":and:f" }
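For what it's worth, the table above can be reproduced with a few lines. This is only a sketch: the class name SplitLimitDemo and the helper show are made up for illustration, and the "Alfa,,Charlie" input is borrowed from later in the thread. It suggests why the limit parameter alone does not help here: a limit of 0 (which is what the one-argument split uses) drops trailing empty strings only, never the ones in the middle.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: split and render the array so the empty strings are visible.
    static String show(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        System.out.println(show("boo:and:foo", ":", 2));   // [boo, and:foo]
        System.out.println(show("boo:and:foo", "o", -2));  // [b, , :and:f, , ]
        System.out.println(show("boo:and:foo", "o", 0));   // [b, , :and:f]
        // Limit 0 removes trailing empty strings, but an interior one survives.
        System.out.println(show("Alfa,,Charlie", ",", 0)); // [Alfa, , Charlie]
    }
}
```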
Jamie Robertson
Ranch Hand

Joined: Jul 09, 2001
Posts: 1879

I don't see the connection Don. Show me how this satisfies my desired output, because I don't think it does.
*for the record, I have read the javadocs for the String and Pattern classes, and have looked at the tutorial so you don't have to post anymore references to tutorials or the docs, cause I've read them!
Jamie
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I thought that something like
String[] split = s.split("[\\s+|,\\s*]");
would do, but obviously I'm not using it properly.

You're trying to use quantifiers (* and +) inside a character class (the brackets []) - that doesn't work. Neither does the alternation |. Inside brackets, symbols mean completely different things - most commonly, their literal char value. The main exceptions are ^ and -. Instead try something like
String[] split = s.split("[\\s,]+");
The character class [\\s,] is any whitespace or comma character, and the + is outside the char class, so now it should behave like you expect. Alternately perhaps you want
String[] split = s.split("[\\s,]\\s*");
which will grab as many trailing spaces as you want, but only one comma. (Based on some of your previous attempts.) This would create an empty string for something like
Alfa,,Charlie
but not for

It's hard to be sure what you want without seeing the input as well as the desired output.
[ June 05, 2003: Message edited by: Jim Yingst ]
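A rough side-by-side of the three patterns discussed in this post, as a sketch only. The input string is an assumption (the original input was not posted at this point; this one is a guess that happens to reproduce the output from the first post), and the class name CharClassSplitDemo and the helper show are made up for illustration.

```java
import java.util.Arrays;

class CharClassSplitDemo {
    // Hypothetical helper: split and render the array so the empty strings are visible.
    static String show(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        // Assumed input, reconstructed from the output shown in the first post.
        String s = "Hi there, my name is Robertson,   Jamie";

        // Inside [...] the +, | and * are literal chars, so this splits on ONE char at a time.
        System.out.println(show(s, "[\\s+|,\\s*]"));
        // [Hi, there, , my, name, is, Robertson, , , , Jamie]
        System.out.println(show("a+b|c*d", "[\\s+|,\\s*]")); // [a, b, c, d]

        // Quantifier outside the class: a whole run of delimiters counts as one separator.
        System.out.println(show(s, "[\\s,]+"));
        // [Hi, there, my, name, is, Robertson, Jamie]

        // One delimiter char plus trailing whitespace: two commas in a row still leave a gap.
        System.out.println(show("Alfa,,Charlie", "[\\s,]\\s*")); // [Alfa, , Charlie]
        System.out.println(show("Alfa,,Charlie", "[\\s,]+"));    // [Alfa, Charlie]
    }
}
```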

"I'm not back." - Bill Harding, Twister
Jamie Robertson
Ranch Hand

Joined: Jul 09, 2001
Posts: 1879

It's hard to be sure what you want without seeing the input as well as the desired output.
[ June 05, 2003: Message edited by: Jim Yingst ]

Can't believe I forgot it! It was something like

and I want the result to only have an array of non-empty Strings ( words only )
Jamie
Jamie Robertson
Ranch Hand

Joined: Jul 09, 2001
Posts: 1879

Jim, thanks
worked exactly as I wanted. I'll have a re-read of your post and read some more on this particular regex that I needed. I hate not understanding something that I use in a program.
Thanx again,
Jamie
by the way, is there anything you don't know Jim??
John Lee
Ranch Hand

Joined: Aug 05, 2001
Posts: 2545
hi:
I would use split twice: the first time, create an array, then join it back into a string; that gets rid of either "," or "<space>". Then split it again to create the desired result.
Frank Carver
Sheriff

Joined: Jan 07, 1999
Posts: 6920
and I want the result to only have an array of non-empty Strings ( words only )
Which begs the question, what about other punctuation and non-alphabetics? Can you clarify what constitutes a "word"?
If I give you the realistic (albeit colloquial)
"Hi! My Name? Frank; of course. (here's my resumé) "
What words would you expect to see in the resulting array?
Is there any way of "escaping" a space or comma so that it will appear in the output?


Read about me at frankcarver.me ~ Raspberry Alpha Omega ~ Frank's Punchbarrel Blog
Jamie Robertson
Ranch Hand

Joined: Jul 09, 2001
Posts: 1879

Frank, this is actually parsing an address line so other punctuation should be included in the words.
Sample Lines:
"THUNDER BAY,ON R7C 1S7"
"THUNDER BAY, ON R7C1S7"
" THUNDER BAY ON R7C 1S7"
"SAULT ST. MARIE, ON K9LO9H"
"DULUTH MN 90210"
etc.
If there is any punctuation in there other than a comma, then it has to be part of one of the fields. I could spend weeks programming around their erroneous data, but I choose to document the exceptions and make them clean up the data. There has to be some consistency or it would be impossible.
Actually, I don't see how they could not store city, province, postal code separately! It boggles my mind! But they don't pay me to throw insults at their system, just to deal with it.
Jamie
[ June 06, 2003: Message edited by: Jamie Robertson ]
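One thing to watch for with sample lines like these: a line that starts with a delimiter (the third sample has a leading space) still gives one empty String at the front when split on "[\\s,]+", because split only discards trailing empty strings. A minimal sketch of one way around that, assuming it is acceptable to strip leading delimiters before splitting; the class name AddressWords and the method words are made up for illustration.

```java
import java.util.Arrays;

class AddressWords {
    // Hypothetical helper: split on runs of whitespace/commas, never returning empty strings.
    static String[] words(String line) {
        // Drop leading delimiters first, otherwise split yields a leading "".
        String stripped = line.replaceFirst("^[\\s,]+", "");
        // "".split(...) would return one empty string, so handle that case separately.
        return stripped.isEmpty() ? new String[0] : stripped.split("[\\s,]+");
    }

    public static void main(String[] args) {
        // Plain split keeps the leading empty string:
        System.out.println(Arrays.toString(" THUNDER BAY ON R7C 1S7".split("[\\s,]+")));
        // [, THUNDER, BAY, ON, R7C, 1S7]

        System.out.println(Arrays.toString(words(" THUNDER BAY ON R7C 1S7")));
        // [THUNDER, BAY, ON, R7C, 1S7]
        System.out.println(Arrays.toString(words("SAULT ST. MARIE, ON K9LO9H")));
        // [SAULT, ST., MARIE, ON, K9LO9H]
    }
}
```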
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
If there is any punctuation in there other than a comma, then it has to be part of one of the fields.
But from your examples it also looks like a comma may be intended as part of a field, though it's usually a separator. E.g.
"SAULT ST. MARIE, ON K9LO9H"
That's intended as one field, right? Seems like quite a problem if commas have been freely used in the data without some escape sequence - how do you know if it's supposed to be a field separator or not? If you're lucky and only one field is prone to have extra commas like this, then you can probably parse a record by first locating all the "normal" fields (those with no commas) and then assume that whatever is left over (any input not part of the other fields) is the remaining field - no matter how many commas. If more than one field has this problem... well... Perhaps you can find some other patterns in the data you can exploit. E.g. if one field is always a 5-digit number, that field can be a reference point for you. Good luck...
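To make the "reference point" idea concrete, here is a rough sketch that treats the postal code and a two-letter region code as the fixed-shape fields and takes whatever is left in front of them as the city. Everything about the layout is an assumption drawn from the sample lines only: the field order, the two-letter region, and the postal code shapes (5 digits, or 6 letters/digits with an optional space). The class name AddressLineSketch and the method parse are made up for illustration, and real data will certainly have lines this does not handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressLineSketch {
    // Assumed layout: <city> <2-letter region> <postal code>, separated by whitespace and/or commas.
    // The city group is greedy and must end in a non-delimiter, so the LAST plausible
    // region/postal pair on the line is used as the reference point.
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(.*[^\\s,])[\\s,]+([A-Z]{2})[\\s,]+(\\d{5}|[A-Z0-9]{3}\\s?[A-Z0-9]{3})\\s*$");

    // Returns {city, region, postalCode}, or null when the line does not fit the assumed layout.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null; // document the exception rather than guess
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "THUNDER BAY,ON R7C 1S7",
            "THUNDER BAY, ON R7C1S7",
            " THUNDER BAY ON R7C 1S7",
            "SAULT ST. MARIE, ON K9LO9H",
            "DULUTH MN 90210"
        };
        for (String line : samples) {
            String[] f = parse(line);
            System.out.println(f == null ? "no match: " + line : f[0] + " | " + f[1] + " | " + f[2]);
        }
    }
}
```

Lines that come back as null would be the ones to write up as exceptions rather than program around.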
 