Problem: split the String by any whitespace OR comma, but the resulting String array should not contain any empty String values. Currently I have
which produces this close, but undesirable, result:
1|Hi|
2|there|
3|| //don't want
4|my|
5|name|
6|is|
7|Robertson|
8|| //don't want
9|| //don't want
10|| //don't want
11|Jamie|
The result I'm looking for:
1|Hi|
2|there|
3|my|
4|name|
5|is|
6|Robertson|
7|Jamie|
Note: I'd like to solve this with a regex pattern that returns the desired result, not with a bolt-on method that strips any empty Strings from the array.
Thanks,
Jamie
[ June 05, 2003: Message edited by: Jamie Robertson ]
Don, I've read the tutorials and gave it my best shot (as my post has shown), and I have a pattern that is close to giving the desired result. I'm just looking to see if it can be tweaked to eliminate the empty Strings returned by the split method. I was thinking of matching not just a single comma or whitespace character, but either any number of whitespace characters, or a comma followed by zero or more whitespace characters... hmmm, that gives me an idea... thanks anyway, Jamie
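Jamie's idea can be written as an alternation that sits outside any character class: "\\s+|,\\s*". The sketch below (class name, method name and inputs are made up for illustration) suggests it fixes the common case, but still leaves an empty String when a space comes before a comma, because " " and ", " are then matched as two separate delimiters with nothing between them.

```java
import java.util.Arrays;

class AlternationSplit {
    // Either one or more whitespace chars, OR a comma followed by optional whitespace.
    // The | here is real alternation because it is not inside [...].
    static String[] split(String s) {
        return s.split("\\s+|,\\s*");
    }

    public static void main(String[] args) {
        // Comma directly after a word: no empty Strings.
        System.out.println(Arrays.toString(split("Hi there, my name")));
        // Space before the comma: " " and ", " are two delimiters, so "" appears between them.
        System.out.println(Arrays.toString(split("Hi there , my name")));
    }
}
```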
Thanks Eric, that would work if I can't find the pattern I'm looking for. I thought that something like String[] split = s.split("[\\s+|,\\s*]"); would do it, but obviously I'm not using it properly. Jamie
John Lee
Ranch Hand
Joined: Aug 05, 2001
Posts: 2545
The string "boo:and:foo", for example, yields the following results with these parameters:
Regex   Limit   Result
:       2       { "boo", "and:foo" }
:       5       { "boo", "and", "foo" }
:       -2      { "boo", "and", "foo" }
o       5       { "b", "", ":and:f", "", "" }
o       -2      { "b", "", ":and:f", "", "" }
o       0       { "b", "", ":and:f" }
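The "boo:and:foo" results can be reproduced with a short program like the one below (the class and helper names are just for illustration). The row most relevant to this thread is the last one: with a limit of 0, which is what the one-argument split(regex) uses, only trailing empty strings are dropped, so empty strings in the middle of the result are kept.

```java
import java.util.Arrays;

class SplitLimitDemo {
    static final String INPUT = "boo:and:foo";

    // Thin wrapper so each (regex, limit) row is a single call.
    static String[] row(String regex, int limit) {
        return INPUT.split(regex, limit);
    }

    public static void main(String[] args) {
        String[] regexes = {":", ":", ":", "o", "o", "o"};
        int[] limits = {2, 5, -2, 5, -2, 0};
        for (int i = 0; i < regexes.length; i++) {
            System.out.println(regexes[i] + "  " + limits[i] + "  "
                    + Arrays.toString(row(regexes[i], limits[i])));
        }
    }
}
```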
I don't see the connection, Don. Show me how this satisfies my desired output, because I don't think it does. *For the record, I have read the javadocs for the String and Pattern classes and have looked at the tutorial, so you don't have to post any more references to tutorials or the docs, because I've read them! Jamie
I thought that something like String[] split = s.split("[\\s+|,\\s*]"); would do, but obviously I'm not using it properly.
You're trying to use quantifiers (* and +) inside a character class (the square brackets []), and that doesn't work. Neither does the alternation |. Inside brackets, symbols mean completely different things; most commonly they just stand for their literal char value. The main exceptions are ^, - and backslash escapes such as \\s. Instead try something like
String[] split = s.split("[\\s,]+");
The character class [\\s,] matches any whitespace or comma character, and the + is outside the char class, so now it should behave like you expect. Alternatively, perhaps you want
String[] split = s.split("[\\s,]\\s*");
which will grab as many trailing spaces as you want, but only one comma. (Based on some of your previous attempts.) This would create an empty string for something like Alfa,,Charlie but not for
It's hard to be sure what you want without seeing the input as well as the desired output. [ June 05, 2003: Message edited by: Jim Yingst ]
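The sketch below compares the original bracketed pattern with the two suggested ones. The input string is made up (the first post didn't show one), chosen so that the bracketed pattern reproduces the 11-element result from that post; the class and method names are illustrative only.

```java
import java.util.Arrays;

class CharClassSplit {
    // Everything inside [...] is ONE character class, so this matches a single
    // whitespace char or one of the literal chars + | , *
    static String[] originalAttempt(String s) {
        return s.split("[\\s+|,\\s*]");
    }

    // A run of one or more whitespace/comma chars counts as one delimiter.
    static String[] splitRun(String s) {
        return s.split("[\\s,]+");
    }

    // Exactly one whitespace-or-comma char, followed by any trailing whitespace.
    static String[] splitOneComma(String s) {
        return s.split("[\\s,]\\s*");
    }

    public static void main(String[] args) {
        String input = "Hi there, my name is Robertson,   Jamie"; // made-up input
        System.out.println(Arrays.toString(originalAttempt(input)));
        System.out.println(Arrays.toString(splitRun(input)));
        System.out.println(Arrays.toString(splitRun("Alfa,,Charlie")));
        System.out.println(Arrays.toString(splitOneComma("Alfa,,Charlie")));
    }
}
```

The "+" in the first pattern is taken literally, so a string like "a+b|c" is split into three pieces by it.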
Jim, thanks, that worked exactly as I wanted. I'll re-read your post and read some more on this particular regex. I hate not understanding something that I use in a program. Thanks again, Jamie. By the way, is there anything you don't know, Jim??
John Lee
Hi, I would use split twice: the first time to create an array, which you then join back into a String, so that you've gotten rid of one of the two separators ("," or "<space>"). Then split it again to create the desired result.
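A minimal sketch of the two-pass idea, assuming Java 8 or later for String.join (the class and method names are hypothetical). The first pass removes the commas, and the second splits on runs of whitespace; trim() is needed because a leading blank would otherwise still produce an empty first element.

```java
import java.util.Arrays;

class TwoPassSplit {
    static String[] split(String s) {
        // Pass 1: split on commas, then join the pieces back into one String with
        // spaces, so whitespace is the only separator left.
        String spacesOnly = String.join(" ", s.split(","));
        // Pass 2: split on runs of whitespace; trim() first so leading blanks
        // do not produce an empty first element.
        return spacesOnly.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Hi there, my name is Robertson,   Jamie")));
        System.out.println(Arrays.toString(split("Alfa,,Charlie")));
    }
}
```

Replacing commas with spaces via s.replace(',', ' ') before the second split would do the same job in one step. Either way, it is the kind of pre-processing Jamie wanted to avoid, which is why a single split pattern suited his request better.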
and I want the result to only have an array of non-empty Strings (words only)
Which begs the question: what about other punctuation and non-alphabetics? Can you clarify what constitutes a "word"? If I give you the realistic (albeit colloquial) "Hi! My Name? Frank; of course. (here's my resumé) ", what words would you expect to see in the resulting array? Is there any way of "escaping" a space or comma so that it will appear in the output?
Frank, this is actually parsing an address line, so other punctuation should be included in the words. Sample lines:
"THUNDER BAY,ON R7C 1S7"
"THUNDER BAY, ON R7C1S7"
" THUNDER BAY ON R7C 1S7"
"SAULT ST. MARIE, ON K9LO9H"
"DULUTH MN 90210"
etc.
If there is any punctuation in there other than a comma, then it has to be part of one of the fields. I could spend weeks programming around their erroneous data, but I choose to document the exceptions and make them clean up the data. There has to be some consistency, or it would be impossible. Actually, I don't see how they could not store city, province and postal code separately! It boggles my mind! But they don't pay me to throw insults at their system, just to deal with it. Jamie
[ June 06, 2003: Message edited by: Jamie Robertson ]
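Applied to these sample lines, the [\\s,]+ pattern seems to need one small guard: String.split drops trailing empty strings but keeps a leading one produced by a delimiter at the very start, so the third sample, which begins with a blank, would come back with an empty first element unless the line is trimmed first. A sketch (class and method names are illustrative):

```java
import java.util.Arrays;

class AddressTokens {
    // trim() removes the leading blank; split() alone would keep a leading "".
    static String[] tokens(String line) {
        return line.trim().split("[\\s,]+");
    }

    public static void main(String[] args) {
        String[] samples = {
            "THUNDER BAY,ON R7C 1S7",
            "THUNDER BAY, ON R7C1S7",
            " THUNDER BAY ON R7C 1S7",
            "SAULT ST. MARIE, ON K9LO9H",
            "DULUTH MN 90210"
        };
        for (String line : samples) {
            System.out.println(Arrays.toString(tokens(line)));
        }
        // Without trim(), the third sample starts with an empty String.
        System.out.println(Arrays.toString(" THUNDER BAY ON R7C 1S7".split("[\\s,]+")));
    }
}
```

This only produces tokens: "THUNDER BAY" still comes back as two of them, so deciding which tokens make up the city is a separate problem.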
Jim Yingst
Wanderer
Sheriff
Joined: Jan 30, 2000
Posts: 18670
If there is any punctuation in there other than a comma, then it has to be part of one of the fields.
But from your examples it also looks like a comma may be intended as part of a field, though it's usually a separator. E.g. "SAULT ST. MARIE, ON K9LO9H": that's intended as one field, right? It seems like quite a problem if commas have been used freely in the data without some escape sequence: how do you know whether a given comma is supposed to be a field separator or not? If you're lucky and only one field is prone to having extra commas like this, then you can probably parse a record by first locating all the "normal" fields (those with no commas) and then assuming that whatever is left over (any input not part of the other fields) is the remaining field, no matter how many commas it contains. If more than one field has this problem... well... Perhaps you can find some other patterns in the data you can exploit. E.g. if one field is always a 5-digit number, that field can be a reference point for you. Good luck...
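One way to try the "locate the normal fields first" idea on the sample lines is to anchor a pattern at the end of the line, where the province and postal code sit, and treat whatever is left at the front as the city, commas and all. This is a sketch under hypothetical assumptions that the thread does not state: the province/state is always two capital letters, and the postal code is the last thing on the line, written either as one token or as two 3-character halves. The class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressFields {
    // ASSUMED layout: <city> [sep] <2 capital letters> <postal code> at end of line.
    // The lazy (.+?) takes the shortest city that still lets the anchored tail match,
    // so any extra commas end up inside the city group.
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(.+?)[\\s,]+([A-Z]{2})\\s+(\\S{3}\\s\\S{3}|\\S+)\\s*$");

    // Returns {city, province, postalCode}, or null if the line does not fit the assumed layout.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "THUNDER BAY,ON R7C 1S7",
            " THUNDER BAY ON R7C 1S7",
            "SAULT ST. MARIE, ON K9LO9H",
            "DULUTH MN 90210",
            "PORT ARTHUR, THUNDER BAY, ON R7C 1S7" // made-up line with a comma inside the city
        };
        for (String line : lines) {
            System.out.println(java.util.Arrays.toString(parse(line)));
        }
    }
}
```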