
String --> split/regex question

 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
Problem: split the String on any whitespace OR comma so that the resulting String array does not contain any empty String values.
Currently I have

which produces this close, but undesirable result:
1|Hi|
2|there|
3|| //don't want
4|my|
5|name|
6|is|
7|Robertson|
8|| //don't want
9|| //don't want
10|| //don't want
11|Jamie|
The result I'm looking for:
1|Hi|
2|there|
3|my|
4|name|
5|is|
6|Robertson|
7|Jamie|
Note** I'd like to solve this using the regex pattern to return the desired result, not a bolt on method that strips the array of any empty Strings.
Thanks, Jamie
[ June 05, 2003: Message edited by: Jamie Robertson ]
 
John Lee
Ranch Hand
Posts: 2545
 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
Don,
I've read the tutorials and gave it my best shot ( as my post has shown ) and have the right pattern to get the desired result. I'm just looking to see if it can be tweaked to eliminate the empty Strings returned by the split method. I was thinking maybe by matching not just a comma or a whitespace, but any number of whitespaces or a comma followed by zero or any number of whitespaces...hmmm that gives me an idea...
thanx anyways
Jamie
 
Eric Pascarello
author
Rancher
Posts: 15385
You should know, I know nothing about Java
s = s.replaceAll("\\s\\s\\s", " ");
s = s.replaceAll("\\s\\s", " ");

see if that works, I have no clue (well I have a slight clue). If it were in JavaScript, I could have it answered in a second. Eric
 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
Thanks Eric, that would work as a fallback if I can't find the pattern I'm looking for.
I thought that something like
String[] split = s.split("[\\s+|,\\s*]");
would do, but obviously I'm not using it properly.
Jamie
 
John Lee
Ranch Hand
Posts: 2545
The string "boo:and:foo", for example, yields the following results with these parameters:
Regex   Limit   Result
:       2       { "boo", "and:foo" }
:       5       { "boo", "and", "foo" }
:       -2      { "boo", "and", "foo" }
o       5       { "b", "", ":and:f", "", "" }
o       -2      { "b", "", ":and:f", "", "" }
o       0       { "b", "", ":and:f" }
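The table above can be reproduced with a few calls to String.split(String, int). This is only a minimal sketch; the class name SplitLimitDemo and the helper split are made up for illustration. It also shows why the limit alone does not help here: a limit of 0 drops trailing empty strings only, never the ones in the middle.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: one call per row of the table.
    static String[] split(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        String s = "boo:and:foo";

        // A positive limit caps the number of pieces.
        System.out.println(Arrays.toString(split(s, ":", 2)));  // [boo, and:foo]

        // A negative limit keeps every piece, including trailing empties.
        System.out.println(Arrays.toString(split(s, "o", -2))); // [b, , :and:f, , ]

        // A limit of 0 drops trailing empties, but the interior one stays.
        System.out.println(Arrays.toString(split(s, "o", 0)));  // [b, , :and:f]
    }
}
```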
 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
I don't see the connection Don. Show me how this satisfies my desired output, because I don't think it does.
*for the record, I have read the javadocs for the String and Pattern classes, and have looked at the tutorial so you don't have to post anymore references to tutorials or the docs, cause I've read them!
Jamie
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
I thought that something like
String[] split = s.split("[\\s+|,\\s*]");
would do, but obviously I'm not using it properly.

You're trying to use quantifiers (* and +) inside a character class (the square brackets []) - that doesn't work. Neither does the alternation |. Inside brackets, symbols mean completely different things - most commonly, their literal char value. Except for ^ and -. Instead try something like
String[] split = s.split("[\\s,]+");
The character class [\\s,] matches any single whitespace or comma character, and the + is outside the char class, so now it should behave like you expect. Alternately perhaps you want
String[] split = s.split("[\\s,]\\s*");
which will grab as many trailing whitespace characters as there are, but only one comma. (Based on some of your previous attempts.) This would create an empty string for something like
Alfa,,Charlie
but not for

It's hard to be sure what you want without seeing the input as well as the desired output.
[ June 05, 2003: Message edited by: Jim Yingst ]
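To see the difference between the three patterns side by side, here is a small sketch. The input "Hi there, my name" is an assumption, since the original input was not posted, and the class name CharClassSplitDemo and helper show are made up for illustration.

```java
import java.util.Arrays;

public class CharClassSplitDemo {
    // Hypothetical helper: splits and formats the result so empty strings are visible.
    static String show(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        String s = "Hi there, my name"; // assumed input, not the original poster's

        // Inside [], the +, | and * are literal characters, so this matches
        // exactly ONE delimiter at a time and ", " yields an empty string.
        System.out.println(show(s, "[\\s+|,\\s*]"));          // [Hi, there, , my, name]

        // Quantifier outside the class: a whole run of delimiters is one match.
        System.out.println(show(s, "[\\s,]+"));               // [Hi, there, my, name]

        // One delimiter plus trailing whitespace: two commas are two matches.
        System.out.println(show("Alfa,,Charlie", "[\\s,]\\s*")); // [Alfa, , Charlie]
        System.out.println(show("Alfa,,Charlie", "[\\s,]+"));    // [Alfa, Charlie]

        // Proof that + | * are literals in the first pattern.
        System.out.println(show("a+b|c*d", "[\\s+|,\\s*]"));  // [a, b, c, d]
    }
}
```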
 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
It's hard to be sure what you want without seeing the input as well as the desired output.

Can't believe I forgot it! It was something like

and I want the result to only have an array of non-empty Strings ( words only )
Jamie
 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
Jim, thanks
worked exactly as I wanted. I'll have a re-read of your post and read some more on this particular regex that I needed. I hate not understanding something that I use in a program.
Thanx again,
Jamie
by the way, is there anything you don't know Jim??
 
John Lee
Ranch Hand
Posts: 2545
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
hi:
I would use split twice: the first time, create an array, then join it back into a String, so that you get rid of either the "," or the "<space>". Then split it again to create the desired result.
 
Frank Carver
Sheriff
Posts: 6920
and I want the result to only have an array of non-empty Strings ( words only )
Which begs the question, what about other punctuation and non-alphabetics? Can you clarify what constitutes a "word"?
If I give you the realistic (albeit colloquial)
"Hi! My Name? Frank; of course. (here's my résumé) "
What words would you expect to see in the resulting array?
Is there any way of "escaping" a space or comma so that it will appear in the output?
 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
Frank, this is actually parsing an address line so other punctuation should be included in the words.
Sample Lines:
"THUNDER BAY,ON R7C 1S7"
"THUNDER BAY, ON R7C1S7"
" THUNDER BAY ON R7C 1S7"
"SAULT ST. MARIE, ON K9LO9H"
"DULUTH MN 90210"
etc.
If there is any punctuation in there other than a comma, then it has to be part of one of the fields. I could spend weeks programming around their erroneous data, but I choose to document the exceptions and make them clean up the data. There has to be some consistency or it would be impossible.
Actually, I don't see how they could not store city, province, postal code separately! It boggles my mind! But they don't pay me to throw insults at their system, just to deal with it.
Jamie
[ June 06, 2003: Message edited by: Jamie Robertson ]
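One thing to watch with lines like these: the third sample starts with a space, and split returns a leading empty string when the input begins with a delimiter. A minimal sketch that strips leading delimiters before splitting on "[\\s,]+" follows; the class name AddressTokens and the method tokens are made up for illustration.

```java
import java.util.Arrays;

public class AddressTokens {
    // Hypothetical helper: returns only the non-empty tokens of an address line.
    static String[] tokens(String line) {
        // Remove leading whitespace/commas so split() does not emit a leading "".
        String stripped = line.replaceFirst("^[\\s,]+", "");
        if (stripped.isEmpty()) {
            return new String[0]; // "".split(...) would return one empty string
        }
        // Trailing empty strings are dropped by split() itself (limit 0).
        return stripped.split("[\\s,]+");
    }

    public static void main(String[] args) {
        // Without stripping, the leading space produces an empty first element.
        System.out.println(Arrays.toString(" THUNDER BAY ON R7C 1S7".split("[\\s,]+")));
        // [, THUNDER, BAY, ON, R7C, 1S7]

        System.out.println(Arrays.toString(tokens(" THUNDER BAY ON R7C 1S7")));
        // [THUNDER, BAY, ON, R7C, 1S7]
        System.out.println(Arrays.toString(tokens("SAULT ST. MARIE, ON K9LO9H")));
        // [SAULT, ST., MARIE, ON, K9LO9H]
    }
}
```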
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
If there is any punctuation in there other than a comma, then it has to be part of one of the fields.
But from your examples it also looks like a comma may be intended as part of a field, though it's usually a separator. E.g.
"SAULT ST. MARIE, ON K9LO9H"
That's intended as one field, right? Seems like quite a problem if commas have been freely used in the data without some escape sequence - how do you know if it's supposed to be a field separator or not? If you're lucky and only one field is prone to have extra commas like this, then you can probably parse a record by first locating all the "normal" fields (those with no commas) and then assume that whatever is left over (any input not part of the other fields) is the remaining field - no matter how many commas. If more than one field has this problem... well... Perhaps you can find some other patterns in the data you can exploit. E.g. if one field is always a 5-digit number, that field can be a reference point for you. Good luck...
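The "reference point" idea could look roughly like the sketch below. It rests on assumptions that go beyond what the thread states: that every line contains a two-letter uppercase province or state code set off by whitespace or commas, that everything after that code is the postal code, and that everything before it is the city. The class name AddressFields and the method parse are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressFields {
    // Assumed layout: <city> <2-letter code> <postal code>, separated by
    // whitespace and/or commas. The city group is greedy, so the LAST
    // two-letter token that fits is taken as the province/state.
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(.*[^\\s,])[\\s,]+([A-Z]{2})[\\s,]+(\\S.*?)\\s*$");

    // Returns {city, province, postalCode}, or null if the line does not fit.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "THUNDER BAY,ON R7C 1S7",
            " THUNDER BAY ON R7C 1S7",
            "SAULT ST. MARIE, ON K9LO9H",
            "DULUTH MN 90210"
        };
        for (String line : samples) {
            String[] f = parse(line);
            System.out.println(f == null ? "no match" : String.join(" | ", f));
        }
        // THUNDER BAY | ON | R7C 1S7
        // THUNDER BAY | ON | R7C 1S7
        // SAULT ST. MARIE | ON | K9LO9H
        // DULUTH | MN | 90210
    }
}
```

Lines that do not fit the assumed layout come back as null, which fits the approach of documenting the exceptions rather than coding around them.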
 