
split method

 
Shiang Wang
Ranch Hand
Posts: 96
How do I use the split method in the String class to split "ABC..DE" into two strings, "ABC" and "DE"? I tried ".." as the regular expression, but it gets confused with ".", which means any character.
Another similar challenge is to parse "AB[CD]E" into "AB", "CD", "E". Again, I can't use "[]" as it denotes a character class in regular expressions.
Please help. I don't want to use StringTokenizer, or indexOf and substring on String, to get the result.
Thanks
 
Gayathri Prasad
Ranch Hand
Posts: 116
Hi,
Hmm, I tried out something like this:
String strTest = "test1..test2..test3...De";
String[] ar_strTest = strTest.split( "\\.\\." );
Here, whenever it finds a match for the regular expression \.\. (two literal dots), it splits the string at that point. Hope this helps.
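To make the snippet above easy to try, here is a minimal runnable sketch. The class and method names (DoubleDotSplit, splitOnTwoDots) are made up for illustration. Note what happens with the three dots in the test string: the first two are consumed as the delimiter and the third stays attached to the last token.

```java
import java.util.Arrays;

class DoubleDotSplit {
    // Hypothetical helper: splits on exactly two literal dots.
    // Each "\\." in Java source reaches the regex engine as \.
    static String[] splitOnTwoDots(String input) {
        return input.split("\\.\\.");
    }

    public static void main(String[] args) {
        // prints [ABC, DE]
        System.out.println(Arrays.toString(splitOnTwoDots("ABC..DE")));
        // prints [test1, test2, test3, .De]
        System.out.println(Arrays.toString(splitOnTwoDots("test1..test2..test3...De")));
    }
}
```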
I am working on the second one and will get back to you soon.
Cheers,
Gaya3
------------------------------------------------
Beginning is half done..
 
Shiang Wang
Ranch Hand
Posts: 96
Thanks for your help, that works for me.
 
 
Leslie Chaim
Ranch Hand
Posts: 336
Some more explanation, HTH.
The '.' is a regex meta-character that matches any character. If you want the literal sense of '.' (e.g. to match a decimal point), you have two options.
The first is simply the great escape: the '\' backslash meta-character, as in '\.'. Since the regex is passed as a Java string, you need to escape the backslash itself and write "\\.", which delivers '\.' to the regex engine.
The second option is an octal escape. The '.' character is 56 octal, so you can pass "\\056" to match a single '.' character in the target string. You might not want to be so fancy with octal escapes (since you need the backslash anyway), but there are cases where you must use them.
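A small sketch comparing the two options with the unescaped dot, assuming a made-up input of "1.5". The class and method names are hypothetical. The unescaped version returns an empty array because every character counts as a delimiter and split drops trailing empty strings.

```java
import java.util.Arrays;

class LiteralDotDemo {
    // Option 1: backslash escape, \. at the regex level.
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    // Option 2: octal escape, \056 at the regex level (octal 56 is '.').
    static String[] splitOctal(String s) {
        return s.split("\\056");
    }

    // For contrast: an unescaped '.' matches any character.
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("1.5")));   // prints [1, 5]
        System.out.println(Arrays.toString(splitOctal("1.5")));     // prints [1, 5]
        System.out.println(Arrays.toString(splitUnescaped("1.5"))); // prints []
    }
}
```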
One more note on trying to split ABC..DE using something like:
values = line.split ("\\.\\.");
This splits on exactly two '.' chars. What if there is only one '.', or more than two, and what if ...
Regex handles this nicely with quantifiers.
If you say:
\.+
The '+' (another meta-char) says to match the previous atomic unit one or more times.
\.*
The '*' (another meta-char) says to match the previous atomic unit zero or more times.
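Here is a rough sketch of the difference, with hypothetical names. It splits on a run of one or more dots, and uses String.matches to show that only '*' accepts zero dots (the empty string).

```java
import java.util.Arrays;

class QuantifierDemo {
    // Hypothetical helper: treats any run of one or more dots as one delimiter.
    static String[] splitOnDots(String input) {
        return input.split("\\.+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDots("ABC.DE")));  // prints [ABC, DE]
        System.out.println(Arrays.toString(splitOnDots("A.B...C"))); // prints [A, B, C]

        // '+' needs at least one dot, '*' is happy with none.
        System.out.println("".matches("\\.+"));    // prints false
        System.out.println("".matches("\\.*"));    // prints true
        System.out.println("...".matches("\\.+")); // prints true
    }
}
```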
What do you take away from this so far? First, you will understand that if your data were like ABC++DE, split ("++") would fail, throwing a tantrum flavored by PatternSyntaxException, because '+' is a quantifier with nothing in front of it to repeat. Another thing is that you have more flexibility when using these quantifiers.
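A minimal sketch of that failure and two ways around it. The names are made up for illustration. Pattern.quote is a standard java.util.regex method that turns a string into a literal pattern, which is handy when you do not want to escape by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PlusSplitDemo {
    // Returns true if the raw "++" pattern is rejected by the regex engine.
    static boolean rawPlusIsRejected() {
        try {
            "ABC++DE".split("++");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    // \++ at the regex level: a literal '+' repeated one or more times.
    static String[] splitOnPluses(String input) {
        return input.split("\\++");
    }

    // Pattern.quote("++") matches exactly the two-character text "++".
    static String[] splitOnTwoPluses(String input) {
        return input.split(Pattern.quote("++"));
    }

    public static void main(String[] args) {
        System.out.println(rawPlusIsRejected());                           // prints true
        System.out.println(Arrays.toString(splitOnPluses("ABC++DE")));    // prints [ABC, DE]
        System.out.println(Arrays.toString(splitOnTwoPluses("ABC++DE"))); // prints [ABC, DE]
    }
}
```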
But I don't like the "zero|one or more thing", I wanna match exactly two!
\.{2}
You got it!
How about a range?
\.{2,5}
We got this too, which says match the previous atomic unit a minimum of 2 and up to 5 times.
Can I omit portions of the range n,m?
Partly!
\.{0,6} Match zero up to a max of 6 (Java needs the explicit 0; it rejects \.{,6})
\.{6,} Match a minimum of 6, up to infinity
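The counted quantifiers can be checked with a quick sketch like the one below. The helper names are hypothetical. It builds a string of n dots and asks String.matches whether the whole string fits the pattern, and it also checks how Java's regex engine reacts to a range with the lower bound left out.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BoundedQuantifierDemo {
    // Builds a string of `count` dots and tests it against the whole regex.
    static boolean dotsMatch(String regex, int count) {
        StringBuilder dots = new StringBuilder();
        for (int i = 0; i < count; i++) {
            dots.append('.');
        }
        return dots.toString().matches(regex);
    }

    // Returns true if java.util.regex accepts the pattern at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(dotsMatch("\\.{2}", 2));   // prints true
        System.out.println(dotsMatch("\\.{2}", 3));   // prints false
        System.out.println(dotsMatch("\\.{2,5}", 5)); // prints true
        System.out.println(dotsMatch("\\.{2,5}", 6)); // prints false
        System.out.println(dotsMatch("\\.{0,6}", 0)); // prints true
        System.out.println(dotsMatch("\\.{6,}", 9));  // prints true
        System.out.println(compiles("\\.{,6}"));      // prints false
    }
}
```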
What do you mean by the previous atomic unit?
Gee, you're asking some good questions today
An atomic unit is a particular piece of a regex that cannot be broken apart, for example:
  • \.
  • \056
  • [A-Za-z0-9]
  • ( sub regex )
  • \p{Alpha} (the POSIX class written [:alpha:] in other regex flavors)
  • A
  • B
  • \+


Although they may be made up of a number of characters, they are all treated as a single unit from the regex engine's viewpoint.
For example, if you try to match 'ab{4,7}' against 'ababababababab' it will fail. The atomic unit governed by the quantifier (in this case the '{4,7}') is the 'b' character! If you want ab treated as a whole, use parentheses to group it, as in '(ab){4,7}'; now the quantifier governs '(sub regex)', which is a single atomic unit.
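The grouping point can be verified with a short sketch. The class and method names are made up. It uses Matcher.find, so the pattern is allowed to match anywhere in the input, and the ungrouped version still finds nothing.

```java
import java.util.regex.Pattern;

class GroupingDemo {
    // Returns true if the regex matches anywhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "ababababababab"; // "ab" repeated 7 times

        // The quantifier governs only 'b': one 'a' followed by 4 to 7 'b's.
        System.out.println(found("ab{4,7}", input));    // prints false
        System.out.println(found("ab{4,7}", "abbbbb")); // prints true

        // The quantifier governs the group: "ab" repeated 4 to 7 times.
        System.out.println(found("(ab){4,7}", input));  // prints true
    }
}
```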
So you have learned about quantifiers and the great escape. Just let me finish with this: What if you wanted to split a string such as ABC\\DE?
Think! And post your solution
Cheers,
Leslie
     