
split method

Shiang Wang
Ranch Hand

Joined: Jun 20, 2003
Posts: 96
How do I use the split method in the String class to split "ABC..DE" into two strings, "ABC" and "DE"? I tried ".." as the regular expression, but that gets confused with ".", which means any character.
Another similar challenge is to parse "AB[CD]E" into "AB", "CD", "E". Again, I can't use "[]" as is, since it denotes a character class in a regular expression.
Please help. I don't want to use StringTokenizer, or indexOf and substring in String, to get the result.
Thanks


SCBCD, SCWCD, SCJP
Gayathri Prasad
Ranch Hand

Joined: Jun 25, 2003
Posts: 116
Hi,
Hmm, I tried out something like:
String strTest = "test1..test2..test3...De";
String[] ar_strTest = strTest.split( "\\.\\." );
Here, whenever it finds a match for the regular expression \.\. (two literal dots), it splits the string at that point. Hope this helps.
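For anyone who wants to try this out, here is a minimal sketch of the same call wrapped in a small class. The class and method names (SplitOnTwoDots, splitOnTwoDots) are made up for illustration and are not from the thread. Note what happens with the three dots before "De": only two of them are consumed as the delimiter, so the last token keeps a leading dot.

```java
import java.util.Arrays;

// Hypothetical helper class, named only for this illustration.
class SplitOnTwoDots {
    // Splits on exactly two consecutive literal dots.
    // "\\.\\." in Java source reaches the regex engine as \.\.
    static String[] splitOnTwoDots(String input) {
        return input.split("\\.\\.");
    }

    public static void main(String[] args) {
        // prints [ABC, DE]
        System.out.println(Arrays.toString(splitOnTwoDots("ABC..DE")));
        // prints [test1, test2, test3, .De]
        System.out.println(Arrays.toString(splitOnTwoDots("test1..test2..test3...De")));
    }
}
```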
I am on to the second one and will get back to you soon.
Cheers,
Gaya3
------------------------------------------------
Beginning is half done..
Shiang Wang
Ranch Hand

Joined: Jun 20, 2003
Posts: 96
Thanks for your help, that works for me.
Leslie Chaim
Ranch Hand

Joined: May 22, 2002
Posts: 336
Some more explanation, HTH:
The '.' is a regex meta-character that matches any character. If you want the literal sense of '.' (e.g. to match a decimal point), you have two options.
The first is simply the great escape: put the '\' backslash meta-character in front, as in '\.'. Since the regex is passed as a Java string literal, you also need to escape the backslash itself, so you write "\\." and the regex engine receives '\.'.
The second option is an octal escape. The '.' character is 56 octal, so you can pass "\\056" to match a single '.' character in the target string. You might not want to be so fancy with octal escapes (you need the backslash anyway), but there are cases where you must use them.
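Both escapes can be checked with a few lines. This is only a sketch: the class and method names are invented for the example, and the last method is my own application of the same backslash escape to the brackets from the original "AB[CD]E" question, not something posted in this thread.

```java
import java.util.Arrays;

// Hypothetical demo class; none of these names come from the thread.
class EscapeDemo {
    // Unescaped: '.' matches any character, so every character is a delimiter.
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    // Backslash escape: the regex engine sees \. (a literal dot).
    static String[] splitBackslashEscape(String s) {
        return s.split("\\.");
    }

    // Octal escape: the regex engine sees \056, which is also a literal dot.
    static String[] splitOctalEscape(String s) {
        return s.split("\\056");
    }

    // Assumption: the same escape applied to '[' and ']' inside a character class.
    static String[] splitOnBrackets(String s) {
        return s.split("[\\[\\]]");
    }

    public static void main(String[] args) {
        System.out.println(splitUnescaped("3.14").length);                      // prints 0
        System.out.println(Arrays.toString(splitBackslashEscape("3.14")));      // prints [3, 14]
        System.out.println(Arrays.toString(splitOctalEscape("3.14")));          // prints [3, 14]
        System.out.println(Arrays.toString(splitOnBrackets("AB[CD]E")));        // prints [AB, CD, E]
    }
}
```

The unescaped call returns an empty array because every piece between delimiters is an empty string, and split drops trailing empty strings.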
One more note on splitting ABC..DE with something like:
values = line.split ("\\.\\.");
This will split on exactly 2 '.' chars. What if there was only one '.' or how about if there's more, and what if ...
Regex handles this nicely with quantifiers.
If you say:
\.+
The '+' (another meta-char) says to match the previous atomic unit one or more times.
\.*
The '*' (another meta-char) says to match the previous atomic unit zero or more times. Be careful with this one as a split delimiter: zero times means it also matches the empty string.
What do you take away from this so far? First, you can see that if your data were like ABC++DE, split("++") would fail and throw a tantrum flavored by PatternSyntaxException, because '+' is a quantifier with nothing in front of it to repeat; you would need split("\\+\\+"). Second, you have more flexibility when using these quantifiers.
But I don't like the "zero|one or more thing", I wanna match exactly two!
\.{2}
You got it!
How about a range
\.{2,5}
We've got this too; it says match the previous atomic unit a minimum of 2 up to 5 times.
Can I omit portions of the range of n,m?
Sure!
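\.{0,6} Match zero up to a max of 6 (java.util.regex wants the 0 spelled out; the \.{,6} shorthand found in some other regex flavors is rejected with a PatternSyntaxException)
\.{6,} Match a minimum of 6 up to infinity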
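To see how these quantifiers behave as split delimiters, here is a rough sketch. The class name, the method names and the sample line "A.B..C...D" are all made up for the illustration. The compiles helper is there only to show which patterns java.util.regex accepts at all.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names and the sample data are illustrative only.
class QuantifierDemo {
    static String[] splitWith(String regex, String input) {
        return input.split(regex);
    }

    // Returns true if java.util.regex accepts the pattern, false if it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String line = "A.B..C...D";
        System.out.println(Arrays.toString(splitWith("\\.+", line)));      // prints [A, B, C, D]
        System.out.println(Arrays.toString(splitWith("\\.{2}", line)));    // prints [A.B, C, .D]
        System.out.println(Arrays.toString(splitWith("\\.{2,5}", line)));  // prints [A.B, C, D]
        System.out.println(Arrays.toString(splitWith("\\.{6,}", line)));   // prints [A.B..C...D]
        System.out.println(compiles("++"));                                // prints false
        System.out.println(compiles("\\.{,6}"));                           // prints false
        System.out.println(compiles("\\.{0,6}"));                          // prints true
    }
}
```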
What do you mean by the previous atomic unit?
Gee, you're asking some good questions today
An atomic unit is a particular piece of a regex that cannot be broken apart, for example:
  • \.
  • \056
  • [A-Za-z0-9]
  • ( sub regex )
  • \p{Alpha}
  • A
  • B
  • \+


Although some of them are made up of a number of characters, each is treated as a single unit from the regex engine's viewpoint.
For example, if you try to match 'ab{4,7}' against 'ababababababab' it will fail. The atomic unit governed by the quantifier (in this case the '{4,7}') is just the 'b' character! If you want ab treated as a whole, use parentheses to group it, as in '(ab){4,7}'; now the quantifier governs '(sub regex)', which is a single atomic unit.
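The difference is easy to confirm with a short sketch. The class and method names here are invented for the example; only the two patterns and the target string come from the post.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, named only for this illustration.
class AtomicUnitDemo {
    // True if the regex matches somewhere inside the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String target = "ababababababab"; // "ab" repeated 7 times

        // The quantifier governs only 'b': needs an 'a' followed by 4 to 7 'b's.
        System.out.println(foundIn("ab{4,7}", target));       // prints false
        System.out.println(foundIn("ab{4,7}", "abbbbb"));     // prints true

        // The quantifier governs the group: "ab" repeated 4 to 7 times.
        System.out.println(foundIn("(ab){4,7}", target));     // prints true
        System.out.println(target.matches("(ab){4,7}"));      // prints true
    }
}
```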
So you have learned about quantifiers and the great escape. Just let me finish with this: What if you wanted to split a string such as ABC\\DE?
Think! And post your solution.
Cheers,
Leslie


Normal is in the eye of the beholder