
Dot metacharacter Question

 
sharma ishu
Ranch Hand
Posts: 70
class C6 {
    public static void main(String[] a) {
        String s = "abc de.f1 adf34 cat.dog";
        System.out.println(s + "\n");
        // a[0] is the regex taken from the command line
        String[] t = s.split(a[0]);
        for (String x : t)
            System.out.println("<" + x + ">");
        // System.out.println(t[0]);
    }
}
/*

C:\code\e5> javac C6.java


1. C:\code\e5>java C6 .
abc de.f1 adf34 cat.dog

2. C:\code\e5>java C6 \.
abc de.f1 adf34 cat.dog

<abc de>
<f1 adf34 cat>
<dog>


3. C:\code\e5>java C6 \\.
abc de.f1 adf34 cat.dog

<abc de.f1 adf34 cat.dog>


*/
Kindly explain why these three invocations behave this way, especially the 1st one.
 
Himai Minh
Ranch Hand
Posts: 1295
In the command prompt, the arguments mean:
1. . (a dot) is the regex metacharacter
2. \. (a backslash and a dot) means a literal dot
3. \\. (a double backslash and a dot) means a backslash followed by a dot.

I think a backslash is not an escape character in the command prompt; a backslash is just a backslash. But in a program, a backslash is the escape character.

Case 1: the string is split on the metacharacter, which matches any character. If you are given "abcde.", the string is split at every character. When we split "ab", the result is a "" between "a" and "b".

Case 2: the string is split on a literal dot.

Case 3: the string is split on \\., which does not exist in the given string. If you input ab\.c, I am sure the output is <ab> and <c>.
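
To check the three cases without going through the command prompt, here is a minimal sketch that feeds the same three regexes to split() from inside Java. The class name SplitCases and the helper pieces() are made up for this illustration. Note that inside a Java string literal every backslash has to be doubled, so the command-line argument \. is written "\\." and \\. is written "\\\\.".

```java
import java.util.Arrays;

class SplitCases {
    // The sample string from the question.
    static final String INPUT = "abc de.f1 adf34 cat.dog";

    // Splits INPUT on the given regex, the same way s.split(a[0]) does.
    static String[] pieces(String regex) {
        return INPUT.split(regex);
    }

    public static void main(String[] args) {
        // Each literal holds the same characters the program received in a[0].
        String[] regexes = { ".", "\\.", "\\\\." };
        for (String regex : regexes) {
            String[] t = pieces(regex);
            System.out.println(regex + " -> " + t.length + " piece(s): " + Arrays.toString(t));
        }
        // Case 3 only finds a match when the input really contains a backslash.
        System.out.println(Arrays.toString("ab\\.c".split("\\\\.")));  // [ab, c]
    }
}
```

The three regexes give 0, 3, and 1 pieces respectively, which matches the three runs in the question.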

Let me know if I made a mistake.
 
sharma ishu
Ranch Hand
Posts: 70

I think the 2nd and 3rd are fine. But in the 1st case, why doesn't it print empty strings between < and >?
 
Henry Wong
author
Marshal
Posts: 21192
Himai Minh wrote: 3. \\. (a double backslash and a dot) means a backslash followed by a dot.



Almost. In the third case, the dot isn't escaped -- only the backslash is. So, it is trying to split on a backslash that is followed by any character.
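
A small sketch of the difference, assuming two hypothetical inputs that contain a backslash (the class and constant names are invented for this example). The regex \\. splits on a backslash followed by anything, while \\\. also escapes the dot and so only splits on a backslash followed by a literal dot.

```java
import java.util.Arrays;

class BackslashAnyChar {
    // Regex text \\. : an escaped backslash, then an unescaped dot (any character).
    static final String BACKSLASH_THEN_ANY = "\\\\.";
    // Regex text \\\. : an escaped backslash, then an escaped (literal) dot.
    static final String BACKSLASH_THEN_DOT = "\\\\\\.";

    static String[] split(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String withDot = "ab\\.cd";  // the characters ab\.cd
        String withX = "ab\\xcd";    // the characters ab\xcd
        System.out.println(Arrays.toString(split(withDot, BACKSLASH_THEN_ANY)));  // [ab, cd]
        System.out.println(Arrays.toString(split(withX, BACKSLASH_THEN_ANY)));    // [ab, cd]
        System.out.println(Arrays.toString(split(withDot, BACKSLASH_THEN_DOT)));  // [ab, cd]
        System.out.println(Arrays.toString(split(withX, BACKSLASH_THEN_DOT)));    // [ab\xcd]
    }
}
```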

Henry
 
Henry Wong
author
Marshal
Posts: 21192
ishusharma sharma wrote: I think the 2nd and 3rd are fine. But in the 1st case, why doesn't it print empty strings between < and >?



This is a detail of how split() is specified. To understand why, take a look at the javadoc for the java.util.regex.Pattern class, specifically for the split() method.

split
public String[] split(CharSequence input)

Splits the given input sequence around matches of this pattern.

This method works as if by invoking the two-argument split method with the given input sequence and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.


And since splitting on a regex dot (i.e., any character) yields nothing but zero-length strings, every one of them counts as a trailing empty string and is dropped, so the split() method call returns an empty array as the result.
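
To see the discarded empty strings, one option is the two-argument split with a negative limit, which the javadoc says keeps trailing empty strings. A rough sketch (the class and method names here are just for illustration):

```java
class TrailingEmptyStrings {
    // Limit 0 (the default): trailing empty strings are removed.
    static String[] splitDefault(String s) {
        return s.split(".");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepAll(String s) {
        return s.split(".", -1);
    }

    public static void main(String[] args) {
        System.out.println(splitDefault("ab").length);  // 0
        System.out.println(splitKeepAll("ab").length);  // 3
        for (String x : splitKeepAll("ab"))
            System.out.println("<" + x + ">");          // prints <> three times
    }
}
```

With the negative limit, the original program would print one <> more than there are characters in the input.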

Henry
 