StringTokenizer problem with em dash and en dash

Louis Meigret
Greenhorn

Joined: Mar 24, 2003
Posts: 3
Why does this StreamTokenizer not split the following line at the en dash and em dash (\u2013 and \u2014)? I'm using JDK 1.4.1.

l = new StreamTokenizer(r);
l.resetSyntax();
l.wordChars(0, '\u2012');
l.wordChars('\u2015', '\uffff');
l.whitespaceChars(' ', ' ');
l.whitespaceChars('\t', '\t');
l.whitespaceChars('\n', '\n');
l.ordinaryChar('[');
l.ordinaryChar(']');
l.ordinaryChar('(');
l.ordinaryChar(')');
l.ordinaryChars('\u2013','\u2014');
l.eolIsSignificant(true);

Thank you for any help.
Tom Purl
Ranch Hand

Joined: May 24, 2002
Posts: 104
What does your string look like? What are you trying to split?


Tom Purl, SCJP 1.4
Louis Meigret
Greenhorn

Joined: Mar 24, 2003
Posts: 3
Originally posted by Tom Purl:
What does your string look like? What are you trying to split?

"First\u2014Second [Third]"
Should produce
First
\u2014
Second
[
Third
]
(I believe this class is not properly internationalized; it most probably uses an internal table with one cell per character up to 0xFF, so settings for characters above that range are ignored.)
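That suspicion appears to match the JDK source as far as I can tell: StreamTokenizer keeps a 256-entry character-type table, clamps the ranges passed to wordChars and ordinaryChars to that table, and treats every character with a code of 256 or above as part of a word. If that is right, one way around it is to tokenize by hand. The following is only a minimal sketch, not a drop-in replacement: the class name DashTokenizer and the method tokenize are made up for illustration, and it does not report end-of-line tokens the way eolIsSignificant(true) would.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK: a hand-rolled tokenizer that
// treats brackets, parentheses, the en dash (U+2013) and the em dash (U+2014)
// as single-character tokens, and space, tab and newline as separators.
public class DashTokenizer {

    // Characters that are returned as tokens of their own.
    private static final String ORDINARY = "[]()\u2013\u2014";

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean ordinary = ORDINARY.indexOf(c) >= 0;
            boolean whitespace = c == ' ' || c == '\t' || c == '\n';
            if (ordinary || whitespace) {
                // A separator or ordinary character ends the current word.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (ordinary) {
                    tokens.add(String.valueOf(c));
                }
            } else {
                word.append(c);
            }
        }
        // Flush the last word, if any.
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("First\u2014Second [Third]")) {
            System.out.println(token);
        }
    }
}
```

For the sample line this yields the six tokens listed above, with the em dash as a token of its own. How the dash looks when printed depends on the console encoding.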
Gabriel White
Ranch Hand

Joined: Mar 02, 2003
Posts: 233
doesn't your compiler see \u as an illegal escape character?
Because I get it to produce dashes instead of the acutal string.
Louis Meigret
Greenhorn

Joined: Mar 24, 2003
Posts: 3
Originally posted by Steve Wysocki:
doesn't your compiler see \u as an illegal escape character?
Because I get it to produce dashes instead of the acutal string.

JBuilder does not report any error.
Acutal? What output did you get?
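For what it's worth, \u followed by four hex digits is a legal Unicode escape in Java source. The compiler translates it to the character itself, so getting a dash out of "\u2014" is the expected behaviour. A small sketch to check this (the class name UnicodeEscapeDemo is made up for illustration):

```java
// Hypothetical demo class: shows that a Unicode escape in a string literal
// becomes a single character (here the em dash, U+2014) at compile time.
public class UnicodeEscapeDemo {

    static String sample() {
        return "First\u2014Second";
    }

    public static void main(String[] args) {
        String s = sample();
        // 5 letters + 1 dash + 6 letters
        System.out.println(s.length());              // prints 12
        // The character at index 5 is the em dash itself.
        System.out.println(s.charAt(5) == 0x2014);   // prints true
    }
}
```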
 