
StringTokenizer problem with em dash and en dash

Louis Meigret
Greenhorn

Joined: Mar 24, 2003
Posts: 3
Why does this tokenizer not split the following line at the en dash and em dash (\u2013 and \u2014)? I'm using Java JDK 1.4.1.

StreamTokenizer l = new StreamTokenizer(r); // r is a java.io.Reader over the input
l.resetSyntax();
l.wordChars(0, '\u2012');
l.wordChars('\u2015', '\uffff');
l.whitespaceChars(' ', ' ');
l.whitespaceChars('\t', '\t');
l.whitespaceChars('\n', '\n');
l.ordinaryChar('[');
l.ordinaryChar(']');
l.ordinaryChar('(');
l.ordinaryChar(')');
l.ordinaryChars('\u2013','\u2014');
l.eolIsSignificant(true);

Thank you for any help.
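A self-contained reproduction of the configuration above may help. It feeds the sample string given later in the thread through the same calls and collects the tokens. The class name DashTokenizerRepro and the tokenize helper are hypothetical, added only for this sketch. On current JDKs the dash is not split off: the first token comes back as a single word that still contains the em dash.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DashTokenizerRepro {

    // Hypothetical helper: runs the StreamTokenizer setup from the question
    // over the input and returns every token as a string.
    static List<String> tokenize(String input) {
        StreamTokenizer l = new StreamTokenizer(new StringReader(input));
        l.resetSyntax();
        l.wordChars(0, '\u2012');            // upper bound is clamped to 255
        l.wordChars('\u2015', '\uffff');     // clamped too, so this changes nothing
        l.whitespaceChars(' ', ' ');
        l.whitespaceChars('\t', '\t');
        l.whitespaceChars('\n', '\n');
        l.ordinaryChar('[');
        l.ordinaryChar(']');
        l.ordinaryChar('(');
        l.ordinaryChar(')');
        l.ordinaryChars('\u2013', '\u2014'); // no effect: the table only covers 0-255
        l.eolIsSignificant(true);

        List<String> tokens = new ArrayList<>();
        try {
            while (l.nextToken() != StreamTokenizer.TT_EOF) {
                if (l.ttype == StreamTokenizer.TT_WORD) {
                    tokens.add(l.sval);
                } else if (l.ttype == StreamTokenizer.TT_EOL) {
                    tokens.add("<EOL>");
                } else {
                    // Ordinary character: ttype holds the character itself.
                    tokens.add(String.valueOf((char) l.ttype));
                }
            }
        } catch (IOException e) {
            // nextToken declares IOException; a StringReader will not throw it.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : tokenize("First\u2014Second [Third]")) {
            System.out.println(t);
        }
    }
}
```

Running it prints four tokens: the word with the dash still inside it, then "[", "Third" and "]".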
Tom Purl
Ranch Hand

Joined: May 24, 2002
Posts: 104
What does your string look like? What are you trying to split?


Tom Purl
SCJP 1.4
Louis Meigret
Greenhorn

Joined: Mar 24, 2003
Posts: 3
Originally posted by Tom Purl:
What does your string look like? What are you trying to split?

"First\u2014Second [Third]"
Should produce
First
\u2014
Second
[
Third
]
(I believe this class is not properly internationalized; it most probably uses an internal table with one cell per character, covering only characters up to 0xFF.)
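The guess about an internal table matches the JDK. The StreamTokenizer Javadoc states that each character read is regarded as being in the range \u0000 through \u00FF, and in the JDK source the syntax table is a 256-entry array: wordChars, whitespaceChars and ordinaryChars clamp their upper bound to 255, so ordinaryChars('\u2013','\u2014') changes nothing, and nextToken treats every character at or above 256 as a word character. One possible workaround, sketched here under the assumption that only the delimiters from the question matter, is a small hand-written tokenizer. DashAwareTokenizer is a hypothetical name, and the sketch does not reproduce eolIsSignificant(true): newlines are simply dropped as whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class DashAwareTokenizer {

    // Characters returned as single-character tokens: brackets, parentheses,
    // en dash (U+2013) and em dash (U+2014).
    private static boolean isDelimiter(char c) {
        return c == '[' || c == ']' || c == '(' || c == ')'
                || c == '\u2013' || c == '\u2014';
    }

    private static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n';
    }

    // Splits on space, tab and newline (which are discarded) and emits each
    // delimiter as its own token; every other character belongs to a word.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isWhitespace(c) || isDelimiter(c)) {
                // Close the word collected so far, if any.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (isDelimiter(c)) {
                    tokens.add(String.valueOf(c));
                }
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : tokenize("First\u2014Second [Third]")) {
            System.out.println(t);
        }
    }
}
```

For the sample string this yields the six tokens listed in the post above: First, the em dash, Second, [, Third and ].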
Gabriel White
Ranch Hand

Joined: Mar 02, 2003
Posts: 233
Doesn't your compiler see \u as an illegal escape character?
Because I get it to produce dashes instead of the actual string.
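For reference, \u followed by four hexadecimal digits is a Unicode escape in Java source. The compiler translates it before tokenizing, so "\u2014" inside a string literal is legal and denotes the em dash itself; only a \u that is not followed by four hex digits is rejected. The small check below (class name hypothetical) illustrates this. Whether the dash then displays correctly depends on the console's output encoding.

```java
public class UnicodeEscapeDemo {

    // Unicode escapes are translated before the compiler tokenizes the
    // source, so this literal contains the em dash (U+2014) as one char.
    static final String SAMPLE = "First\u2014Second [Third]";

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                       // 20
        System.out.println(SAMPLE.charAt(5) == '\u2014');          // true
        System.out.println(Integer.toHexString(SAMPLE.charAt(5))); // 2014
    }
}
```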
Louis Meigret
Greenhorn

Joined: Mar 24, 2003
Posts: 3
Originally posted by Steve Wysocki:
Doesn't your compiler see \u as an illegal escape character?
Because I get it to produce dashes instead of the actual string.

JBuilder does not report any error.
Actual? What output did you get?