About tokenizer.

 
Ranch Hand
Posts: 317
Dear all, please take a look at the following code:

The output is:

Why do incomingString1 and incomingString2 return different token counts? In my opinion, they should both return the same number: 3.
Please help me understand this. Thanks in advance.
Sun Guoqiao
 
Ranch Hand
Posts: 2596
Indeed!!!
I tried it with a few other combinations of delimiters as well, and it still gets 4 tokens from the second string. It separates "te" and "st" in the second list. I would love to know why, too. The API does not mention any such behaviour for particular delimiters. Could it be a buggy implementation?
- Manish
[This message has been edited by Manish Hatwalne (edited October 30, 2001).]
 
author
Posts: 3252
Don't worry, this is exactly the expected and documented behaviour. Where you're going wrong is the assumption that the delim argument is a delimiter string. In fact, it is a set of delimiter characters (as the javadoc puts it, "the characters in the delim argument are the delimiters for separating tokens").
Your delimiter characters are '[', '=', '|' and ']', and you are not returning the delimiters as tokens. The first test string therefore has 3 tokens (the word " Test " three times), separated by sequences of 5 delimiter characters each. In the second test string, the '=' delimiter splits the middle " Test " token in two, giving 4 tokens in total.
StringTokenizer doesn't handle multi-character delimiters. You can do it, but you'll have to assist it a bit: for instance, by making "[" your delimiter set and testing whether the returned Strings start with "=|=]". StreamTokenizer doesn't seem very suitable either.
- Peter

[This message has been edited by Peter den Haan (edited October 30, 2001).]
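The code from the original question was not preserved in this thread, so here is a minimal sketch of the behaviour Peter describes. The class name, the helper method and both input strings are assumptions, chosen only to match the description above (three " Test " words in the first string, and an '=' inside the middle word of the second).

```java
import java.util.StringTokenizer;

class DelimiterSetDemo {
    // Counts tokens; StringTokenizer treats EACH character of delims
    // as a delimiter, not the string as a whole.
    static int count(String input, String delims) {
        return new StringTokenizer(input, delims).countTokens();
    }

    public static void main(String[] args) {
        String delims = "[=|=]"; // effectively the set { '[', '=', '|', ']' }

        // Assumed inputs, reconstructed from the discussion.
        String first = " Test [=|=] Test [=|=] Test ";
        String second = " Test [=|=] Te=st [=|=] Test ";

        System.out.println(count(first, delims));  // prints 3
        System.out.println(count(second, delims)); // prints 4: the lone '=' splits " Te=st "
    }
}
```

Because the lone '=' is itself in the delimiter set, the middle word is cut into " Te" and "st ", which is where the fourth token comes from.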
 
Guoqiao Sun
Ranch Hand
Posts: 317
Thank you very much, Peter! It is clear to me now. But it seems like a lot of trouble to write our own class for splitting a String on multi-character delimiters.
Regards,
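For what it's worth, a hand-rolled splitter for a multi-character delimiter can stay fairly small. The following is only a sketch, not something posted in this thread: the class name MultiCharSplitter, its split method and the sample input are all hypothetical. It uses nothing beyond String.indexOf and String.substring, and drops empty pieces the way StringTokenizer does.

```java
import java.util.ArrayList;
import java.util.List;

class MultiCharSplitter {
    // Splits input on each full occurrence of delimiter.
    // Empty pieces are dropped, mirroring StringTokenizer's behaviour.
    static List<String> split(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = input.indexOf(delimiter, start)) >= 0) {
            if (hit > start) {
                tokens.add(input.substring(start, hit));
            }
            start = hit + delimiter.length(); // skip the whole delimiter
        }
        if (start < input.length()) {
            tokens.add(input.substring(start)); // trailing piece
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed input: the lone '=' is no longer treated as a delimiter.
        List<String> tokens = split(" Test [=|=] Te=st [=|=] Test ", "[=|=]");
        System.out.println(tokens.size()); // prints 3
    }
}
```

On Java versions that have them, String.split combined with Pattern.quote(delimiter) is another option, though unlike the sketch above it can return an empty leading piece.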



 
Manish Hatwalne
Ranch Hand
Posts: 2596



Yep! I agree. Thanks Peter.
- Manish
 