
Tokens

 
Kevin Luludis
Ranch Hand
Posts: 30
What is the best way to read in any text-based file, step through its contents as an array, and split the text into tokens?

I want to take each token and check exactly what kind it is: an identifier, an expression, a term, a factor, and so on.
Thanks a lot
 
Stephanie Grasson
Ranch Hand
Posts: 347
Kevin,
Could you use something like this:
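A minimal sketch of the idea, using StringTokenizer with its default delimiters. The class name, method name, and sample string are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenDemo {
    // Splits a string into tokens on StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed).
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample input, assumed for illustration only.
        String s = "Apples := Pears + 4371";
        for (String token : tokenize(s)) {
            System.out.println(token);
        }
    }
}
```

Each token comes back as a plain String, so you can inspect it afterwards to decide what kind of token it is.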

Stephanie
 
Kevin Luludis
Ranch Hand
Posts: 30
WOW! Stephanie that was great! Thank you very much. I just wanted to tell you that before I went to lunch. I have some more questions maybe you could help me with on this matter. Thanks again!
 
Kevin Luludis
Ranch Hand
Posts: 30
I see how this works when the text to tokenize is declared as the String s. But how would I do this using file I/O?
 
Stephanie Grasson
Ranch Hand
Posts: 347
Kevin,
You could try:
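A minimal sketch, reading line by line with a BufferedReader and tokenizing each line with StringTokenizer. The class and method names are assumptions for illustration:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class FileTokenDemo {
    // Reads every line from the reader and splits it on the default delimiters.
    static List<String> tokenize(Reader source) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    tokens.add(st.nextToken());
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Prints one token per line.
        for (String token : tokenize(new FileReader("fruit.txt"))) {
            System.out.println(token);
        }
    }
}
```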

This assumes that you have a file in the current directory called fruit.txt that looks like this:

Apples := Bananas + 4371 - Pears DIV Apples
Bananas := Pears + 1734 - Apples DIV Bananas
Pears := Apples + 3471 - Bananas DIV Pears

Hope this helps.
Stephanie
 
Kevin Luludis
Ranch Hand
Posts: 30
Thanks again. You have been a tremendous help. Does StringTokenizer automatically ignore whitespace? It recognized EOL fine, but I had trouble with the whitespace. I'm just asking because I know there is StreamTokenizer, but it looks like StringTokenizer handles this much better!
 
Stephanie Grasson
Ranch Hand
Posts: 347
Kevin,
I believe how whitespace is handled depends on how you set up your StringTokenizer.
I used the constructor
StringTokenizer( String s );
which works as follows:

Constructs a string tokenizer for the specified string. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens.

If this is not what you had in mind, you could check out one of the other StringTokenizer constructors.
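For example, the three-argument constructor StringTokenizer(String str, String delim, boolean returnDelims) lets you add your own delimiters and have them handed back as tokens. A rough sketch, assuming the operator characters are +, -, : and = (that set, and the class and method names, are my guesses for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterDemo {
    private static final String WHITESPACE = " \t\n\r\f";
    // Assumed operator characters; adjust to the language being tokenized.
    private static final String OPERATORS = "+-:=";

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true returns each delimiter as a one-character token.
        StringTokenizer st = new StringTokenizer(s, WHITESPACE + OPERATORS, true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            // Drop whitespace delimiters, keep operator delimiters.
            if (WHITESPACE.indexOf(token) < 0) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Input with no spaces between the tokens.
        System.out.println(tokenize("Apples:=Pears+4371"));
    }
}
```

Note that delimiters are single characters, so a two-character operator such as := comes back as two tokens, ":" and "=", and would need to be rejoined afterwards.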
Sorry, I've never used StreamTokenizer, so I don't know about it.
Stephanie
 
Kevin Luludis
Ranch Hand
Posts: 30
This is exactly what I had in mind, but if I were to read in some data that was accidentally missing a space, the tokens would be joined together. Not exactly what I wanted, but super close! What would I have to change for it to handle not only tabs, new lines, and returns, but also spaces?
 
Angela Lamb
Ranch Hand
Posts: 156
The StreamTokenizer is very similar to the StringTokenizer, except that its constructor takes either an InputStream or a Reader instead of a String. It also has some very nice fields and methods for determining whether the token is a word, a number, or an end of line (so it will separate by both spaces and line breaks). It even tells you the line number of each token. Here's a link to the docs:

http://java.sun.com/j2se/1.3/docs/api/java/io/StreamTokenizer.html
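A short sketch of how that looks in practice, assuming the default syntax table plus eolIsSignificant(true). The class name, method name, and output strings are made up for illustration:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class StreamTokenDemo {
    // Describes each token by its type, value, and line number.
    static List<String> describe(Reader source) throws IOException {
        List<String> out = new ArrayList<>();
        StreamTokenizer st = new StreamTokenizer(source);
        st.eolIsSignificant(true); // report line ends as TT_EOL instead of skipping them
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    out.add("line " + st.lineno() + " word " + st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    out.add("line " + st.lineno() + " number " + st.nval);
                    break;
                case StreamTokenizer.TT_EOL:
                    out.add("end of line");
                    break;
                default:
                    // Any other character (for example + or =) is returned as itself.
                    out.add("line " + st.lineno() + " char " + (char) st.ttype);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String s : describe(new StringReader("Pears := 42\nApples"))) {
            System.out.println(s);
        }
    }
}
```

Keep in mind that the default syntax table parses numbers into a double (nval) and treats / as a comment character and quotes as string delimiters, which may or may not suit your input.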
 
Kevin Luludis
Ranch Hand
Posts: 30
I had tried that, but it did not give me very good results. Any suggestions for improving on this?

So far all I got it to do was tell me whether the token was a word or a number, and then give me its corresponding value. It did ignore the whitespace, which was good, but I would love to get the results that the StringTokenizer gave above.
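
One possible way to get StringTokenizer-style results from StreamTokenizer is to clear its default syntax table and treat every printable character as part of a word. This is only a sketch under that assumption, with illustrative class and method names:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class RawStreamTokenDemo {
    // Returns whitespace-separated tokens as plain strings, numbers included.
    static List<String> tokenize(Reader source) throws IOException {
        List<String> tokens = new ArrayList<>();
        StreamTokenizer st = new StreamTokenizer(source);
        st.resetSyntax();           // clear the default number, quote, and comment handling
        st.whitespaceChars(0, ' '); // control characters and space separate tokens
        st.wordChars('!', '~');     // every printable ASCII character is part of a word
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                tokens.add(st.sval);
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize(new StringReader("Apples := Pears + 4371")));
    }
}
```

With this setup a number such as 4371 arrives as the string "4371" in sval rather than as a double in nval, which matches what StringTokenizer returns.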
 