
Line of text parsing

 
Erik Pragt
Ranch Hand
Posts: 125
Hello all,
What would be the best and most flexible way to parse a line of text?
For example, I have a method getValue and a String, and say I want the 3rd value in the line. Let me illustrate with the following example:
[code]
String line = "\"Erik\",\"Pragt\",\"First Street\",\"12\",\"Amsterdam\"";
[/code]
Now I pass this line to my method getValue, and let's say I want the street name (the 3rd value).
My method signature looks like this:

But now my question is: what would my implementation look like? I've tried numerous things (StringTokenizers, custom for loops, regular expressions, etc.), but none of them gives me the feeling of a 'perfect' solution, so I was wondering how you do it, or how you would do it if you aren't doing it yet. (Darn, I should work on my communication skills!)
Thanks for your help,
Erik Pragt
 
Theodore Casser
Ranch Hand
Posts: 1902
Hibernate Netbeans IDE PHP
I think that, were I to implement it, I would go with a StringTokenizer. It seems to me that's what it was built for, and as long as you can expect your delimiter not to appear inside the strings you're passing to it, you should be fine.
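A minimal sketch of the StringTokenizer approach, assuming a hypothetical signature `getValue(String line, int position)` with a 1-based position (the actual signature from the question is not shown), and assuming no value contains a comma:

```java
import java.util.StringTokenizer;

public class TokenizerGetValue {

    // Hypothetical signature: returns the 'position'-th (1-based) comma-separated
    // value of 'line', with one pair of surrounding double quotes removed.
    // Assumes no value contains a comma.
    static String getValue(String line, int position) {
        StringTokenizer st = new StringTokenizer(line, ",");
        for (int i = 1; st.hasMoreTokens(); i++) {
            String token = st.nextToken().trim();
            if (i == position) {
                // Strip the surrounding quotes, if present.
                if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                    token = token.substring(1, token.length() - 1);
                }
                return token;
            }
        }
        throw new IllegalArgumentException("line has fewer than " + position + " values");
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"Pragt\",\"First Street\",\"12\",\"Amsterdam\"";
        System.out.println(getValue(line, 3)); // prints First Street
    }
}
```

One limitation worth knowing: StringTokenizer silently skips empty fields (two delimiters in a row), so a missing value shifts every later position by one.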
Though, AFAIK, any of these solutions would work fine. The biggest issues I would consider are how easy it is to update the routine if things change down the road, and efficiency/speed if that's a priority.
 
Erik Pragt
Ranch Hand
Posts: 125
Thanks Theodore for your answer.
The two points you mention (speed/efficiency and ease of change) are actually the reason I'm asking this question. StringTokenizers are great, but every time I use them, I run into some limitation (though I can't really give you a good example). Haven't you (or anyone else) got a method you always use? It seems to me that this problem isn't specific to my programming subject, but common to a whole lot of them.
 
Anonymous
Ranch Hand
Posts: 18944
If you're absolutely, positively sure that those fields (enclosed in double quotes) contain neither escaped double quotes (\") nor commas, a simple StringTokenizer can do the job using ",\"" as the delimiter set.
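A small sketch of the delimiter-set trick described above. The helper name `tokens` is hypothetical; the point is that both `,` and `"` act as separators, and runs of consecutive delimiters are collapsed, so the quotes simply disappear:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class QuoteCommaTokenizer {

    // The delim argument ",\"" is a SET of two characters: ',' and '"'.
    // Any run of them separates tokens, so the quotes never appear in the output.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",\"");
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"Pragt\",\"First Street\",\"12\",\"Amsterdam\"";
        System.out.println(tokens(line)); // [Erik, Pragt, First Street, 12, Amsterdam]
    }
}
```

This only holds under the stated assumption: a comma inside a value splits it in two, and an empty value ("") vanishes entirely.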
Otherwise, a custom-made loop that iterates over the string, using a boolean inField flag (indicating whether or not the current character belongs to a field) and some handling of escape sequences (the \" stuff), would be highly efficient.
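A minimal sketch of such a loop, assuming every value is enclosed in double quotes and that a backslash inside a value escapes the next character. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedFieldScanner {

    // Collects the contents of every double-quoted field in 'line'.
    // inField is true between an opening and a closing quote; inside a field,
    // a backslash makes the next character literal (e.g. \" or \\).
    // Characters outside quotes (commas, spaces) are skipped.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inField = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (!inField) {
                if (c == '"') {
                    inField = true;               // opening quote starts a field
                    current.setLength(0);
                }
                // anything else outside a field is ignored
            } else if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // escaped character, taken literally
            } else if (c == '"') {
                inField = false;                  // closing quote ends the field
                fields.add(current.toString());
            } else {
                current.append(c);                // ordinary character, commas included
            }
        }
        if (inField) {
            throw new IllegalArgumentException("unterminated quoted field");
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"Pragt\",\"24,01,2003\",\"say \\\"hi\\\"\"";
        System.out.println(parse(line)); // [Erik, Pragt, 24,01,2003, say "hi"]
    }
}
```

The string is scanned once with no regular expressions or intermediate tokens, which is where the efficiency comes from.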
kind regards
 
Erik Pragt
Ranch Hand
Posts: 125
I'm amazed.
I cannot believe that the 'delim' argument is a 'collection' of characters. I thought that the delim argument could only contain one separator, and that a separator could consist of multiple characters, but I seem to be very wrong.
Thanks all for your help. I don't know if the StringTokenizer is the 'perfect' solution, but with the above knowledge I can get a lot further.
Thanks again,
Erik
 
Erik Pragt
Ranch Hand
Posts: 125
I'm still amazed. I couldn't believe it about the StringTokenizer, so I made this test example:

My expected output was:

But, the real output was:

Well, it confirms the posts above, but it's still 'unexpected' behaviour.
It does raise a side question, by the way: is it possible to use a two-character string as a separator and produce the output as in code fragment 1?
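StringTokenizer itself cannot do this, because its delimiter argument is always a set of single characters. One option (a swapped-in technique, not something suggested in this thread) is `String.split`, which takes a regular expression; `Pattern.quote` makes the separator match literally as a whole string. The helper names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class MultiCharSeparator {

    // StringTokenizer treats "--" as the character set {'-'}, so a single '-' also splits.
    static List<String> withTokenizer(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // String.split takes a regex; Pattern.quote makes the whole separator literal.
    static List<String> withSplit(String s, String separator) {
        return Arrays.asList(s.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        String s = "a--b-c";
        System.out.println(withTokenizer(s, "--")); // [a, b, c]
        System.out.println(withSplit(s, "--"));     // [a, b-c]
    }
}
```

Note that `split` keeps empty values between two separators but drops trailing empty ones.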
Thanks,
Erik
 
Erik Pragt
Ranch Hand
Posts: 125
One last thing:
I have this String:

Now, to me it's obvious there are 7 tokens. But how do I make a program that
a) does not split on the , in the date value, because it's enclosed in "'s, and
b) returns the values without the "'s, without too much overhead?
Thanks for your help. The more I think of it actually, the more frustrated I get.
It might seem like a very simple thing to do, but I just can't find the best solution. ehh...HELP!

Erik
 
Peter den Haan
author
Ranch Hand
Posts: 3252
Hey Erik,
I assume the syntax you want to parse is
' " ' <string> ' " ' { ' , ' ' " ' <string> ' " ' }
{Braces} denote repetition, 0 or more times. You are interested in the <string>s, which may not contain a double quote ".
This code is not ideal: it does not ignore whitespace, and it throws an unchecked IllegalArgumentException when the string does not conform to the syntax, but it illustrates the principle. Note that the parse() method closely follows the grammar I wrote down at the top; hopefully this makes the code clearer.
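A minimal sketch of a parser that follows this grammar, with one method per grammar rule. It is not the original code from this post; apart from parse(), the class and method names are hypothetical. Like the description above, it does not skip whitespace and throws IllegalArgumentException on malformed input:

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedListParser {
    private final String input;
    private int pos = 0;

    QuotedListParser(String input) {
        this.input = input;
    }

    // list ::= '"' <string> '"' { ',' '"' <string> '"' }
    List<String> parse() {
        List<String> values = new ArrayList<>();
        values.add(quotedString());
        while (pos < input.length()) {   // { ... } : zero or more repetitions
            expect(',');
            values.add(quotedString());
        }
        return values;
    }

    // '"' <string> '"', where <string> may not contain a double quote
    private String quotedString() {
        expect('"');
        int end = input.indexOf('"', pos);
        if (end < 0) {
            throw new IllegalArgumentException("missing closing quote after position " + pos);
        }
        String value = input.substring(pos, end);
        pos = end + 1;
        return value;
    }

    // Consume exactly the character c, or reject the input.
    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"Pragt\",\"24,01,2003\",\"Amsterdam\"";
        System.out.println(new QuotedListParser(line).parse()); // [Erik, Pragt, 24,01,2003, Amsterdam]
    }
}
```

Because each grammar rule maps to one method, extending the syntax later (say, allowing whitespace around commas) means changing only the method for the affected rule.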
When things get more complicated, you may be able to use regular expressions to good effect.
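As an illustration of the regular-expression route, here is a sketch using java.util.regex that picks out every quoted value; the class and method names are hypothetical. Unlike the grammar-based parser, it does not check that values are separated by commas, it just extracts each `"..."` run:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexQuotedValues {
    // A '"', then any run of non-quote characters (captured as group 1), then '"'.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static List<String> values(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            result.add(m.group(1)); // text between the quotes
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"Pragt\",\"24,01,2003\",\"Amsterdam\"";
        System.out.println(values(line).get(2)); // prints 24,01,2003
    }
}
```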
- Peter
[ January 24, 2003: Message edited by: Peter den Haan ]
 
David Patterson
Ranch Hand
Posts: 65
Originally posted by Erik Pragt:
Now, to me it's obvious there are 7 tokens. But how do I make a program that
a) does not split on the , in the date value, because it's enclosed in "'s, and
b) returns the values without the "'s, without too much overhead?

Well, it is probably overkill, but a StreamTokenizer will do just that kind of split. You ask it to handle quoted strings and it will. It even handles escaped quote symbols. Other delimiters are ignored when they fall between quotes. When it reports that the token is a quoted string, the contents are available in the sval field (without the quotes).

Here is a bit of the code. You will need to add the code to really process what is in the string, and turn the string into a stream.
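A minimal sketch along these lines, using java.io.StreamTokenizer over a StringReader; the class and method names are hypothetical. For a quoted token, nextToken() returns the quote character itself and puts the contents in sval:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerValues {

    static List<String> quotedValues(String line) throws IOException {
        // Wrap the String in a Reader so StreamTokenizer can treat it as a stream.
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        // '"' is already a quote character in the default syntax table; made explicit here.
        st.quoteChar('"');
        List<String> values = new ArrayList<>();
        int type;
        while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == '"') {
                values.add(st.sval); // contents of the quoted string, quotes removed
            }
            // ',' comes back as an ordinary-character token and is skipped here
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String line = "\"Erik\",\"Pragt\",\"24,01,2003\",\"say \\\"hi\\\"\"";
        System.out.println(quotedValues(line)); // [Erik, Pragt, 24,01,2003, say "hi"]
    }
}
```

If some fields are not quoted, keep in mind that the default syntax table also treats / as a comment character and parses numbers, so calls such as ordinaryChar or wordChars may be needed.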

Dave Patterson
patterd1@attglobal.net
 
Erik Pragt
Ranch Hand
Posts: 125
Hello all, sorry I didn't reply earlier, I kinda 'lost' my thread...
Over the weekend, I tried to write a method that would extract the data I wanted. I wrote it, but sometimes it gave strange output, and it was almost impossible to change anything in it. So I've thrown it away (well, actually I still have it, but I'm too embarrassed to post it...)
Anyway, I want to thank you all (especially Peter (so Peter, thanks!!)) for your help. But I'm sure it will take me a little while to fully understand what it does...
So everybody, thanks again for your help. I'm sure it has helped me understand things better!
Greetings, Erik
 