
Line of text parsing

Erik Pragt
Ranch Hand

Joined: Sep 08, 2001
Posts: 125
Hello all,
What would be the most efficient and flexible way to parse a line of text?
For example, I have a method getValue and a String, and say I want the third value in the line. Let me illustrate with the following example:
[code]
String line = "\"Erik\",\"Pragt\",\"First Street\",\"12\",\"Amsterdam\"";
[/code]
Now I pass this line to my method getValue, and let's say I want the street name (the third value).
My method signature looks like this:

But now my question is: what would my implementation look like? I've used numerous things (StringTokenizers, custom for loops, RegEx, etc.), but none of them gives me the feeling of a 'perfect' solution, so I was wondering how you do it, or how you would do it if you aren't doing it yet. (Darn, I should work on my communication skills!)
Thanks for your help,
Erik Pragt
Theodore Casser
Ranch Hand

Joined: Mar 14, 2001
Posts: 1902

I think that, were I to implement it, I would go with a StringTokenizer. It seems to me that's what it was built for, and so long as you can expect your delimiter not to be included in the strings you're passing to it, you should be fine.
Though, AFAIK, any of the solutions would certainly work fine. The biggest issue(s) I would consider would be ease of updating the routine if there are changes down the road (and efficiency/speed if that's a priority).


Theodore Jonathan Casser
SCJP/SCSNI/SCBCD/SCWCD/SCDJWS/SCMAD/SCEA/MCTS/MCPD... and so many more letters than you can shake a stick at!
Erik Pragt
Ranch Hand

Joined: Sep 08, 2001
Posts: 125
Thanks Theodore for your answer.
The two points you mention (speed/efficiency and ease of change) are actually the reason I'm asking this question. StringTokenizers are great, but every time I use them, I run into some limitation (though I can't really give you a good example). Don't you (or anyone else) have a method you always use? It seems to me that this problem isn't specific to my programming domain, but comes up in a whole lot of them.
Anonymous
Ranch Hand

Joined: Nov 22, 2008
Posts: 18944
If you're absolutely, positively sure that the fields (enclosed in double quotes) *don't* contain escaped double quotes (\") or commas, a simple StringTokenizer can do the job using ",\"" as the delimiter set.
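To make that concrete, here is a minimal sketch of the getValue method from the original question built on this idea. The signature getValue(String line, int n) with a 1-based index is an assumption, since the original signature did not survive in the thread.

```java
import java.util.StringTokenizer;

public class TokenizerGetValue {

    // Hypothetical getValue: returns the n-th (1-based) field, or null if the
    // line has fewer fields. Assumes no field contains a comma or a quote.
    static String getValue(String line, int n) {
        // ",\"" is a SET of delimiter characters: ',' and '"'.
        // Runs of consecutive delimiters are skipped, so "\",\"" acts as one gap.
        StringTokenizer st = new StringTokenizer(line, ",\"");
        for (int i = 1; st.hasMoreTokens(); i++) {
            String token = st.nextToken();
            if (i == n) {
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"Pragt\",\"First Street\",\"12\",\"Amsterdam\"";
        System.out.println(getValue(line, 3)); // prints First Street
    }
}
```

Because runs of delimiters are collapsed, an empty field such as "" disappears entirely and shifts the positions of the fields after it, which is one of the practical limitations of this approach.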
Otherwise, a custom-made loop that iterates over the string, using a boolean inField flag (indicating whether or not the current character belongs to a field) and some handling for escaped quotes (the \" stuff), would be highly efficient.
kind regards
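A sketch of such a loop follows. The class and method names are hypothetical, and it assumes every field is wrapped in double quotes and that a backslash escapes the character after it; a comma only counts as data while inField is true.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedLineParser {

    // Splits a line such as "a","b \"x\"","c,d" into its field values.
    // Assumptions: fields are wrapped in double quotes, a backslash escapes
    // the next character, and anything between fields (commas) is skipped.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inField = false;   // true while between a pair of quotes
        boolean escaped = false;   // true right after a backslash inside a field

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (!inField) {
                if (c == '"') {         // opening quote starts a new field
                    inField = true;
                    current.setLength(0);
                }
                // commas and other characters between fields are ignored
            } else if (escaped) {
                current.append(c);      // keep the escaped character literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;
            } else if (c == '"') {      // closing quote ends the field
                fields.add(current.toString());
                inField = false;
            } else {
                current.append(c);      // commas inside quotes are just data
            }
        }
        if (inField) {
            throw new IllegalArgumentException("Unterminated field in: " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"Pragt\",\"24 January, 2003\",\"say \\\"hi\\\"\"";
        for (String field : parse(line)) {
            System.out.println(field);
        }
    }
}
```

Run as-is, it should print Erik, Pragt, 24 January, 2003 and say "hi" on separate lines, so the comma inside the quoted date stays part of its field. The string is scanned exactly once.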
Erik Pragt
Ranch Hand

Joined: Sep 08, 2001
Posts: 125
I'm amazed.
I cannot believe that the 'delim' argument is a 'collection' of characters. I thought that the delim argument could only contain one separator, and that a separator could consist of multiple characters, but I seem to be very wrong.
Thanks all for your help. I don't know if the StringTokenizer is the 'perfect' solution, but with the above knowledge I can get a lot further.
Thanks again,
Erik
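The behaviour described above shows up in a small test: every character in the delim string is a separator on its own, and consecutive separators are collapsed. The helper name tokens is just for illustration.

```java
import java.util.StringTokenizer;

public class DelimiterSetDemo {

    // Collects every token StringTokenizer produces, wrapped in brackets.
    static String tokens(String text, String delims) {
        StringTokenizer st = new StringTokenizer(text, delims);
        StringBuilder sb = new StringBuilder();
        while (st.hasMoreTokens()) {
            sb.append('[').append(st.nextToken()).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ab" does NOT mean the two-character separator "ab";
        // it means "split on 'a' OR on 'b'".
        System.out.println(tokens("1ab2ba3", "ab")); // [1][2][3]
        System.out.println(tokens("cab-bad", "ab")); // [c][-][d]
    }
}
```

If "ab" were treated as one separator, "1ab2ba3" would give 1 and 2ba3, and "cab-bad" would give c and -bad.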
Erik Pragt
Ranch Hand

Joined: Sep 08, 2001
Posts: 125
I'm still amazed. I couldn't believe the StringTokenizer behaviour, so I made this test example:

My expected output was:

But, the real output was:

Well, it confirms the posts above, but it's still 'unexpected' behaviour.
It does raise a side question, by the way: is it possible to use a two-character separator and produce the output as in code fragment 1?
Thanks,
Erik
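StringTokenizer itself cannot treat a multi-character string as a single separator. One alternative (not something proposed in the thread) is String.split combined with Pattern.quote, which splits on the literal separator string. The separator "::" in the example is hypothetical, since the original test code is missing.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class MultiCharSeparator {

    // Splits on a literal multi-character separator. Pattern.quote stops regex
    // metacharacters in the separator from being interpreted, and the -1 limit
    // keeps trailing empty fields instead of dropping them.
    static String[] splitOn(String text, String separator) {
        return text.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOn("one::two:three::four", "::")));
        // prints [one, two:three, four]

        // Applied to the line from the first post: strip the outer quotes,
        // then split on the three-character separator ","
        String line = "\"Erik\",\"Pragt\",\"First Street\"";
        String inner = line.substring(1, line.length() - 1);
        System.out.println(Arrays.toString(splitOn(inner, "\",\"")));
        // prints [Erik, Pragt, First Street]
    }
}
```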
Erik Pragt
Ranch Hand

Joined: Sep 08, 2001
Posts: 125
One last thing:
I have this String:

Now, to me it's obvious there are 7 tokens. But how do I write a program that
a) does not treat the comma in the date value as a separator, because it is enclosed in double quotes, and
b) returns the values without the double quotes, without too much overhead?
Thanks for your help. The more I think of it actually, the more frustrated I get.
It might seem like a very simple thing to do, but I just can't find the best solution. ehh...HELP!

Erik
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
Hi Erik,
I assume the syntax you want to parse is
'"' <string> '"' { ',' '"' <string> '"' }
{Braces} denote repetition, zero or more times. You are interested in the <string>s, which may not contain a double quote ". This code is not ideal -- it does not ignore whitespace, and throws an unchecked IllegalArgumentException when the string does not satisfy the syntax -- but it illustrates the principle. Note that the parse() method closely follows the grammar I wrote down at the top; this hopefully makes the code clearer.
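Peter's original code did not survive in this copy of the thread. The following is a reconstruction in the same spirit, not his code: a small hand-written parser whose methods mirror the grammar above, with the same stated limitations (no whitespace handling, IllegalArgumentException on bad input). Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class GrammarParser {

    // Grammar:  '"' <string> '"' { ',' '"' <string> '"' }
    // where <string> is any run of characters other than '"'.
    private final String input;
    private int pos = 0;

    GrammarParser(String input) {
        this.input = input;
    }

    // parse(): one quoted string, then zero or more ", quoted string".
    List<String> parse() {
        List<String> values = new ArrayList<>();
        values.add(quotedString());
        while (pos < input.length()) {
            expect(',');
            values.add(quotedString());
        }
        return values;
    }

    // '"' <string> '"'
    private String quotedString() {
        expect('"');
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '"') {
            pos++;
        }
        String value = input.substring(start, pos);
        expect('"');
        return value;
    }

    // Consumes the expected character or rejects the input.
    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalArgumentException(
                "Expected '" + c + "' at position " + pos + " in: " + input);
        }
        pos++;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"Pragt\",\"First Street\",\"12\",\"Amsterdam\"";
        System.out.println(new GrammarParser(line).parse().get(2)); // First Street
    }
}
```

Each grammar element maps to one method (the repetition in braces becomes the while loop in parse()), so extending the syntax later means changing the matching method rather than untangling one big loop.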
When things get more complicated, you may be able to use regular expressions to good effect.
- Peter
[ January 24, 2003: Message edited by: Peter den Haan ]
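As an illustration of the regular expression route mentioned above, here is one possible pattern (an assumption, not taken from the thread): a field is either a quoted string, returned without its quotes, or an unquoted run of non-comma characters. It assumes no spaces around the commas and no escaped quotes inside fields; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFields {

    // Group 1: contents of a quoted field. Group 2: an unquoted field.
    // The quoted alternative is tried first, so commas inside quotes are kept.
    private static final Pattern FIELD = Pattern.compile("\"([^\"]*)\"|([^,]+)");

    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",12,\"24 January, 2003\",Amsterdam";
        for (String field : fields(line)) {
            System.out.println(field);
        }
    }
}
```

This should print Erik, 12, 24 January, 2003 and Amsterdam on separate lines. Unlike the grammar-based parser, find() silently skips anything it cannot match, so it does not report malformed input.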
David Patterson
Ranch Hand

Joined: Jul 01, 2002
Posts: 65
Originally posted by Erik Pragt:
One last thing:
I have this String:

Now, to me it's obvious there are 7 tokens. But how do I write a program that
a) does not treat the comma in the date value as a separator, because it is enclosed in double quotes, and
b) returns the values without the double quotes, without too much overhead?
Thanks for your help. The more I think of it actually, the more frustrated I get.
It might seem like a very simple thing to do, but I just can't find the best solution. ehh...HELP!

Erik

Well, it is probably overkill, but a StreamTokenizer will do just that kind of split. You ask it to deal with quoted strings and it will. It even handles escaped quote symbols. Other delimiters are ignored when they fall between quotes. When it tells you the token is a quote, the contents are available in the sval field (without quotes).

Here is a bit of the code. You will need to add the code that actually processes what is in the string, and turn the string into a stream.
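Dave's code is also missing from this copy. Below is a minimal sketch of the approach he describes, using a StringReader to turn the string into a stream and collecting sval whenever the token type is the quote character. It assumes every field is quoted; with the default syntax table, unquoted words and numbers would arrive as TT_WORD and TT_NUMBER tokens, which this sketch simply ignores. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerFields {

    // Returns the contents of every double-quoted field on the line.
    static List<String> quotedFields(String line) {
        // StringReader turns the String into a character stream.
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.quoteChar('"'); // already the default; set explicitly for clarity

        List<String> fields = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '"') {
                    // Quoted string: sval holds the contents without the quotes,
                    // with escapes such as \" already translated.
                    fields.add(st.sval);
                }
                // ',' arrives as an ordinary-character token and is skipped
            }
        } catch (IOException e) {
            // nextToken() declares IOException; a StringReader should not fail
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"Erik\",\"24 January, 2003\",\"say \\\"hi\\\"\"";
        for (String field : quotedFields(line)) {
            System.out.println(field);
        }
    }
}
```

This should print Erik, 24 January, 2003 and say "hi" on separate lines: the comma inside the date is ignored because it falls between quotes, and the escaped quotes are unescaped by the tokenizer.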

Dave Patterson
patterd1@attglobal.net
Erik Pragt
Ranch Hand

Joined: Sep 08, 2001
Posts: 125
Hello all, sorry I didn't reply earlier, I kinda 'lost' my thread....
Over the weekend, I tried to create a method that would extract the data I wanted. I created the method, but sometimes it gave some strange output, and it was almost impossible to make any changes to it. So I've thrown it away (well, actually I still have it, but I'm too embarrassed to post it...).
Anyway, I want to thank you all (especially Peter (so, Peter, thanks!!)) for your help. But I'm sure it will take me a little time to fully understand what it does....
So everybody, thanks again for your help. I'm sure it has helped me understand things better!
Greetings, Erik