
parsing quoted text

 
ronald ali mangaliag
Greenhorn
Posts: 18
I want to parse text similar to the line below:

firstname lastname age birthday

The first name may be enclosed in quotes, as in "ronald ali", and the same is true for the last name. The age should be numeric, and the birthday should be a valid date in the format yyyy-MM-dd.

I used StringTokenizer, but the problem is that the dashes (-) in the birthday are treated as characters.

Saving each token in a List gives the List a size of 8 instead of only 4 (one per field).

What do you think I should do? Do you have a solution, even one that doesn't use StringTokenizer? I was thinking of using String.split(), but I am not well versed in regular expressions. Please advise.

thank you

ali
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
Well, StringTokenizer will work fine to tokenize that String into the four elements you want. You might have to show us your code and perhaps we can see what you are doing wrong.

There are other ways to do this, though. You could (as you have noticed) use the split() method of the String class. Strictly, you would need to use the "any whitespace" class in your regex (\s), but since your regex is so simple, just using a space in the split method will work. If you are still unsure about regexes, you could treat the String as a char[] and process it one character at a time.
 
Scheepers de Bruin
Ranch Hand
Posts: 99
OK, suppose you get a string that contains the following:
String input = "\"firstname\" lastname age birthday";
(The \" is how you 'escape' the double quote character, i.e. how you tell Java to put a double quote character in a String without interpreting it as a String delimiter.)

You can use the StringTokenizer like this:
StringTokenizer st = new StringTokenizer(input, " ");
(Using space as the delimiter)

Or you can use the split method:
String[] params = input.split(" ");
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Did any of those tips help with the embedded blank in "Ronald Ali"?
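To make the embedded-blank problem concrete, here is a small sketch of what a plain split on spaces does with a quoted first name. The class name SplitDemo, the method naiveSplit, and the sample values (age 25, birthday 1980-01-15) are made up for illustration.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on single spaces, as suggested earlier in the thread.
    static String[] naiveSplit(String input) {
        return input.split(" ");
    }

    public static void main(String[] args) {
        // Sample record with a quoted first name containing a blank (made-up data).
        String input = "\"ronald ali\" mangaliag 25 1980-01-15";
        String[] tokens = naiveSplit(input);
        // The quoted name is cut in two, so there are 5 tokens rather than 4.
        System.out.println(tokens.length + " tokens: " + Arrays.toString(tokens));
    }
}
```

The blank inside the quotes is treated like any other delimiter, so the quote characters end up attached to two separate tokens.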

I drag a lot of bad habits from my pre-Java days, but I'd probably get one token at a time from the string the hard way with a "cursor" or position in the string:
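A minimal sketch of that cursor approach might look like the following. This is not Stan's original code; the class name CursorTokenizer and the method tokenize are hypothetical, and the sketch assumes fields are separated by plain spaces and that quotes are never escaped inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorTokenizer {
    // Walks the input with a position "cursor" and pulls out one token at a time.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int len = input.length();
        while (pos < len) {
            // Skip any blanks between tokens.
            while (pos < len && input.charAt(pos) == ' ') {
                pos++;
            }
            if (pos >= len) {
                break;
            }
            if (input.charAt(pos) == '"') {
                // Quoted token: take everything up to the closing quote, blanks included.
                int start = pos + 1;
                int end = input.indexOf('"', start);
                if (end < 0) {
                    end = len; // no closing quote: take the rest of the input
                }
                tokens.add(input.substring(start, end));
                pos = end + 1;
            } else {
                // Unquoted token: take everything up to the next blank or end of input.
                int end = input.indexOf(' ', pos);
                if (end < 0) {
                    end = len;
                }
                tokens.add(input.substring(pos, end));
                pos = end;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample values (25, 1980-01-15) are made up for illustration.
        System.out.println(tokenize("\"ronald ali\" mangaliag 25 1980-01-15"));
    }
}
```

With this approach the dashes in the birthday need no special handling, since only blanks and quotes are treated as significant.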

You can smarten this up with regular expressions ... maybe one that will match the first quoted string (allowing blanks inside) OR the next unquoted string up to a blank or end of input.
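One way such a regular expression could look is sketched below: the first alternative matches a quoted string (blanks allowed inside), the second matches a run of non-blank characters. The class RegexTokenizer and its methods are hypothetical names. The isValidRecord check is an added assumption about how the numeric-age and yyyy-MM-dd requirements from the original question could be enforced; it uses java.time, which needs Java 8 or later.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenizer {
    // Group 1: contents of a quoted string. Group 2: an unquoted run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups is set for each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    // Checks for exactly four fields, an integer age, and a real yyyy-MM-dd date.
    static boolean isValidRecord(List<String> fields) {
        if (fields.size() != 4) {
            return false;
        }
        try {
            Integer.parseInt(fields.get(2)); // age must parse as an integer
            LocalDate.parse(fields.get(3));  // ISO yyyy-MM-dd, rejects impossible dates
            return true;
        } catch (NumberFormatException | DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample values (25, 1980-01-15) are made up for illustration.
        List<String> fields = tokenize("\"ronald ali\" mangaliag 25 1980-01-15");
        System.out.println(fields + " valid=" + isValidRecord(fields));
    }
}
```

On a pre-Java 8 runtime, a SimpleDateFormat with setLenient(false) could stand in for LocalDate.parse.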
[ August 31, 2005: Message edited by: Stan James ]
 