Regex problems

 
Ranch Hand
Posts: 41
I'm trying to do a fun little project (and hopefully learn a bit as well) and I'm hitting a wall with regex.

I created a rather long regular expression to validate an entire line in one big call, and checked it with Regexdemo.

A portion of the line I want to parse should look like this:

[Wed Oct 11 23:56:26 2006] John purchased

To keep it simple, though, I'll only deal with the day of the week and the month of the year.

I'm using the following expression:

\\[(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)

I get no matches with this expression. OTOH, if I just use

\\[(Mon|Tue|Wed|Thu|Fri|Sat|Sun)

it finds every line.

I've tried using \s instead of hardcoding the space (though I'd rather just have the space), and that fails too.

I've tried this both by scanning a file and by reading a line into a string and then scanning the string, but it always fails.
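A quick way to confirm that the expression itself is fine is to run it against the whole line with java.util.regex.Pattern and Matcher, bypassing Scanner entirely. This is a minimal sketch assuming the sample line from above; the class, field, and method names are made up for illustration. It tries both the literal space and \s (which has to be written "\\s" inside a Java string literal).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    // The expression from the question, with a literal space between day and month.
    static final Pattern LITERAL_SPACE = Pattern.compile(
        "\\[(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)");

    // The same expression using \s; inside a Java string literal it is written "\\s".
    static final Pattern WHITESPACE_CLASS = Pattern.compile(
        "\\[(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\\s(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)");

    // Returns "day month" if the pattern matches at the start of the line, otherwise null.
    // lookingAt() anchors at the beginning of the input but, unlike matches(),
    // does not require the pattern to consume the whole line.
    static String dayAndMonth(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.lookingAt() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "[Wed Oct 11 23:56:26 2006] John purchased";
        System.out.println(dayAndMonth(LITERAL_SPACE, line));    // Wed Oct
        System.out.println(dayAndMonth(WHITESPACE_CLASS, line)); // Wed Oct
    }
}
```

If both versions match here, the regex is not the culprit, and the problem lies in how the input is being handed to it.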

Below is a code snippet. FWIW, I'm fairly certain I could break this up into separate scans (first the bracket and day of the week, then the month, and so on), and in some ways that's probably the correct way to do it, but I want to know why this doesn't work.



As always, your input is greatly appreciated.

Kevin
 
Greenhorn
Posts: 6
I think your problem lies in your use of the Scanner class. As I understand it, Scanner is essentially a tokenizer, and its default delimiter is whitespace. The call to



checks whether the next token matches your regular expression. The first token it checks from the input



is [Wed, which matches the shorter expression; that is why your first test passes. The extended expression needs a space and a month name after the day, but the token stops at that space, so it can never match.
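The tokenizing behaviour described above can be reproduced with a short sketch. It assumes the check was done with something like Scanner.hasNext(String), since the original snippet did not survive, and it uses the sample line from the first post; the class and method names are illustrative. For contrast it also calls Scanner.findInLine(String), which searches the current line while ignoring delimiters. That is one possible alternative, not necessarily the one suggested in this thread.

```java
import java.util.Scanner;

public class ScannerTokens {
    static final String DAY = "\\[(Mon|Tue|Wed|Thu|Fri|Sat|Sun)";
    static final String DAY_MONTH = DAY + " (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";
    static final String LINE = "[Wed Oct 11 23:56:26 2006] John purchased";

    // hasNext(String) asks whether the next whitespace-delimited token matches
    // the pattern in its entirety. For LINE that token is "[Wed".
    static boolean hasNextToken(String regex) {
        try (Scanner sc = new Scanner(LINE)) {
            return sc.hasNext(regex);
        }
    }

    // findInLine(String) ignores delimiters and searches up to the next line
    // separator, so a pattern containing a space can match. Returns null if not found.
    static String findInLine(String regex) {
        try (Scanner sc = new Scanner(LINE)) {
            return sc.findInLine(regex);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNextToken(DAY));       // true: "[Wed" matches
        System.out.println(hasNextToken(DAY_MONTH)); // false: the token ends before " Oct"
        System.out.println(findInLine(DAY_MONTH));   // [Wed Oct
    }
}
```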

I think what you are looking for is something like the following:



That said, I'm not sure a regex test like this is the best way to validate a date (but then again, I'm not sure that's what you're after).
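The doubt about validating dates with a regex is well placed: the expression above accepts any day name next to any month, so it cannot reject an impossible date or a weekday that doesn't fit the date. One alternative, using java.time (Java 8 and later, so well after this thread), is to take the text between the brackets and let a formatter parse it. This is a sketch assuming the bracketed layout from the first post; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Locale;

public class BracketDate {
    // EEE = short day name, MMM = short month name, d = day of month,
    // HH:mm:ss = 24-hour time, uuuu = year. English locale so "Wed"/"Oct" parse.
    // STRICT rejects dates such as Feb 30 instead of adjusting them; a day name
    // that disagrees with the computed date is rejected as a conflict.
    static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss uuuu", Locale.ENGLISH)
                         .withResolverStyle(ResolverStyle.STRICT);

    // Returns the parsed timestamp, or null if the text is not a valid date.
    static LocalDateTime parseOrNull(String text) {
        try {
            return LocalDateTime.parse(text, FORMAT);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("Wed Oct 11 23:56:26 2006")); // 2006-10-11T23:56:26
        System.out.println(parseOrNull("Thu Oct 11 23:56:26 2006")); // null: Oct 11, 2006 was a Wednesday
        System.out.println(parseOrNull("Wed Feb 30 10:00:00 2006")); // null: there is no Feb 30
    }
}
```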

Hope this helps.


Chris
[ November 22, 2006: Message edited by: Chris Rudd ]
 
Kevin Crays
Ranch Hand
Posts: 41
Hi Chris.

I actually do want to break it down into smaller units (there's more to parse than that small bit, but that was enough to illustrate the issue), but I also wanted to grab that chunk (date + Purchase) to verify that it's a purchase line...and if not, just move on (or do something else).

That said, your suggestion is perfect, and I should have thought of it (tunnel vision gets me every time): if the line is what I want, parse it elsewhere.
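The "check whole line first, parse it elsewhere" approach might look roughly like this. It is a sketch under stated assumptions: each element is one complete line (for example from BufferedReader.readLine()), a purchase line looks like the sample in the first post, and the name before "purchased" is a single word. The class name, method name, and the second input line are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PurchaseLines {
    // Assumed layout: [Day Mon d hh:mm:ss yyyy] Name purchased ...
    // Group 1 captures the timestamp inside the brackets.
    static final Pattern PURCHASE = Pattern.compile(
        "^\\[((?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
        + "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
        + "\\d{1,2} \\d{2}:\\d{2}:\\d{2} \\d{4})\\] \\S+ purchased\\b");

    // Keeps the timestamp of every purchase line and skips everything else.
    // Because each element is a whole line, the spaces in the pattern are
    // never treated as token breaks the way Scanner's delimiter would.
    static List<String> purchaseTimestamps(Iterable<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PURCHASE.matcher(line);
            if (m.find()) {
                // Detailed parsing of a matching line would be handed off from here.
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: one line in the question's format, one that is not a purchase.
        List<String> lines = Arrays.asList(
            "[Wed Oct 11 23:56:26 2006] John purchased",
            "some other kind of line");
        System.out.println(purchaseTimestamps(lines)); // [Wed Oct 11 23:56:26 2006]
    }
}
```

The outer brackets in the printed output come from List.toString(); the captured timestamp itself does not include them.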

Thanks Chris.

Kevin
 