Problems with regular expressions

Chris Creed
Ranch Hand

Joined: Feb 27, 2009
Posts: 69

Hello all

I'm working on a project where I read a flat file and use various regex patterns to determine whether certain content is in the file, then process it accordingly. For the most part this is working beautifully. However, for one area of the file, no pattern I try will find a match. Here is the data being analyzed.

S_Eriksson GK | GK J_Caola
P_Dinning DF | DF L_Tunstall
A_Mohlin DF | DF N_Bawden
B_Squance DF | DF P_Sulley
L_Titcombe DF | DF H_Jose
J_Farelo DM | DM C_van_Kuyt
A_Frandson MF | MF P_Silva
L_Alves MF | MF K_Padgit
B_Cumberland MF | MF C_Holton
Z_Densem FW | FW R_Bautista
C_Nesling FW | FW N_Arnold
|
E_Eayrs SUB | SUB M_Sinclair
M_Dumas SUB | SUB S_Thwaites
P_Verri SUB | SUB E_Pretty
F_Daud SUB | SUB J_Pople
J_Tipping SUB | SUB J_da_Silva

and here is the pattern that I am using to look for this data.

/(GK|DF|DM|MF|AM|FW|SUB)\s\|\s(GK|DF|DM|MF|AM|FW|SUB)/

where I am trying to match the two- or three-letter capital codes that flank the |, with a space on each side. If they match, grab the entire line. However, with both preg_match and preg_grep (the data is in an array by default and is converted to a string for preg_match), I can't get a match, even though regex validators (like the one in Eclipse) state that it should work.

Using PHP 5.1, Apache 2.2, on FreeBSD 6.2 if that helps at all.

Thanks to anyone that can help out here.
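
For reference, here is a minimal sketch of the two calls described above. It assumes the lines sit in an array called $lines (a hypothetical name) and are separated by single spaces exactly as they appear in the post; the real file may differ.

```php
<?php
// Hypothetical sample: lines copied from the post, single-spaced as displayed.
$lines = array(
    'S_Eriksson GK | GK J_Caola',
    '|',
    'E_Eayrs SUB | SUB M_Sinclair',
);

$pattern = '/(GK|DF|DM|MF|AM|FW|SUB)\s\|\s(GK|DF|DM|MF|AM|FW|SUB)/';

// preg_grep() returns the array elements that match, keeping their original keys.
$matched = preg_grep($pattern, $lines);

// preg_match() tests one string and returns 1 on a match, 0 otherwise.
$found = preg_match($pattern, $lines[0], $groups);

print_r($matched);
var_dump($found, $groups);
```

With input spaced like this, the pattern does match under PCRE, so a failure on the real file points toward a difference between the file's contents and what is shown here, rather than toward the pattern syntax.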
John Kimball
Ranch Hand

Joined: Apr 13, 2009
Posts: 96
You may need to escape the middle "|" in your match expression.
Chris Creed
Ranch Hand

Joined: Feb 27, 2009
Posts: 69

John Kimball wrote:You may need to escape the middle "|" in your match expression.


Thanks for the reply. Sadly it still returns empty.


John Kimball
Ranch Hand

Joined: Apr 13, 2009
Posts: 96
I'm not familiar with the particulars of PHP regexp, so here's all I can offer at this point:
- Is "\s" a valid way of expressing space?
- If nothing else, start with wildcards and reconstruct your regexp bit by bit until it works.
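
The bit-by-bit approach suggested above could be sketched roughly as follows. The sample line and the list of intermediate patterns are illustrative assumptions, not taken from the poster's code.

```php
<?php
// Hypothetical sample line taken from the data posted above.
$line = 'S_Eriksson GK | GK J_Caola';

// Grow the pattern one piece at a time; the first step that fails is the suspect.
$steps = array(
    '/GK/',
    '/GK\s/',
    '/GK\s\|/',
    '/GK\s\|\s/',
    '/GK\s\|\sGK/',
);

$results = array();
foreach ($steps as $step) {
    $results[$step] = preg_match($step, $line);
    echo $step, ' => ', ($results[$step] === 1 ? 'match' : 'no match'), "\n";
}
```

Run against a line read from the actual file, the step where the output switches to "no match" shows which part of the pattern disagrees with the data.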
Chris Creed
Ranch Hand

Joined: Feb 27, 2009
Posts: 69

John Kimball wrote:I'm not familiar with the particulars of PHP regexp, so here's all I can offer at this point:
- Is "\s" a valid way of expressing space?
- If nothing else, start with wildcards and reconstruct your regexp bit by bit until it works.


Sorry for the delay.

According to various online sources that I have read, \s is valid to match any whitespace character.

As for wildcards, that's the problem: it can match where there is one instance, but never where both instances are in the same string. For example, if GK | GK exists with any characters preceding and following it, no match is ever found when both items appear together. It's like that string fragment cannot be matched.

I also tried /.([A-Z]{2,3})\s.\s([A-Z]{2,3})./ and /.([A-Z]{2,3})\s\|\s(\1)./ with no success. It's like regular expressions in PHP are buggy or something, because they all seem to validate when using the regex checker in Eclipse.
John Kimball
Ranch Hand

Joined: Apr 13, 2009
Posts: 96
Does your regexp checker specifically say that it validates for PHP's regexp? If not, you have some more homework to do.

The very basic syntax is more or less identical for all regexp flavors-- . * ? + $ ^ and a few others that escape me at the moment.
But things tend to vary when you want to do anything fancier.

For example, some regexp tools expect the parentheses to be ESCAPED to indicate grouping, otherwise they're treated literally.
Other languages & tools work in the reverse!

Also, if you're trying to match the same string before and after the pipe, then your expression is too loose.
It should be something like (GK \| GK) | (SUB \| SUB) | (DF \| DF) ...
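
As a hedged alternative to spelling out every pair, a backreference can enforce "same code on both sides" in PHP's PCRE functions. The sketch below assumes the seven position codes from the original pattern; the second test string is made up to show a mismatch.

```php
<?php
// Assumption: the position codes are the seven listed in the original pattern.
$codes = 'GK|DF|DM|MF|AM|FW|SUB';

// \1 must repeat whatever group 1 captured, so "GK | DF" is rejected.
$pattern = '/\b(' . $codes . ')\s\|\s\1\b/';

$same  = preg_match($pattern, 'S_Eriksson GK | GK J_Caola');
// Hypothetical line with different codes on each side of the pipe.
$mixed = preg_match($pattern, 'S_Eriksson GK | DF J_Caola');

var_dump($same, $mixed);
```

Note that if the enumerated form is used instead, the spaces around each alternation bar would be treated as literal characters by PCRE unless the x modifier is set.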

Chris Creed
Ranch Hand

Joined: Feb 27, 2009
Posts: 69

John Kimball wrote:Does your regexp checker specifically say that it validates for PHP's regexp? If not, you have some more homework to do.

The very basic syntax is more or less identical for all regexp flavors-- . * ? + $ ^ and a few others that escape me at the moment.
But things tend to vary when you want to do anything fancier.

For example, some regexp tools expect the parentheses to be ESCAPED to indicate grouping, otherwise they're treated literally.
Other languages & tools work in the reverse!

Also, if you're trying to match the same string before and after the pipe, then your expression is too loose.
It should be something like (GK \| GK) | (SUB \| SUB) | (DF \| DF) ...



Thanks for the info. I honestly thought that regex, since it is based on Perl, would have been standardized by now. Oops.

On a side note, with a bit of help from co-workers I found a pattern that worked nicely for my needs. For those interested, it was ((?:[A-Z][A-Z]+))(\\s+)(\\|)(\\s+)((?:[A-Z][A-Z]+)). This site (http://txt2re.com/) is quite handy.
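
One visible difference between this pattern and the original is \s+ in place of \s. The thread does not confirm the cause, but one plausible explanation is that the real file pads its columns with runs of spaces or tabs that the forum display collapsed. A minimal sketch under that assumption, with a made-up padded line:

```php
<?php
// Assumption: the real file pads columns with several spaces (or tabs).
$line = "S_Eriksson    GK  |  GK     J_Caola";

// Original pattern: exactly one whitespace character on each side of the pipe.
$single = '/(GK|DF|DM|MF|AM|FW|SUB)\s\|\s(GK|DF|DM|MF|AM|FW|SUB)/';

// The working pattern from the post, written in a single-quoted string
// so each backslash appears only once.
$plus = '/((?:[A-Z][A-Z]+))(\s+)(\|)(\s+)((?:[A-Z][A-Z]+))/';

$singleResult = preg_match($single, $line);
$plusResult   = preg_match($plus, $line, $m);

var_dump($singleResult, $plusResult, $m[1], $m[5]);
```

Here the single-\s pattern fails because two spaces sit between the code and the pipe, while \s+ accepts any run of whitespace.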

Mr. Kimball thanks muchly for the help!
 