Regex help.

Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
I need to parse a String that can be formatted in a variety of ways. The String is a feet-inch-fraction format. Any of the following values are valid:

I can grind it out with several different patterns but it seems there should be a better way.
Thanks in advance,
Michael Morris


Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius - and a lot of courage - to move in the opposite direction. - Ernst F. Schumacher
Cindy Glass
"The Hood"
Sheriff

Joined: Sep 29, 2000
Posts: 8521
Gee - aren't WE a bunch of help.
It looks like it will be messy no matter how you handle it.


"JavaRanch, where the deer and the Certified play" - David O'Meara
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Thanks for noticing at least, Cindy. I'm just grinding it out. The code is nasty-looking, but so far seems to be working. It's difficult to set up a unit test on it and be totally satisfied that you've considered all the combinations.
Michael Morris
Dirk Schreckmann
Sheriff

Joined: Dec 10, 2001
Posts: 7023
Just in case you didn't already know about it, you might have some success browsing through http://www.regexlib.com/


Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Thanks Dirk. That looks like a pretty neat site. I've been away from Unix too long. I used to could grep with the best of 'em. I'm having to recall most of the regex stuff.
Michael Morris
Leslie Chaim
Ranch Hand

Joined: May 22, 2002
Posts: 336
The String is a feet-inch-fraction format.

Not exactly sure what that means. Also, is this to merely validate your string or would you like to glean out the values?
Any of the following values are valid:
Mike, can you please list some strings which would be invalid?
Cheers,
Leslie


Normal is in the eye of the beholder
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Hi Leslie,
I knew I could count on a PERL evangelist to come thru on this.
We have an app that generates steel fabrication drawings. When we first started, some 15 years ago, everything was input by hand; now almost everything is read in from a neutral file generated by a structural modeling program. The feet-inch-fraction format is actually feet, inches and sixteenths of an inch. 95 percent of the time it would be like: 23'-2 9/16, or 23 feet, 2 and nine-sixteenths inches. But it can also be floating point or integral feet like: 1.8125, or even whole, integral or mixed-number inches like: 3.0625". Note that in the first example above either 23'2 9/16 or 23-2 9/16 would also be legal, and an optional inch tick could follow: 23'-2 9/16".
Illegal values would be any string that contains a character other than a digit, single quote, dash, period, double quote or forward slash. No white space is allowed, except that at least one space must occur between the inch integer and the inch fraction (exterior white space is OK and can be trim()ed). Obviously any out-of-sequence separators would be illegal, as would a missing feet value followed by the feet tick, like: '-2 3/4.
If you need more info let me know.
Michael Morris
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451

Also, is this to merely validate your string or would you like to glean out the values?

The plan is to either throw a NumberFormatException on an invalid string or return a double from a static method.
Michael Morris
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
What about
^\d+'?[ -]?(?:\d*(?: ?\d+/\d+)?"?)?$|^\d*\.\d+['"]?$
I'm not in a position to test this though. It consists of two alternative regexps. The first is the fractional syntax
^\d+'? feet, optionally followed by '
[ -]? optional separator
(?:\d* whole number of inches, inside an optional non-capturing group
(?: ?\d+/\d+)? fractional number of inches, inside another optional group
"?)?$ the optional closing "
The second uses decimal syntax
^\d* integral part, zero or more digits
\.\d+ decimal point and fractional part, one or more digits
['"]?$ optional feet or inch marker
Probably not 100% there, but it seems to go some way. Not tested, as I said, so no guarantees whatsoever.
- Peter
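For anyone who wants to try this pattern from Java, here is a minimal sketch. The class and method names (FeetInchCheck, isValid) are made up for illustration, and the decimal point is written as \. so that it matches only a literal period. Every backslash and the double quote have to be escaped again inside the Java string literal.

```java
import java.util.regex.Pattern;

public class FeetInchCheck {
    // Two alternatives: fractional syntax (23'-2 9/16") or decimal syntax (1.8125, 3.0625").
    static final Pattern FEET_INCH = Pattern.compile(
        "^\\d+'?[ -]?(?:\\d*(?: ?\\d+/\\d+)?\"?)?$|^\\d*\\.\\d+['\"]?$");

    // Exterior white space is trimmed first, as the problem description allows.
    static boolean isValid(String s) {
        return FEET_INCH.matcher(s.trim()).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "23'-2 9/16", "23'-2 9/16\"", "1.8125", "3.0625\"", "'-2 3/4", "23'-"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The last sample, 23'-, is accepted because the whole inch group is optional. A dangling separator like that is one kind of false positive this pattern lets through.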
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Thanks for the regex Peter. It seems to work on most everything but gives some false positives. I think I'm going to go with a divide and conquer strategy. I am going to look at splitting using "['\-]". That should give me the feet side and inch side in separate strings or just the whole string. Then the regexes become much easier to work with.

Thanks everyone,
Michael Morris
[ March 25, 2003: Message edited by: Michael Morris ]
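The split idea can be sketched in Java as follows. This is only an illustration, not Michael's actual code: the class and method names are hypothetical, and the character class is given a + quantifier (an addition that is not in the post) so that the adjacent tick and dash in 23'-2 9/16 count as one separator instead of producing an empty token between them.

```java
import java.util.Arrays;

public class FeetInchSplit {
    // Splits on a run of feet ticks and/or dashes. With the plain "['\\-]" class,
    // "23'-2 9/16" would yield an extra empty string between the tick and the dash.
    static String[] splitFeetInches(String s) {
        return s.trim().split("['\\-]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFeetInches("23'-2 9/16")));
        System.out.println(Arrays.toString(splitFeetInches("23-2 9/16")));
        System.out.println(Arrays.toString(splitFeetInches("1.8125")));
        System.out.println(Arrays.toString(splitFeetInches("'-2 3/4")));
    }
}
```

A leading empty token, as produced by '-2 3/4, signals a missing feet value. The split throws the separators away, so a caller that needs to tell 2' from a bare 2 would have to check for the tick separately.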
Leslie Chaim
Ranch Hand

Joined: May 22, 2002
Posts: 336
Hi Michael,
I'm sorry I did not get back sooner, I just got a bit overworked at work. Furthermore, when I give a solution I always get carried away with my palaver and all that takes time. Nevertheless, I feel somewhat obligated to post a decent reply after you went through all the trouble explaining your problem.
I understand that you are opting for (or thinking about) the divide-and-conquer approach, but for me it's about the challenge and passion of regex, not necessarily about Perl. (BTW it's Perl not PERL.) Your problem is interesting and the challenge is worth it!
I knew I could count on a PERL evangelist to come thru on this.
Well thanks Mike. Actually, and for the record, my Perl obsessions root back to the MRE book, and it's all Jeff's fault.
OK, on to the matter:
After (I hope) carefully reading your description, it seems to me that you need a pattern as follows:
  • NUMBER (integral or floating) followed by
  • Feet or inch symbol (?) FB
  • Dash (?) FB
  • Integer (?) FB
  • One space and the fraction (?) FB
  • Inch tick (?)


Yes, 95% of the time you will have all the data, but except for the first NUMBER everything else must be optional, for obvious reasons. Notice the "Space and the fraction" as one unit. It's always good to be as specific as possible with the regex engine; this helps avoid false positives (whatever they're supposed to mean).
I purposely did not give the regex chars first, to give you (and myself) a starting place for constructing the pattern. If you find any problem with the actual regex I proposed, we should first examine whether we studied the sought pattern correctly.
Here's the regex:
^(\d*\.?\d*[1-9]+\d*|[1-9]+\d*\.\d*)['"]?-?\d?( \d{1,2}/\d{1,2})?"?$
And the breakdown; the pieces correspond one-to-one with the previous list:
  • (\d*\.?\d*[1-9]+\d*|[1-9]+\d*\.\d*)
  • ['"]?
  • -?
  • \d?
  • ( \d{1,2}/\d{1,2})?
  • "?


The entire expression is wrapped in '^' and '$'. The NUMBER is restricted so that it does not allow zero as a value. The fraction is also restricted: it only allows one or two digits each for the numerator and the denominator. Of course, '\d' is really [0-9] and sometimes you may want to be more specific in the regex, but regex is not for everything and we can use other tools too.
And yes I asked Perl for some help:

    Here's what I've tested:

    $ perl checkit
    Data: .25
    It's a match!
    Data: 23'-2 9/16
    It's a match!
    Data: '-2 3/4
    No Match
    Data: .25'
    It's a match!
    Data: .25"
    It's a match!
    Data: 10
    It's a match!
    Data: 0
    No Match

    Data: 2'
    It's a match!
    Data: 3"
    It's a match!
    Data: 1.37'
    It's a match!
    Data: 1'2
    It's a match!
    Data: 1'-2
    It's a match!
    Data: 1-2 3/4
    It's a match!
    Data: 3'- 13/16
    It's a match!
    Data: 3' 2
    No Match
    Data: 3' 2"
    No Match

    Data: 2'3"
    It's a match!
    Data: 3'- 20/100
    No Match
    Data: <CTRL D>
    $

I hope that this is good enough for you. The next thing you may want is to pluck out the values; for that you'd add some capturing parens and obtain the values from the Matcher.group() method. That, I think, is simple.
With pleasure,
Leslie
[ March 26, 2003: Message edited by: Leslie Chaim ]
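The capturing-parens suggestion could look roughly like this in Java. It is a sketch under several assumptions that are not in the thread: the class and method names are hypothetical, the result is returned in decimal feet, and a leading number followed by an inch tick is treated as inches. The pattern keeps the structure of the regex above and only adds groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeetInchParser {
    // Groups: 1 = leading number, 2 = feet/inch tick, 3 = inch digit,
    // 4 and 5 = numerator and denominator of the fraction.
    private static final Pattern FORMAT = Pattern.compile(
        "^(\\d*\\.?\\d*[1-9]+\\d*|[1-9]+\\d*\\.\\d*)(['\"])?-?(\\d)?"
        + "(?: (\\d{1,2})/(\\d{1,2}))?\"?$");

    /** Returns the length in decimal feet (an assumed unit) or throws NumberFormatException. */
    public static double parseFeet(String text) {
        Matcher m = FORMAT.matcher(text.trim());
        if (!m.matches()) {
            throw new NumberFormatException("Not a feet-inch-fraction value: " + text);
        }
        double number = Double.parseDouble(m.group(1));
        // Assumption: a leading number followed by an inch tick is inches, otherwise feet.
        boolean leadingIsInches = "\"".equals(m.group(2));
        double feet = leadingIsInches ? 0.0 : number;
        double inches = leadingIsInches ? number : 0.0;
        if (m.group(3) != null) {
            inches += Integer.parseInt(m.group(3));
        }
        if (m.group(4) != null) {
            int denominator = Integer.parseInt(m.group(5));
            if (denominator == 0) {
                throw new NumberFormatException("Zero denominator: " + text);
            }
            inches += (double) Integer.parseInt(m.group(4)) / denominator;
        }
        return feet + inches / 12.0;
    }

    public static void main(String[] args) {
        System.out.println(parseFeet("23'-2 9/16"));
        System.out.println(parseFeet("1.8125"));
        System.out.println(parseFeet("3.0625\""));
    }
}
```

One thing the groups make visible: the inch part is a single \d, so values with 10 or 11 whole inches, such as 23'-10 1/2, are rejected. Widening it to \d{1,2} would be one way to allow them, at the cost of rechecking the test strings above.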
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Hi Leslie,
Thanks for the effort. It certainly appears to cover all the bases.

(BTW it's Perl not PERL.)

Duly noted. I know that I still get upset when I see someone writing or saying X Windows when everyone should know that it is the X Window system. Pluralizing it is just wrong.
Anyway, if you ever find yourself in the Longview/Tyler, Texas area, look me up and I'll treat you to a longneck and a rare steak.
Michael Morris
     