JavaRanch » Java Forums » Java » Java in General

Regular expression

Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2579

I am trying a few things with regular expressions. I would like to allow the following forms of input; how do I do this?
N
N-N
N+
N-
N,N
where N is a digit.
TIA,
- Manish
Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2579

This is what I have, which works for a single comma, but I need something for multiple commas. What do I do?
[0-9]+[-+,]?[0-9]*
I tried the following to allow multiple commas, but they don't work:
[0-9]+[-+]?[,]*[0-9]*
[0-9]+[-+]?,*[0-9]*
TIA,
- Manish
Wayne L Johnson
Ranch Hand

Joined: Sep 03, 2003
Posts: 399
Could you provide some concrete examples of VALID strings for each case? I ask because I tested the following set of strings:
1) "12345"
2) "123-99"
3) "1231+"
4) "999-"
5) "123,456"
6) "123,,,456"
7) "1234+,,,567"
I tested them against each of the three patterns you provided. The first five matched all three patterns. The last two didn't match the first pattern, but did match the second and third patterns.
So when you say "multiple commas", do you mean "1234,,,,6789", or do you mean "1234,5678,9000"? I ask because the only example you give is "N,N", and it isn't clear what you are looking for.
Also, when you have a "+" or a "-" in the String, are commas allowed? You can see by the above examples that #7, which has a "+" and a bunch of commas, matches OK, which might not be desirable.
Getting regular expressions right can be a very tricky thing, but almost anything is possible. Please just give us more information about what is valid and what is not valid.
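For anyone who wants to repeat this kind of check, here is a minimal sketch of a test harness. It assumes whole-string matching via Pattern.matches, which may not be exactly how the strings above were tested; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PatternCheck {
    // The three patterns quoted earlier in this thread.
    static final String[] PATTERNS = {
        "[0-9]+[-+,]?[0-9]*",
        "[0-9]+[-+]?[,]*[0-9]*",
        "[0-9]+[-+]?,*[0-9]*"
    };

    // The seven test strings listed above.
    static final String[] INPUTS = {
        "12345", "123-99", "1231+", "999-", "123,456", "123,,,456", "1234+,,,567"
    };

    // True only if the entire input matches the regex (not just a substring).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Print a result for every pattern/input combination.
        for (String regex : PATTERNS) {
            System.out.println(regex);
            for (String input : INPUTS) {
                System.out.println("  " + input + " -> " + matchesWhole(regex, input));
            }
        }
    }
}
```

Printing the full grid makes it easy to spot strings such as "1234+,,,567" that match when they probably shouldn't.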
Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2579

Thanks Wayne!
Besides the examples I have given, this is what I want to be a valid pattern -
"1234,5678,9000"
TIA,
- Manish
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
The * means zero or more (as many as possible). If you want to require at least one comma, but also allow more, use + instead of *.
Note that for a digit you can use \d (or as a String literal, "\\d") instead of [0-9].
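As a rough illustration of both points, here is a small sketch. The two patterns are toy examples made up for this comparison, not a proposed solution, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CommaQuantifiers {
    // In Java source, "\\d" is the two-character regex \d (any digit).
    // ",*" allows zero or more commas between the digit runs.
    static final Pattern ZERO_OR_MORE = Pattern.compile("\\d+,*\\d+");
    // ",+" requires at least one comma between the digit runs.
    static final Pattern ONE_OR_MORE = Pattern.compile("\\d+,+\\d+");

    // True only if the entire input matches the pattern.
    static boolean whole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1234", "12,34", "12,,,34"}) {
            System.out.println(s + "  ,* -> " + whole(ZERO_OR_MORE, s)
                    + "  ,+ -> " + whole(ONE_OR_MORE, s));
        }
    }
}
```

With ",*" the comma-free string "1234" still matches, because zero commas are allowed; with ",+" it does not.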
Now when you say that your attempts for multiple commas "don't work", what do you mean? Give an example of a string that should match, but doesn't. Or give an example of a string that does match, but shouldn't. What types of "multiple commas" do you want?
N,,N
N,N,
N,N,N
N,N+
N+N,N,N
There are lots of different possibilities here, and it's very difficult to construct a pattern without a better understanding of what rules you want to implement.
You may find the regex coach useful for testing ideas and debugging, though it only handles Perl-style regexes; some things allowed in java.util.regex won't work there.


"I'm not back." - Bill Harding, Twister
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Try something like:
\d+(,\d+)*
(Double the backslashes for a Java string literal.)
I don't know how or if the +/- signs would be part of a multicomma string, so I've left them out for now. If you want them, show some examples that use them (with multiple commas).
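In Java, the pattern above might be used roughly like this. This is a sketch that assumes the whole input should match; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CommaList {
    // \d+(,\d+)* with the backslashes doubled for a Java string literal:
    // one run of digits, then zero or more ",digits" groups.
    static final Pattern COMMA_LIST = Pattern.compile("\\d+(,\\d+)*");

    // True only if the entire input is a comma-separated list of digit runs.
    static boolean isCommaList(String s) {
        return COMMA_LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"1234", "1234,5678,9000", "123,,456", "123,"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isCommaList(s));
        }
    }
}
```

Consecutive commas and a trailing comma are rejected, because every comma must be followed by at least one digit.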
[ December 16, 2003: Message edited by: Jim Yingst ]
Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2579

Hi Jim,
This is a valid string I want to work, but it doesn't (if condition fails)
"1234,5678,9000"
Rest of the stuff is working
- Manish
Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2579

My valid strings would be
1234
1234+
1234-
1234-5678
123,456,789
123,456
- Manish
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
OK, so + and - are never used in combination with commas? It's probably easiest to just use two different patterns joined by an or (|):
\d+(,\d+)*|\d+([+-]\d*)?
I'd probably modify this to
\d++(?:,\d++)*+|\d++(?:[+-]\d*+)?+
since (a) noncapturing groups are more efficient if you don't need to actually capture a group's value, and (b) possessive quantifiers can eliminate many confusing effects that may result from backtracking. There's no need for any backtracking to parse these patterns, so I prefer to eliminate it. Unfortunately possessive quantifiers aren't handled by Regex Coach... The first form should be easier to read and debug though - work with it first.
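A sketch of how the two forms could be compared in Java, assuming whole-string matching with Matcher.matches(). The class and method names are hypothetical, and the possessive variant here has both groups written as noncapturing.

```java
import java.util.regex.Pattern;

class NumberSpec {
    // First form: comma list OR number with an optional +/- suffix.
    static final Pattern SIMPLE =
            Pattern.compile("\\d+(,\\d+)*|\\d+([+-]\\d*)?");
    // Possessive variant with both groups noncapturing.
    static final Pattern POSSESSIVE =
            Pattern.compile("\\d++(?:,\\d++)*+|\\d++(?:[+-]\\d*+)?+");

    // True only if the entire input matches the pattern.
    static boolean whole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "1234", "1234+", "1234-", "1234-5678", "123,456,789", "123,456",
            "123,,456", "1234+,567", "+1234"
        };
        for (String s : inputs) {
            System.out.println(s + " -> simple: " + whole(SIMPLE, s)
                    + ", possessive: " + whole(POSSESSIVE, s));
        }
    }
}
```

Both forms should agree on every input. Note that either form also accepts "1234+5678", since [+-] treats + and - alike; if that string should be invalid, the second alternative would need tightening.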
[ December 16, 2003: Message edited by: Jim Yingst ]
Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2579

Jim,
This one worked for me -
[0-9]+[-+,]?[0-9]*
and I am just wondering if I could modify this to do what I am trying to do. I am new to this stuff, so I'm trying to understand how it's working...
- Manish
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Well, are + and - ever used in the same expression as commas? They aren't in any of your examples so far. If they're not mixed, then I believe it will be simpler for you to write one pattern which matches expressions with commas, and another pattern which matches expressions with +/-. Then join them with a |. This will probably be simpler to understand than your other options.
If commas and +/- can be mixed on the same line, show some examples. We can't write a pattern that mixes them unless we know what's allowed and what isn't. It may be possible to modify your previous solution, but I can't say without knowing more of your rules.
Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2579

That's all I need -
1234 (Number)
1234+ (greater than this)
1234- (less than)
1234-5678 (range)
123,456,789 (number)
123,456 (number)
I tried this but it didn't work
[0-9]+[-+]?[,]*[0-9]*
- manish
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Have you tried the solution I previously suggested?
If you just want to study your own solution - well, one problem is that the only way it allows multiple commas is if they're consecutive. That is, your solution will match
123,,456
123,,,,456
but not
123,456,789
That's because the commas are represented with ",*" - which matches any number of consecutive commas, but no numbers.
I've already shown a simple way to match the multicomma strings you want:
\d+(,\d+)*
The key here is that the comma only gets repeated when the whole expression in parentheses (,\d+) is repeated. So this is equivalent to any of:
\d+
\d+,\d+
\d+,\d+,\d+
\d+,\d+,\d+,\d+
\d+,\d+,\d+,\d+,\d+
\d+,\d+,\d+,\d+,\d+,\d+
etc.
If you prefer to use [0-9] for some reason, fine:
[0-9]+(,[0-9]+)*
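To see the difference between the two approaches side by side, something like the following could be run. It is only a sketch, with hypothetical class and method names, and it assumes the whole string has to match.

```java
import java.util.regex.Pattern;

class ConsecutiveCommas {
    // The earlier attempt: ",*" style repetition only allows commas in a row.
    static final Pattern CONSECUTIVE = Pattern.compile("[0-9]+[-+]?[,]*[0-9]*");
    // Repeating the whole (,digits) group allows comma-separated numbers.
    static final Pattern REPEATED_GROUP = Pattern.compile("[0-9]+(,[0-9]+)*");

    // True only if the entire input matches the pattern.
    static boolean whole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123,,456", "123,,,,456", "123,456,789"}) {
            System.out.println(s + " -> consecutive: " + whole(CONSECUTIVE, s)
                    + ", repeated group: " + whole(REPEATED_GROUP, s));
        }
    }
}
```

The first pattern accepts the runs of commas and rejects "123,456,789"; the second does the opposite.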
Manish Hatwalne
Ranch Hand

Joined: Sep 22, 2001
Posts: 2579

Thanks Jim,
It works!!! :-) :-)
- Manish
 