| Author |
Regex pattern
|
Jacob Sonia
Ranch Hand
Joined: Jun 28, 2009
Posts: 164
|
|
Hi,
I have these example URLs:
http://twitter.com/*
http://twitter.com/*/rs
Here * can be anything, such as user_name or user.name.
I could only come up with one pattern for extracting it, but it also returns the / when one is present. Please help me with a more correct one.
This is my Java program:
|
 |
Rob Spoor
Sheriff
Joined: Oct 27, 2005
Posts: 19216
|
|
Let's break down your regex:
- (?<=http[s]?://twitter.com/) - a positive lookbehind for http://twitter.com/ and https://twitter.com/. Looks fine to me
- ($|(.*)/|(.*)|\\?=)
--- $ - end of string
--- (.*)/ - anything followed by /
--- (.*) - anything
--- \\?= - a ? followed by =
You clearly specify that you want / inside your match, both in (.*) and in (.*)/
An easy fix: change both occurrences of .* into [^/]*, in other words, anything but a /. That still leaves the alternative ([^/]*)/, which matches anything but a / followed by a /, so remove that alternative. What remains: "(?<=http[s]?://twitter.com/)($|([^/]*)|\\?=)"
By the way, your while loop is effectively an if statement because of the break, so just change it into one.
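A minimal sketch of the idea, assuming the goal is only the first path segment after twitter.com/. The class and method names are hypothetical, and the pattern is condensed further than the one above (the $ and \\?= alternatives are dropped and the dot is escaped), so treat it as one possible variant rather than the original program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from the original program.
class TwitterUserExtractor {
    // Lookbehind for the host prefix, then everything up to (not including) the next "/".
    private static final Pattern USER =
            Pattern.compile("(?<=https?://twitter\\.com/)[^/]*");

    // Returns the first path segment, or null if the URL does not contain the prefix.
    static String extract(String url) {
        Matcher m = USER.matcher(url);
        // A single find() is enough, so an if-style check replaces the while/break loop.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("http://twitter.com/user_name"));
        System.out.println(extract("http://twitter.com/user.name/rs"));
    }
}
```

Because [^/]* stops at the first slash, the trailing /rs never becomes part of the match.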
|
|
 |
Jacob Sonia
Ranch Hand
Joined: Jun 28, 2009
Posts: 164
|
|
Hey, thanks a lot for the reply, it really helped me. Which book should I read to understand the basics of regex patterns?
I also have this problem: I want to accept every URL that starts with http://abc.com, except those that start with http://abc.com/xyz. I tried this, but I don't think it is quite right; it doesn't match the last one.
|
 |
Raymond Tong
Ranch Hand
Joined: Aug 15, 2010
Posts: 156
|
|
Here are some URLs about regular expressions:
http://www.regular-expressions.info/
http://download.oracle.com/javase/tutorial/essential/regex/
This will fail for
You don't have to escape "/" as "\\/"; a plain "/" is fine.
If the sub-domain (www) is optional, you may want to use "?".
You may want to have a slash "/" after your (ae|com).
It may be easier to write down the pattern with pen and paper
before turning it into a regular expression.
|
 |
Rob Spoor
Sheriff
Joined: Oct 27, 2005
Posts: 19216
|
|
Check out java.util.regex.Pattern for negative lookahead. What you basically need:
- http://abc.com
- a negative lookahead for /xyz
- anything else
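The three steps above can be sketched roughly as follows. This is only an illustration: the class name, method name, and sample URLs are assumptions, and it uses the plain http://abc.com prefix from the question without any sub-domain handling.

```java
import java.util.regex.Pattern;

// Hypothetical class name, used for illustration only.
class AbcUrlFilter {
    // 1. the literal prefix, 2. a negative lookahead for /xyz, 3. anything else.
    private static final Pattern ACCEPTED =
            Pattern.compile("^http://abc\\.com(?!/xyz).*");

    // True when the whole URL starts with http://abc.com but not with http://abc.com/xyz.
    static boolean accepts(String url) {
        return ACCEPTED.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("http://abc.com"));        // accepted
        System.out.println(accepts("http://abc.com/foo"));    // accepted
        System.out.println(accepts("http://abc.com/xyz/1"));  // rejected
    }
}
```

The lookahead (?!/xyz) consumes no characters; it only vetoes the match when /xyz comes directly after the prefix.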
|
 |
Jacob Sonia
Ranch Hand
Joined: Jun 28, 2009
Posts: 164
|
|
Hi, I tried this after looking at java.util.regex.Pattern:
String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(com)($|[.* && ?![xyz]*])" ;
It doesn't work either.
|
 |
Raymond Tong
Ranch Hand
Joined: Aug 15, 2010
Posts: 156
|
|
Jacob Sonia wrote: Hi, I tried this after looking at java.util.regex.Pattern:
String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(com)($|[.* && ?![xyz]*])" ;
It doesn't work either.
Here is a more detailed description of lookaround in regular expressions:
http://www.regular-expressions.info/lookaround.html
|
 |
Jacob Sonia
Ranch Hand
Joined: Jun 28, 2009
Posts: 164
|
|
|
Another try:
String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(ae|com)($|(?!(/xyz).*).*)" ;
|
 |
Rob Spoor
Sheriff
Joined: Oct 27, 2005
Posts: 19216
|
|
|
You should always check the Javadocs of java.util.regex.Pattern for the syntax. I see you're using a !, but that's not supported in Java. I already told you how to do this, using the negative lookahead.
|
 |
Jacob Sonia
Ranch Hand
Joined: Jun 28, 2009
Posts: 164
|
|
Hi,
But what I wrote is supported. Why do you think that ! is not supported? For me the pattern works as expected.
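A quick way to check this, as a sketch: the (?!X) construct is listed in the java.util.regex.Pattern Javadoc as a zero-width negative lookahead, so the pattern from my last try compiles. The class name, method name, and sample URLs below are assumptions made up for this check.

```java
import java.util.regex.Pattern;

// Hypothetical test harness for the pattern posted above.
class LookaheadCheck {
    // The pattern from the post, unchanged: it requires a sub-domain such as www.
    private static final Pattern P = Pattern.compile(
            "^http:\\/\\/[\\w-]+\\.abc\\.(ae|com)($|(?!(/xyz).*).*)");

    // True when the whole URL matches the pattern.
    static boolean accepts(String url) {
        return P.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("http://www.abc.com"));        // accepted
        System.out.println(accepts("http://www.abc.com/foo"));    // accepted
        System.out.println(accepts("http://www.abc.com/xyz/1"));  // rejected
    }
}
```

Note that http://abc.com without a sub-domain is rejected by this pattern, because [\\w-]+\\. must match something before abc.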
|
 |