Regex pattern

Jacob Sonia
Ranch Hand

Joined: Jun 28, 2009
Posts: 174
Hi,

I have these example URLs:
http://twitter.com/*
http://twitter.com/*/rs

Here * can be anything, such as user_name or user.name.

The only pattern I could come up with for extracting it also returns the / when one is present. Please help me with a more correct one.

This is my Java program:


Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19760

Let's break down your regex:
- (?<=http[s]?://twitter.com/) - a positive lookbehind for http://twitter.com/ and https://twitter.com/. Looks fine to me
- ($|(.*)/|(.*)|\\?=) - a group with four alternatives:
--- $ - end of string
--- (.*)/ - anything followed by /
--- (.*) - anything
--- \\?= - a ? followed by =

You clearly specify that you want / inside your match, both in (.*) and in (.*)/, because . matches / as well.
An easy fix: change both occurrences of .* into [^/]*, in other words, anything but a /. The ([^/]*)/ alternative would still match anything but a / followed by a /, so remove that part. What remains: "(?<=http[s]?://twitter.com/)($|([^/]*)|\\?=)"

By the way, because of the break your while loop only ever runs once, so it behaves like an if. Just change it into one.
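To see the difference, here is a minimal sketch that runs both the original and the fixed regex with find(). The original program is not shown in the thread, so the class name TwitterNameExtractor, the extract helper and the sample URLs are illustrative assumptions. It uses an if instead of a while loop, since only the first match is wanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwitterNameExtractor {
    // Pattern from the question: (.*)/ and (.*) can both consume "/",
    // so the match may include a slash.
    static final Pattern ORIGINAL =
            Pattern.compile("(?<=http[s]?://twitter.com/)($|(.*)/|(.*)|\\?=)");

    // Suggested fix: [^/]* stops at the first "/", and the
    // "/"-terminated alternative is gone.
    static final Pattern FIXED =
            Pattern.compile("(?<=http[s]?://twitter.com/)($|([^/]*)|\\?=)");

    // Hypothetical helper: returns the first match or null.
    // Only one match is wanted, so an if replaces the while + break.
    static String extract(Pattern pattern, String url) {
        Matcher m = pattern.matcher(url);
        if (m.find()) {
            return m.group();
        }
        return null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://twitter.com/user_name",
            "http://twitter.com/user.name/rs",
            "https://twitter.com/user.name/rs"
        };
        for (String url : urls) {
            System.out.println(url
                    + " -> original: " + extract(ORIGINAL, url)
                    + ", fixed: " + extract(FIXED, url));
        }
    }
}
```

For http://twitter.com/user.name/rs the original pattern yields user.name/ (with the trailing slash), while the fixed one yields user.name.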

SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Jacob Sonia
Ranch Hand

Joined: Jun 28, 2009
Posts: 174
Hey, thanks a lot for the reply; it really helped me. Could you recommend a book for understanding the basics of regex patterns?

I also have another problem. I want to accept every URL that starts with http://abc.com, except those that start with http://abc.com/xyz. I tried this, but I don't think it's very good; there is some problem with it, since it doesn't match the last one.



Raymond Tong
Ranch Hand

Joined: Aug 15, 2010
Posts: 230

Here are some URLs about regular expressions:
http://www.regular-expressions.info/
http://download.oracle.com/javase/tutorial/essential/regex/


This will fail for


You don't have to escape "/" as "\\/"; a plain "/" is fine.
If the sub-domain (www) is optional, you may want to make it optional with "?".
You may want to require a slash "/" after your (ae|com).

It may be easier to write the pattern down with pen and paper
before turning it into a regular expression.
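The three tips can be combined into one pattern. This is a hypothetical sketch: the abc.com / abc.ae domains come from the thread, while the class name SiteUrlCheck, the exact pattern and the choice of "end of input or /" after (ae|com) are assumptions.

```java
import java.util.regex.Pattern;

public class SiteUrlCheck {
    // Hypothetical pattern combining the three tips:
    //  - "/" is written as-is, without "\\/"
    //  - (www\.)? makes the sub-domain optional
    //  - after (ae|com), only the end of input or a "/" may follow
    static final Pattern SITE =
            Pattern.compile("http://(www\\.)?abc\\.(ae|com)(/.*)?");

    // matches() must consume the whole URL, so ^ and $ are not needed.
    static boolean accepts(String url) {
        return SITE.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {"http://abc.com", "http://www.abc.ae/page", "http://abc.company.com"};
        for (String url : urls) {
            System.out.println(url + " -> " + accepts(url));
        }
    }
}
```

Requiring a slash (or the end) after (ae|com) is what keeps a host such as abc.company.com from slipping through.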
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19760

Jacob Sonia wrote:I also have another problem. I want to accept every URL that starts with http://abc.com, except those that start with http://abc.com/xyz.

Check out java.util.regex.Pattern for negative lookahead. What you basically need:
- http://abc.com
- a negative lookahead for /xyz
- anything else
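The three steps above translate fairly directly into a pattern. A minimal sketch, assuming the whole URL is checked with matches() (so no ^ or $ is needed); the class name ExcludeXyz is hypothetical.

```java
import java.util.regex.Pattern;

public class ExcludeXyz {
    // 1. http://abc.com (the dot escaped so it only matches ".")
    // 2. (?!/xyz): a zero-width negative lookahead that fails if "/xyz" comes next
    // 3. .*: anything else
    static final Pattern ALLOWED = Pattern.compile("http://abc\\.com(?!/xyz).*");

    static boolean accepts(String url) {
        return ALLOWED.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://abc.com",
            "http://abc.com/page",
            "http://abc.com/xyz",
            "http://abc.com/xyz/page"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + accepts(url));
        }
    }
}
```

Anything that starts with /xyz, such as http://abc.com/xyzzy, is rejected as well, which fits "starts with http://abc.com/xyz". If only the /xyz path itself should be excluded, (?!/xyz(/|$)) would be one way to express that.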
Jacob Sonia
Ranch Hand

Joined: Jun 28, 2009
Posts: 174
Hi, I tried this after looking at java.util.regex.Pattern:

String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(com)($|[.* && ?![xyz]*])" ;

It doesn't work either.
Raymond Tong
Ranch Hand

Joined: Aug 15, 2010
Posts: 230

Jacob Sonia wrote:Hi, I tried this after looking at java.util.regex.Pattern:

String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(com)($|[.* && ?![xyz]*])" ;

It doesn't work either.

Here is a more detailed description of lookaround in regular expressions:
http://www.regular-expressions.info/lookaround.html
Jacob Sonia
Ranch Hand

Joined: Jun 28, 2009
Posts: 174
Another try:
String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(ae|com)($|(?!(/xyz).*).*)" ;
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19760

You should always check the Javadocs of java.util.regex.Pattern for the syntax. I see you're using a !, but that's not supported in Java. I already told you how to do this, using the negative lookahead.
Jacob Sonia
Ranch Hand

Joined: Jun 28, 2009
Posts: 174
Hi,
But what I wrote is supported. Why do you think that ! is not supported? For me the pattern works as expected.
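For reference, the java.util.regex.Pattern Javadoc lists (?!X) as a zero-width negative lookahead, so the "another try" pattern does compile. A minimal check, assuming it is applied with matches(); the class name LookaheadCheck and the sample URLs are hypothetical.

```java
import java.util.regex.Pattern;

public class LookaheadCheck {
    // Pattern copied unchanged from the "another try" post.
    // (?!(/xyz).*) is a negative lookahead, which Pattern supports.
    static final Pattern ATTEMPT = Pattern.compile(
            "^http:\\/\\/[\\w-]+\\.abc\\.(ae|com)($|(?!(/xyz).*).*)");

    static boolean accepts(String url) {
        return ATTEMPT.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.abc.com",
            "http://www.abc.com/page",
            "http://www.abc.com/xyz",
            "http://abc.com" // rejected: [\w-]+\. requires a sub-domain
        };
        for (String url : urls) {
            System.out.println(url + " -> " + accepts(url));
        }
    }
}
```

Two details are worth noting. Because [\\w-]+\\. is mandatory, http://abc.com without a sub-domain is rejected; ([\\w-]+\\.)? would make it optional, as suggested earlier in the thread. Also, the inner .* in (?!(/xyz).*) changes nothing, since .* can always match, so it behaves exactly like (?!/xyz).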
 