regex to find url

Niklas Rosencrantz
Ranch Hand

Joined: Apr 08, 2006
Posts: 49
Hello, I need a regexp to find urls in text. My Pattern looks like this:

Pattern p =
Pattern.compile("(@)?(href=\")?(http://)?[A-Za-z]+(\\.\\w+)+(/[&\\n=?\\+\\%/\\.\\w]+)?");

How can I modify the regexp so that it also finds URLs that have a dash and/or digits in the name, e.g. www.123-abc.com?

Thanks in advance

Niklas
[ February 10, 2007: Message edited by: Niklas Rosencrantz ]
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18996
    

Originally posted by Niklas Rosencrantz:
How can I modify the regexp so that it also finds URLs that have a dash and/or digits in the name, e.g. www.123-abc.com?


The part of your regex that relates to the hostname of the URL is:

[A-Za-z]+(\\.\\w+)+

The first part, [A-Za-z]+, is for the first field, like "www". The next part, (\\.\\w+)+, is for everything else, and it already allows numbers because \w includes digits.

To allow dashes anywhere in the hostname, you can just add it to the regex, like this:
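
A minimal sketch of that change, assuming the dash is simply added to both character classes of the hostname part. The class name HostnameDashDemo and the helper findAll are made up for illustration, and only the hostname part of the pattern is shown:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostnameDashDemo {
    // Hostname part of the pattern from the question:
    // a letters-only first label, then one or more ".label" groups.
    static final Pattern ORIGINAL = Pattern.compile("[A-Za-z]+(\\.\\w+)+");

    // Assumed fix: '-' added to both character classes.
    // A '-' placed last inside [...] is a literal dash, not a range.
    static final Pattern WITH_DASH = Pattern.compile("[A-Za-z-]+(\\.[\\w-]+)+");

    // Collects every match of p in text, in order of appearance.
    static List<String> findAll(Pattern p, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "see www.123-abc.com for details";
        System.out.println(findAll(ORIGINAL, text));   // [www.123, abc.com]
        System.out.println(findAll(WITH_DASH, text));  // [www.123-abc.com]
    }
}
```

The original pattern stops at the dash and splits the hostname into two separate matches; with the dash in the character classes the whole hostname comes back as one match. Note that the first label still does not accept digits, so a host like 123.example.com would need 0-9 added to the first class as well.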



Henry
[ February 10, 2007: Message edited by: Henry Wong ]

Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Niklas Rosencrantz
Ranch Hand

Joined: Apr 08, 2006
Posts: 49
Thank you for replying. Matching any conceivable url is a difficult problem. Here is the code I'm using now:



It runs correctly but could be improved. The if-test on the top-level domain is there to ignore false matches: e.g. "example.abc" must be ignored, but "example.biz" must not.
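
For anyone reading along, here is a rough sketch of how such a top-level-domain test could look. This is not the code from the post: the class UrlFinder, the simplified pattern, and the short KNOWN_TLDS set are all assumptions made for illustration, and a real list of top-level domains would be much longer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlFinder {
    // Simplified pattern (an assumption, not the poster's regex):
    // optional "http://", a dotted hostname whose last label is captured
    // as group 2, then an optional path/query.
    static final Pattern URL = Pattern.compile(
        "(?:http://)?((?:[A-Za-z0-9-]+\\.)+([A-Za-z]+))(?:/[\\w&=?+%/.-]*)?");

    // Illustrative subset only; not a complete list of top-level domains.
    static final Set<String> KNOWN_TLDS =
        Set.of("com", "org", "net", "biz", "info", "se");

    // Returns the matches whose last hostname label is a known TLD.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            String tld = m.group(2).toLowerCase(Locale.ROOT);
            if (KNOWN_TLDS.contains(tld)) {   // the "if-test of top-domain"
                urls.add(m.group());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "visit example.biz or example.abc and "
                    + "http://www.123-abc.com/a?b=1 now";
        System.out.println(findUrls(text));
    }
}
```

With this input, example.abc is matched by the regex but dropped by the set lookup, while example.biz and the http:// URL are kept. Keeping the TLD check outside the regex means the list can be updated without touching the pattern.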
 