
regex to find url

 
Niklas Rosencrantz
Ranch Hand
Posts: 49
Hello, I need a regex to find URLs in text. My Pattern looks like this:

Pattern p =
Pattern.compile("(@)?(href=\")?(http://)?[A-Za-z]+(\\.\\w+)+(/[&\\n=?\\+\\%/\\.\\w]+)?");

How can I modify the regex so that it also finds URLs that have a dash and/or digits in the name, e.g. www.123-abc.com?

Thanks in advance

Niklas
[ February 10, 2007: Message edited by: Niklas Rosencrantz ]
 
Henry Wong
author
Marshal
Posts: 21015
Originally posted by Niklas Rosencrantz:
How can I modify the regex so that it also finds URLs that have a dash and/or digits in the name, e.g. www.123-abc.com?


The part of your regex that relates to the hostname of the URL is:

[A-Za-z]+(\\.\\w+)+

The first part, [A-Za-z]+, is for the first field, like "www", and allows letters only. The next part, (\\.\\w+)+, is for everything else, and it already allows digits because \w matches letters, digits, and underscores.

To allow dashes anywhere in the hostname, you can just add the dash to the regex, like this:
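
A minimal sketch of one such change, assuming the first label is widened to [A-Za-z0-9-]+ and the later labels to [\\w-]+. The class name UrlHostnameSketch and the method findUrls are hypothetical and not taken from the thread; the rest of the pattern is copied from the question unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlHostnameSketch {
    // Same pattern as in the question, except for the hostname part:
    // the first label is [A-Za-z0-9-]+ instead of [A-Za-z]+, and the
    // later labels are [\\w-]+ instead of \\w+, so digits and dashes
    // are accepted in every label.
    static final Pattern URL = Pattern.compile(
        "(@)?(href=\")?(http://)?[A-Za-z0-9-]+(\\.[\\w-]+)+(/[&\\n=?\\+\\%/\\.\\w]+)?");

    // Collects every substring of the text that the pattern matches.
    static List<String> findUrls(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findUrls("see www.123-abc.com for details"));
    }
}
```

Allowing digits in the first label also lets strings such as 3.14 match, so some filtering of the results may still be needed.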



Henry
[ February 10, 2007: Message edited by: Henry Wong ]
 
Niklas Rosencrantz
Ranch Hand
Posts: 49
Thank you for replying. Matching every conceivable URL is a difficult problem. Here is the code I'm using now:



It executes correctly but can be improved. The if-test on the top-level domain is there to discard false matches: for example, "example.abc" must be ignored, but "example.biz" must not.
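
A top-level-domain test of the kind described above might be sketched as follows. The class name TopDomainFilterSketch, the method hasKnownTopDomain, and the short list of domains are all hypothetical and not taken from the thread; a real list would be much longer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TopDomainFilterSketch {
    // Hypothetical, deliberately short list of accepted top-level domains.
    static final Set<String> KNOWN_TOP_DOMAINS =
        new HashSet<>(Arrays.asList("com", "org", "net", "biz", "se"));

    // Returns true when the text after the last dot of the hostname
    // is one of the known top-level domains.
    static boolean hasKnownTopDomain(String candidate) {
        String host = candidate;
        // Drop everything up to and including "://", if present.
        int scheme = host.indexOf("://");
        if (scheme >= 0) {
            host = host.substring(scheme + 3);
        }
        // Drop the path, if present.
        int slash = host.indexOf('/');
        if (slash >= 0) {
            host = host.substring(0, slash);
        }
        int dot = host.lastIndexOf('.');
        if (dot < 0 || dot == host.length() - 1) {
            return false;
        }
        String topDomain = host.substring(dot + 1).toLowerCase(Locale.ROOT);
        return KNOWN_TOP_DOMAINS.contains(topDomain);
    }

    public static void main(String[] args) {
        System.out.println(hasKnownTopDomain("example.abc"));
        System.out.println(hasKnownTopDomain("example.biz"));
    }
}
```

Each match returned by the regex would be passed through this check and kept only when it returns true.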
 