First a minor note: You can replace
http[s]{0,1}
with
https?
which means the same thing, but is shorter and easier to read (and may be marginally faster in some engines).
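If you want to convince yourself the two forms agree, here is a quick sketch. The class name OptionalS and the sample strings are just made up for illustration; only the two patterns come from the discussion above.

```java
import java.util.regex.Pattern;

class OptionalS {
    // The original form and the shorter form suggested above.
    static final Pattern VERBOSE = Pattern.compile("http[s]{0,1}");
    static final Pattern CONCISE = Pattern.compile("https?");

    public static void main(String[] args) {
        // Hypothetical sample inputs, chosen to cover zero, one and two 's' characters.
        String[] samples = {"http", "https", "httpss", "htt"};
        for (String s : samples) {
            // matches() requires the whole string to match the pattern.
            System.out.println(s + ": " + VERBOSE.matcher(s).matches()
                    + " / " + CONCISE.matcher(s).matches());
        }
    }
}
```

Both columns should agree for every sample, since {0,1} and ? both mean "zero or one occurrence".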
As for the rest: I gather the main problems are when the URL is terminated by something other than a space - e.g. '<', '>', or '.'. The first two are easy - just add <> to the list of excluded characters in the negated character class. The . is a bit more complex, since it's perfectly OK within a URL, but when it's the final character it's almost always sentence punctuation rather than part of the URL. You can make a separate class for that final char:
(\bhttps?://[^ <>]*[^ <>.]\b)
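Here is a rough sketch of how that pattern behaves in Java. The class TrailingDot, the helper findUrls, and the sample sentences are my own inventions for illustration; note the backslashes have to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingDot {
    // The pattern above, with backslashes doubled for the Java string literal.
    static final Pattern URL = Pattern.compile("(\\bhttps?://[^ <>]*[^ <>.]\\b)");

    // Hypothetical helper: collects every match of URL in the text, in order.
    static List<String> findUrls(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The sentence-ending '.' is left out of the match.
        System.out.println(findUrls("See http://www.yahoo.com. It works."));
        // prints [http://www.yahoo.com]

        // The surrounding '<' and '>' are left out as well.
        System.out.println(findUrls("Visit <http://example.com/page> today"));
        // prints [http://example.com/page]
    }
}
```

The greedy [^ <>]* first swallows the trailing dot, then the engine backtracks one character at a time until the final class [^ <>.] can match, which is what drops the dot.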
I think those word boundaries (\b) could be a problem too, particularly the last one. You might have something like
foo,http://www.yahoo.com/,bar
The final / should be part of the URL, but the , is not, and there's no word boundary between those two chars - they're both non-word. How about something like this:
\bhttps?://[^ <>,]*[^ <>,.]
I added ',' to the list of forbidden chars. I'm sure you could find more...
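To see what the trailing \b does to that example, here is a small comparison sketch. The class UrlBoundary and the helper firstMatch are hypothetical names, and the WITH_TRAILING_BOUNDARY pattern is just the final pattern with \b put back on the end for contrast - it isn't something I'm recommending.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlBoundary {
    // Final pattern with a trailing \b re-added, for comparison only.
    static final Pattern WITH_TRAILING_BOUNDARY =
            Pattern.compile("\\bhttps?://[^ <>,]*[^ <>,.]\\b");
    // The final pattern suggested above (no trailing \b).
    static final Pattern WITHOUT_TRAILING_BOUNDARY =
            Pattern.compile("\\bhttps?://[^ <>,]*[^ <>,.]");

    // Hypothetical helper: returns the first match of p in text, or null if none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "foo,http://www.yahoo.com/,bar";

        // '/' and ',' are both non-word chars, so there is no \b between them;
        // the engine backtracks to the boundary after 'm' and loses the slash.
        System.out.println(firstMatch(WITH_TRAILING_BOUNDARY, text));
        // prints http://www.yahoo.com

        // Without the trailing \b the final '/' is kept and the ',' is excluded.
        System.out.println(firstMatch(WITHOUT_TRAILING_BOUNDARY, text));
        // prints http://www.yahoo.com/
    }
}
```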
[ November 14, 2003: Message edited by: Jim Yingst ]