Shoutbox

regexp problem - Printable Version


regexp problem by agustin029 on 09-21-2012 at 12:09 AM

How can I combine these two regexes?

code:
link = msg.match(/((ftp|https|http)?:\/\/[^\s]+)/g); //returns http,https and ftp links
link2 = msg.match(/(www?.[^\s]+)/g); //returns www. links


RE: regexp problem by whiz on 09-21-2012 at 09:34 AM

I think you have a few issues with those expressions.  You need to escape literal "." characters (an unescaped . matches any character).  The ? after "www" only makes the third "w" optional (i.e. it will match "ww" or "www", which I'm assuming is not what you want).  And the first expression will match :// on its own, without a protocol identifier (again, I don't know whether that is what you want).
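To make those three points concrete, here is a small sketch that runs the two original patterns against made-up inputs. The variable names and test strings are just illustrations, and the g flag is dropped so that test() does not carry state between calls.

```javascript
// The two patterns from the question, without the g flag.
var protoPattern = /((ftp|https|http)?:\/\/[^\s]+)/;
var wwwPattern = /(www?.[^\s]+)/;

// 1. The unescaped "." matches any character, not just a literal dot.
var dotIsWildcard = wwwPattern.test("wwwXfoo");    // true

// 2. "www?" only makes the third "w" optional, so "ww" is enough.
var twoWsMatch = wwwPattern.test("ww.foo");        // true

// 3. The protocol group is optional, so a bare "://" is accepted.
var bareSeparator = protoPattern.test("://foo");   // true

// Escaping the dot and requiring the protocol rejects those inputs.
var escapedDot = /www\.[^\s]+/.test("wwwXfoo");                       // false
var requiredProto = /(ftp|https|http):\/\/[^\s]+/.test("://foo");     // false
```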

And because you've used [^\s]+ to match the rest of the address, with the protocol (and the third "w") optional, these will match a lot of text that isn't a link.

If you just want to match any URL, regardless of the protocol or www at the start, perhaps something like this?

Javascript code:
link = msg.match(/(((ftp|https|http):\/\/)?[A-Za-z0-9\-_~:\/\?#@!\$&'\(\)\*\+,;=]+(\.[A-Za-z0-9\-_~:\/\?#@!\$&'\(\)\*\+,;=]+)+)/g); //returns links with or without a protocol or www


That looks like a bit of a mouthful.  Basically, [A-Za-z0-9-_~:/\?#@!\$&'\(\)\*\+,;=]+ matches any characters valid in a URL (which includes the www, so I've left that out).  It optionally matches the protocol, followed by two or more sets of valid characters separated by a ".".  So it would match foo.com, foo.co.uk, www.foo.co.uk, www.bar.foo.co.uk and so on, as well as any of these with a protocol at the start.
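As a quick check of that description, the sketch below runs the expression over a sample message. The message text and variable names are made up for illustration, and the "/" and "-" inside the character classes are escaped here, which should not change what the pattern matches.

```javascript
// The suggested pattern: optional protocol, then two or more runs of
// URL characters separated by literal dots.
var urlPattern = /(((ftp|https|http):\/\/)?[A-Za-z0-9\-_~:\/\?#@!\$&'\(\)\*\+,;=]+(\.[A-Za-z0-9\-_~:\/\?#@!\$&'\(\)\*\+,;=]+)+)/g;

// Hypothetical message containing three links and some plain words.
var msg = "try foo.com or www.bar.foo.co.uk and http://foo.co.uk/page?id=1 today";

// With the g flag, match() returns every full match as an array.
var links = msg.match(urlPattern);
// links is ["foo.com", "www.bar.foo.co.uk", "http://foo.co.uk/page?id=1"]
```

Plain words such as "try" and "today" are skipped because they contain no "." to satisfy the second part of the pattern.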

It's still not perfect though, as it will match any two (or more) sets of valid characters separated by a ".", so something like 3.14 in ordinary text would also be picked up as a link.
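If the false positives are a problem, one possible alternative (not from the posts above, just a rough sketch) is to go back to the original question and combine the two patterns with alternation, so a match must start with either a protocol or a literal "www.". The names loosePattern and strictPattern and the sample text are made up for this example.

```javascript
// The looser pattern: anything shaped like "chars.chars" counts.
var loosePattern = /(((ftp|https|http):\/\/)?[A-Za-z0-9\-_~:\/\?#@!\$&'\(\)\*\+,;=]+(\.[A-Za-z0-9\-_~:\/\?#@!\$&'\(\)\*\+,;=]+)+)/g;

// Stricter sketch: require "ftp://", "http://", "https://" or "www."
// at a word boundary, then take everything up to the next whitespace.
var strictPattern = /\b(?:(?:ftp|https?):\/\/|www\.)[^\s]+/g;

var text = "Version 3.14 is out, see www.foo.com or http://bar.org";

var looseLinks = text.match(loosePattern);
// looseLinks is ["3.14", "www.foo.com", "http://bar.org"]

var strictLinks = text.match(strictPattern);
// strictLinks is ["www.foo.com", "http://bar.org"]
```

The trade-off is that the stricter version no longer picks up bare addresses such as foo.com, and [^\s]+ will still swallow trailing punctuation that sits directly after a link.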