Regex for extracting all URLs from a string: answers

【Question title】: regex for extracting all urls from string
【Posted】: 2018-06-20 11:17:51
【Question description】:

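I am trying to extract URLs from strings. I have various posts whose messages contain URLs. I have prepared a pattern to match them, but it does not work properly.

Regexes I have tried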

$pattern1= '%\b((https?://)|(www\.)|(^[\D]+\.))[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))%';
$pattern2= '%\b^((https?://)|(www\.)|(^[a-z]+\.))([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%';

Code

for ($i = 0; $i < $resultcount; $i++) {
    $pattern = '%\b^((https?://)|(www\.)|(^[a-z]+\.))([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%';
    $message = (string) $result[$i]['message'];
    preg_match_all($pattern, $message, $match);
    print_r($match);
}

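Example of such a post

"This is just a post to test regex for extracting URLs http://google.com, https://www.youtube.com/watch?v=dlw32af https://instagram.com/oscar/en.wikipedia.org"

A post with multiple URLs may or may not have commas between them.

Thanks, everyone :)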

【Question discussion】:

Tags: php regex

【Solution 1】:

This should get you started:

\b(?:https?://)?(?:(?i:[a-z]+\.)+)[^\s,]+\b

Broken down, this says:

\b                   # a word boundary
(?:https?://)?       # http:// or https://, optional
(?:(?i:[a-z]+\.)+)   # one or more labels, each followed by a dot (case-insensitive)
[^\s,]+              # one or more characters that are neither whitespace nor a comma
\b                   # another word boundary
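
As a rough sketch of how this pattern might be used from PHP, the snippet below wraps it in `preg_match_all`. The helper name `extractUrls`, the `~` delimiter, and the sample message are assumptions made for illustration; they do not come from the answer.

```php
<?php
// Hypothetical helper (not part of the answer): returns every URL-like
// token that the answer's pattern finds in $text.
function extractUrls(string $text): array
{
    // The answer's pattern, wrapped in "~" delimiters so "/" needs no escaping.
    $pattern = '~\b(?:https?://)?(?:(?i:[a-z]+\.)+)[^\s,]+\b~';

    // preg_match_all() returns the number of matches, or false on error.
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }

    // Index 0 holds the full-pattern matches, in order of appearance.
    return $matches[0];
}

// Assumed sample message, with and without a comma between URLs.
$message = 'Links: http://google.com, https://www.youtube.com/watch?v=dlw32af and en.wikipedia.org';
print_r(extractUrls($message));
// Yields: http://google.com, https://www.youtube.com/watch?v=dlw32af, en.wikipedia.org
```

Two caveats follow from the pattern itself. Because it ends in `\b`, a trailing slash is dropped (`https://instagram.com/oscar/` is returned as `https://instagram.com/oscar`). Because the scheme is optional, any dotted token such as `file.txt` also counts as a match, so some post-filtering may be needed.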

See a demo on regex101.com.

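【Discussion】:

Hey, thanks, but I need to extract all the URLs mentioned. Can you help me?
Yes, it works like a charm, but if the URLs are separated by "," it merges them all into one. How can I handle comma-separated URLs?
@Mr.Pyramid: Check the newly updated answer, and please state your requirements more clearly up front :) regex101.com/r/9hhKLS/3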

【Solution 2】:

First I would analyze some Wikipedia URLs, which are clearly shown in the attached screenshot, and then write the regex!

https:\/\/en\.wikipedia\.org\/wiki\/(.*)
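
A site-specific pattern like this could be applied in PHP roughly as follows. This is only a sketch: the function name `extractWikipediaTitles` and the sample text are hypothetical, and `\S+` is swapped in for `(.*)` so that each capture stops at whitespace instead of running to the end of the line.

```php
<?php
// Hypothetical helper (not part of the answer): collects the article titles
// from English Wikipedia links found in $text.
function extractWikipediaTitles(string $text): array
{
    // "~" delimiters avoid escaping "/"; the dots are escaped so they match literally.
    // Assumption: \S+ replaces the answer's (.*) to stop each capture at whitespace.
    $pattern = '~https://en\.wikipedia\.org/wiki/(\S+)~';

    // preg_match_all() returns false on error; treat that as "nothing found".
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }

    // Index 1 holds the contents of the first capture group.
    return $matches[1];
}

// Assumed sample text containing two Wikipedia links.
$text = 'See https://en.wikipedia.org/wiki/Regular_expression and https://en.wikipedia.org/wiki/PHP for background.';
print_r(extractWikipediaTitles($text));
// Yields: Regular_expression, PHP
```

Note that `\S+` keeps any punctuation directly attached to the link, such as a trailing comma or full stop, so the captures may need trimming.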

【Discussion】: