【Question title】: Extract all links from a string with google app script
【Posted】: 2023-03-05 01:14:01
【Question】:

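I have a string variable that contains links (among other text), and I would like to extract all the links that contain a certain pattern (for example, those containing the word "case"). Is this possible?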

The string variable looks like this:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';

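As a workaround, I used what is described here: extract links from document. I create a document with the string as its content and then extract the links from it, but I would like to do this directly on the string.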

Regards,

Edit (to Rubén):

If I use:

var string = 'http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more ';

I only get the first link, twice (see screenshot here).

If I use:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html ';

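I get the same result (see screenshot here).

【Comments】:

What do you mean by "a string variable with links inside"? Are they URLs? Including an example string would clarify what you mean. What have you tried?
OK. The string variable looks like: var string = 'here is some text line among the ones there will be links like stackoverflow.com/questions/40725199/… and more';

Tags: javascript regex google-apps-script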

【Solution 1】:

Google Apps Script

function test2(){
  var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/i;
  var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
  for(var i = 0; i <= re.exec(string).length; i++){
    if(re.exec(string)[i]) Logger.log(re.exec(string)[i]) 
  }
}

JavaScript

var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/i;
var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
for(var i = 0; i <= re.exec(string).length; i++){
  if(re.exec(string)[i]) console.log(re.exec(string)[i])
}
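
A note on the behaviour reported in the question (the first link logged twice): without the "g" flag, every call to exec() starts again from the beginning of the string and returns the same first match. The loop above therefore walks over the entries of that single match array (the full match followed by its capture groups) rather than over successive links, and because the outermost group wraps the whole pattern, entries 0 and 1 hold the same text. The sketch below illustrates this with a deliberately simplified stand-in pattern; simpleRe, sample and the example.com / example.org URLs are assumptions for illustration, not the regex or data from this answer. It uses console.log so that it also runs outside Apps Script.

```javascript
// Simplified stand-in pattern (an assumption for illustration): one outer
// capture group around the whole URL, and no "g" flag.
const simpleRe = /\b(https?:\/\/[^\s]+)/i;
const sample = 'first http://example.com/a then https://example.org/b end';

// Without "g", exec() always searches from index 0, so both calls find the first link.
const first = simpleRe.exec(sample);
const second = simpleRe.exec(sample);
console.log(first[0]);               // full match
console.log(first[1]);               // capture group 1: same text as the full match
console.log(second[0] === first[0]); // the second call did not advance

// With "g", the regex keeps track of lastIndex and each exec() call moves on.
const globalRe = new RegExp(simpleRe.source, 'gi');
const allLinks = [];
let m;
while ((m = globalRe.exec(sample)) !== null) {
  allLinks.push(m[0]); // collect only the full match, not the capture groups
}
console.log(allLinks);
```

Collecting only index 0 of each match avoids the duplicate that comes from the outer capture group.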

Reference

RegularExpression to Extract Url For Javascript

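【Discussion】:

OK, I used your updated version on this string: 'mangafox.me/manga/tales_of_demons_and_gods/c105/1.html stackoverflow.com/questions/40725199/… here is some text line among the ones there will be links like stackoverflow.com/questions/40725199/… and more mangafox.me/manga/tales_of_demons_and_gods/c105/1.html';
I have the same problem: I only get the first link. @Rubén, any progress in the past 4 years? :-)
@BjörnLarsson The code in this answer works fine. Please post a new question, including a minimal reproducible example.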

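【Solution 2】:

If you are only getting the first match, I think you need the "g" flag on the regular expression to capture all matches; each call to exec() then returns the next match. I am using: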

const re = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm;

let reResults; // holds the current match array
while ((reResults = re.exec(s)) !== null) { // s is the input string; each call finds the next match
  Logger.log(reResults[0]); // full text of the current match
}
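
To tie this back to the original question (keeping only the links that contain a given word), here is a minimal sketch that reuses the pattern above with the "g" flag. The helper name extractLinks and its keyword parameter are hypothetical, introduced only for illustration. It uses String.prototype.match, which returns every full match when the regex is global, and console.log instead of Logger.log so that it also runs outside Apps Script.

```javascript
// The pattern from this answer; the "g" flag makes match() return all links.
const LINK_RE = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/gi;

// Hypothetical helper: returns all links in text, optionally keeping only
// those that contain keyword as a plain substring.
function extractLinks(text, keyword) {
  const links = text.match(LINK_RE) || []; // match() returns null when nothing is found
  return keyword ? links.filter(link => link.includes(keyword)) : links;
}

// Test string taken from the question's edit.
const text = 'http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more ';

console.log(extractLinks(text));                  // every link in the string
console.log(extractLinks(text, 'stackoverflow')); // only links containing the keyword
```

The keyword filter here is a simple substring test; if the pattern to look for is itself a regular expression, link.includes(keyword) could be swapped for a test against that expression.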

【Discussion】: