【Question Title】: Extract all links from a string with Google Apps Script
【Posted】: 2023-03-05 01:14:01
【Question】:

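I have a string variable that contains links (among other text), and I would like to extract all the links that contain a certain pattern (say, links containing the word "case"). Is this possible?

The string variable looks something like this: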

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';

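As a workaround, I used the approach described here: extract links from document. I create a document with the string as its content and then extract the links from it, but I would like to do this directly on the string.

Regards,

Edit (to Rubén):

If I use: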

var string = 'http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more ';

I only get the first link, twice (see screenshot here).

If I use:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html ';

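The result is the same (see screenshot here).

【Question Comments】:

  • What do you mean by "a string variable with links in it"? Are they URLs? Including a sample string would clarify what you mean. What have you tried?
  • OK. The string variable looks something like: var string = 'here is some text line among the ones there will be links like stackoverflow.com/questions/40725199/… and more';

Tags: javascript regex google-apps-script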


【Solution 1】:

Google Apps Script

function test2(){
  // URL pattern; the "g" flag lets successive exec() calls walk through every match
  var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/gi;
  var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
  var match;
  // exec() returns the next match on each call and null once none remain
  while ((match = re.exec(string)) !== null) {
    Logger.log(match[0]); // match[0] is the whole matched URL
  }
}

JavaScript

// URL pattern; the "g" flag lets successive exec() calls walk through every match
var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/gi;
var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
var match;
// exec() returns the next match on each call and null once none remain
while ((match = re.exec(string)) !== null) {
  console.log(match[0]); // match[0] is the whole matched URL
}
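
The duplicated first link reported in the question's edit is consistent with how exec() behaves when the pattern has no "g" flag: it returns a single match as an array whose index 0 is the whole match and whose later indexes are the capture groups. When one outer group wraps the whole pattern, indexes 0 and 1 hold the same text, so looping over that array prints one link twice and never reaches the second link. A minimal sketch of that effect, assuming a deliberately simplified pattern (`simpleRe` and the sample URLs are made up for illustration and are not the pattern from this answer):

```javascript
// Simplified, hypothetical URL pattern with one outer capture group and no "g" flag.
const simpleRe = /(https?:\/\/\S+)/i;
const text = 'first http://a.example/1 then https://b.example/case/2 end';

// exec() returns ONE match shaped like [fullMatch, group1, ...],
// not a list of every match in the string.
const result = simpleRe.exec(text);

const printed = [];
for (let i = 0; i < result.length; i++) {
  printed.push(result[i]); // index 0 and index 1 are the same text here
}
console.log(printed); // the first URL appears twice; the second URL is absent
```

Iterating over matches, rather than over the groups of one match, needs the "g" flag together with repeated exec() calls.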

Reference

RegularExpression to Extract Url For Javascript

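【Comments】:

【Solution 2】:

If you are only getting the first match, I think you need the "g" flag on the regular expression so that it captures all matches; each call to exec() then returns the next match. I am using: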

const re = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm;

let reResults;
// `s` is the string to search, defined elsewhere
while ((reResults = re.exec(s)) !== null) { // each call finds the next match
  Logger.log(reResults[0]); // full text of that match
}
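
The question also asks for only the links that contain a given word, which the loop above does not cover. A minimal sketch of one way to do it, assuming a plain case-sensitive substring test is good enough: `extractLinks`, its `keyword` parameter, and the `example.com` URL are hypothetical names introduced for illustration, while the pattern is copied from this answer. In Apps Script, `Logger.log` can stand in for `console.log`.

```javascript
// Hypothetical helper: returns every link found in `text`,
// or only the links containing `keyword` when one is given.
function extractLinks(text, keyword) {
  // Pattern copied from the answer above. Building it inside the function
  // gives each call a fresh regex, so lastIndex always starts at 0.
  const re = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm;

  // With the "g" flag, match() returns an array of all full matches, or null if there are none.
  const links = text.match(re) || [];
  if (!keyword) return links;

  // Keep only links that contain the keyword (case-sensitive substring check).
  return links.filter(function (link) {
    return link.indexOf(keyword) !== -1;
  });
}

// Made-up sample text: one link from the question plus one invented link containing "case".
const sample = 'see http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html and https://example.com/case/42 for more';
console.log(extractLinks(sample));         // both links
console.log(extractLinks(sample, 'case')); // only the link containing "case"
```

Filtering after extraction keeps the URL pattern unchanged; a stricter rule, such as matching "case" only in the path, would need its own check inside the filter callback.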

【Comments】:
