Search pattern from string: answers

【Question title】: Search pattern from string
【Posted】: 2021-11-24 00:45:26
【Question description】:

I am using BeautifulSoup for web scraping and need to get a link that sits inside a script tag, so I use this:

soup.find(string=re.compile("https://link9876.net/index.php"))

This returns the following string:

"var link = [];
 link[0] = 'https://link1225.com/x/xxxxxx';
 link[1] = 'https://link9876.net/index.php?xxxxxxxxx';
 link[2] = 'https://link1356.com/index.php?xxxxxxxxx';
 ..."

(The position and number of elements in the array change on every run.)

But I only want to get the "https://link9876.net/index.php" link. What is the best way to do this?

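【Discussion】:

This is a simple string-search problem. Find link[1] =, then grab everything up to the next single quote.
@TimRoberts It is not always link[1] == my_link. If I run the script again the order changes, and it could now be link[0] == my_link or any other index.
Please post code that reproduces this result.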

Tags: python regex beautifulsoup

【Solution 1】:

You can simply use another regular expression to extract whichever links you need, for example:

import re

script_text = """var link = [];
 link[0] = 'https://link1225.com/x/xxxxxx';
 link[1] = 'https://link9876.net/index.php?xxxxxxxx1';
 link[2] = 'https://link9876.net/index.php?xxxxxxxx2';
 link[3] = 'https://link9876.net/index.php?xxxx3xxx';
 link[4] = 'https://link1356.com/index.php?xxxxx4xxx';
 link[5] = 'https://link1356.com/index.php?xxxxx4xxx';
 ..."""
 
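# Capture every single-quoted URL that starts with the wanted prefix;
# the dots are escaped and .*? stops at the closing quote.
for link in re.findall(r"'(https://link9876\.net/index\.php.*?)'", script_text):
    print(link)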

This gives you:

https://link9876.net/index.php?xxxxxxxx1
https://link9876.net/index.php?xxxxxxxx2
https://link9876.net/index.php?xxxx3xxx
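
If you need this for more than one prefix, the same idea can be wrapped in a small helper. This is only a sketch: `extract_links` and its `prefix` parameter are names made up for illustration, and it assumes the links are always wrapped in single quotes, as in the question. The string returned by `soup.find(string=...)` is a `str` subclass, so it should work as `script_text` directly.

```python
import re


def extract_links(script_text, prefix):
    """Return every single-quoted URL in script_text that starts with prefix.

    Hypothetical helper, not part of BeautifulSoup or the re module.
    """
    # re.escape makes the dots (and any other special characters) literal.
    # [^']* consumes the rest of the URL up to the closing single quote.
    pattern = re.compile(r"'(" + re.escape(prefix) + r"[^']*)'")
    return pattern.findall(script_text)


# Sample text taken from the question.
sample = """var link = [];
 link[0] = 'https://link1225.com/x/xxxxxx';
 link[1] = 'https://link9876.net/index.php?xxxxxxxxx';
 link[2] = 'https://link1356.com/index.php?xxxxxxxxx';
"""

links = extract_links(sample, "https://link9876.net/index.php")

# The array index does not matter; take the first match, or None if absent.
first_link = links[0] if links else None
print(first_link)  # prints https://link9876.net/index.php?xxxxxxxxx
```

Because the match is anchored on the URL prefix rather than on `link[1]`, it keeps working when the order or number of array entries changes between runs.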
