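【Posted】: 2021-10-23 00:10:43
【Question】:
I am trying to write a script that scrapes a website and picks out only the hrefs that contain .php?id=. I can print all the hrefs with bs4, but I can't select just the ones containing .php?id= and print them.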
<li><a href="#">Education & Research </a>
<ul>
<li><a href="caseofthe_month.php">Case of the Month</a></li>
<a href="page.php?id=2">
<a href="idontwantthispagetoshowup.php">
<a href="page.php?id=5">Prospectus Fellowship-July-14</a>
<a href="thisoneeither.php">
'''
def gethref(ip):
    url = ("http://" + ip)
    print("[x] ~ SCAN: " + url + " ~ [x]")
    req = requests.get(url)
    tree = html.fromstring(req.text)
    tree_href = tree.xpath('//@href')
    #print(tree_href)
    if '*.php?id=*' in tree_href:
        print(tree_href)
    #soup = BeautifulSoup(req.text, 'html.parser')
    #h = soup.find_all('href=*.php')
    #print(h)
    #sqli = soup.select('a')
    #for link in soup.find_all('a'):
    #    sqli = (link.get('href'))
    #    sqli = str(sqli)
    #    print(sqli)
    #    if 'page' in sqli:
    #        print(sqli.a)
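For reference, the test `'*.php?id=*' in tree_href` can never succeed: `in` on a list checks whether one element equals that exact string, and `*` is not treated as a wildcard. A minimal sketch of one possible fix with bs4 follows, using the sample markup from the question. The names `SAMPLE_HTML` and `php_id_links` are hypothetical and introduced only for this illustration; the sketch filters each href with a plain substring check.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (hypothetical constant name).
SAMPLE_HTML = """
<li><a href="#">Education & Research </a>
<ul>
<li><a href="caseofthe_month.php">Case of the Month</a></li>
<a href="page.php?id=2">
<a href="idontwantthispagetoshowup.php">
<a href="page.php?id=5">Prospectus Fellowship-July-14</a>
<a href="thisoneeither.php">
"""


def php_id_links(markup):
    """Return every href in `markup` that contains '.php?id='."""
    soup = BeautifulSoup(markup, "html.parser")
    # href=True skips <a> tags that have no href attribute at all.
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    # Substring test on each href string, not on the list as a whole.
    return [h for h in hrefs if ".php?id=" in h]


print(php_id_links(SAMPLE_HTML))
```

In the original function, the same filter could be applied to `req.text` after the `requests.get(url)` call.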
【Discussion】:
-
Please post your complete code, including the imports (what is html?).
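The imports were never posted, so the following is only a sketch under the assumption that `html` in the question is `lxml.html` (the question is tagged lxml). Under that assumption, the filtering can be pushed into the XPath expression itself with the XPath 1.0 `contains()` function. The function name `php_id_hrefs` is hypothetical.

```python
from lxml import html  # assumption: this is the `html` used in the question


def php_id_hrefs(markup):
    """Return the href values of <a> tags whose href contains '.php?id='."""
    tree = html.fromstring(markup)
    # contains() is a standard XPath 1.0 string function.
    matches = tree.xpath('//a[contains(@href, ".php?id=")]/@href')
    # Convert lxml's string results to plain str objects.
    return [str(m) for m in matches]


sample = '<div><a href="page.php?id=2">x</a><a href="other.php">y</a></div>'
print(php_id_hrefs(sample))
```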
Tags: python python-3.x beautifulsoup lxml href