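Python3: print specific href links (answers)

【Question title】: Python3 print specific href links
【Posted】: 2021-10-23 00:10:43
【Question description】:

I am trying to have a script scrape a website and pick out only the hrefs that contain .php?id=. I can print every href with bs4, but I can't select just the ones containing .php?id= and print them.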

<li><a href="#">Education & Research </a>
<ul>                         
<li><a href="caseofthe_month.php">Case of the Month</a></li>
<a href="page.php?id=2">
<a href="idontwantthispagetoshowup.php">
<a href="page.php?id=5">Prospectus Fellowship-July-14</a>
<a href="thisoneeither.php">


def gethref(ip):
    url = ("http://" + ip)
    print("[x] ~ SCAN: " + url + " ~ [x]")
    req = requests.get(url)
    tree = html.fromstring(req.text)
    tree_href = tree.xpath('//@href')
    #print(tree_href)
    if '*.php?id=*' in tree_href:
        print (tree_href)
    #soup = BeautifulSoup(req.text, 'html.parser')
    #h = soup.find_all('href=*.php')
    #print(h)
    #sqli = soup.select('a')
    #for link in soup.find_all('a'):
    #   sqli = (link.get('href'))
    #   sqli = str(sqli)
    #   print(sqli)
    #   if 'page' in sqli:
    #       print(sqli.a)

【Question comments】:

Please post your complete code (including the imports; what is html?)

Tags: python python-3.x beautifulsoup lxml href

【Solution 1】:

This code finds every tag whose href contains .php?id=

from bs4 import BeautifulSoup
import requests
import re

def gethref(ip):
    url = ("http://" + ip)
    print("[x] ~ SCAN: " + url + " ~ [x]")
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    # The dot and "?" are escaped so they match literally; the pattern is searched anywhere in the href
    h = soup.find_all(href=re.compile(r'\.php\?id=\d*'))
    print(h)
    # sqli = soup.select('a') # i don't know what its doing, so i just commented it out
    # for link in soup.find_all('a'):
    #   sqli = str(link.get('href'))
    #   print(sqli)
    #   if 'page' in sqli:
    #       print(sqli.a)

I think this is what you need.

If it doesn't work, let me know...
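
For context on why the question's check never printed anything: `'*.php?id=*' in tree_href` asks whether the list contains that exact string, and `in` does no wildcard matching. The following is a minimal sketch of filtering a plain list of hrefs instead, using only the standard library. The `hrefs` list stands in for what `tree.xpath('//@href')` would return, and the names `PHP_ID` and `filter_php_id` are made up for illustration; this version also assumes at least one digit after `id=`.

```python
import re

# Sample hrefs standing in for the list returned by tree.xpath('//@href')
hrefs = [
    "#",
    "caseofthe_month.php",
    "page.php?id=2",
    "idontwantthispagetoshowup.php",
    "page.php?id=5",
    "thisoneeither.php",
]

# `in` on a list compares whole elements, so a wildcard string is never found
assert "*.php?id=*" not in hrefs

# Hypothetical helper pattern: a literal ".php?id=" followed by one or more digits
PHP_ID = re.compile(r"\.php\?id=\d+")


def filter_php_id(links):
    """Return only the links that contain '.php?id=<digits>'."""
    return [link for link in links if PHP_ID.search(link)]


matches = filter_php_id(hrefs)
print(matches)
```

If the id is not always numeric, a plain substring test such as `".php?id=" in link` would be the looser alternative.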

【Comments】:

【Solution 2】:

You can use the CSS selector a[href*=".php?id="]:

from bs4 import BeautifulSoup

html_doc = """
<li><a href="#">Education & Research</a>

<ul>                         
<li>
    <a href="caseofthe_month.php">Case of the Month</a>
</li>
</ul>

<a href="page.php?id=2"></a>
<a href="idontwantthispagetoshowup.php">
<a href="page.php?id=5">Prospectus Fellowship-July-14</a>
<a href="thisoneeither.php"></a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for link in soup.select('a[href*=".php?id="]'):
    print(link["href"])

Prints:

page.php?id=2
page.php?id=5

Or:

for link in soup.find_all("a"):
    if ".php?id=" in link.get("href", ""):
        print(link["href"])

Or:

for link in soup.find_all(
    lambda t: t.name == "a" and ".php?id=" in t.get("href", "")
):
    print(link["href"])
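
Since the question already parses the page with lxml, the same substring filter can probably be expressed directly in XPath as well. This is a minimal sketch, assuming lxml is installed; the sample markup (with the tags closed for clarity) and the name `php_id_links` are illustrative and not taken from the original post.

```python
from lxml import html

# Sample markup modelled on the question's snippet
html_doc = """
<ul>
  <li><a href="caseofthe_month.php">Case of the Month</a></li>
  <li><a href="page.php?id=2">Page 2</a></li>
  <li><a href="idontwantthispagetoshowup.php">Unwanted</a></li>
  <li><a href="page.php?id=5">Prospectus Fellowship-July-14</a></li>
  <li><a href="thisoneeither.php">Also unwanted</a></li>
</ul>
"""

tree = html.fromstring(html_doc)

# contains() is an XPath 1.0 substring test, applied here to each <a> element's href
php_id_links = [
    str(href) for href in tree.xpath('//a[contains(@href, ".php?id=")]/@href')
]

for href in php_id_links:
    print(href)
```

In the question's `gethref` function, the same expression could replace `tree.xpath('//@href')`, which would avoid filtering the list afterwards.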

【Comments】: