【Question title】: Retrieve the first href from a div tag
【Posted】: 2024-01-11 05:52:01
【Question】:

I need to retrieve the href that contains /questions/20702626/javac1-8-class-not-found, but the code below outputs //*.com instead:

from bs4 import BeautifulSoup
import urllib2

url = "http://*.com/search?q=incorrect+operator"
content = urllib2.urlopen(url).read()

soup = BeautifulSoup(content)

for tag in soup.find_all('div'):
    if tag.get("class")==['summary']:
        for tag in soup.find_all('div'):
            if tag.get("class")==['result-link']:
                for link in soup.find_all('a'):
                    print link.get('href')
                    break
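The code above prints a site-level link because each inner loop calls soup.find_all(...) again. That searches the whole document rather than the current div, so the first <a> on the page is printed. Below is a minimal Python 3 sketch (the original uses Python 2's urllib2 and print statement). It keeps the nested structure but scopes each search to the enclosing tag. SAMPLE_HTML is hypothetical markup standing in for the fetched page, and first_result_href is an illustrative helper name, not part of bs4:

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only approximates the search page in the question.
SAMPLE_HTML = """
<html><body>
  <a href="//example.com">site link</a>
  <div class="summary">
    <div class="result-link">
      <a href="/questions/20702626/javac1-8-class-not-found">first result</a>
    </div>
  </div>
  <div class="summary">
    <div class="result-link">
      <a href="/questions/11977228/incorrect-answer-in-operator-overloading">second result</a>
    </div>
  </div>
</body></html>
"""


def first_result_href(html):
    """Return the href of the first <a> inside div.summary div.result-link, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for summary in soup.find_all("div", class_="summary"):
        # Search inside the current summary div, not the whole soup.
        result_link = summary.find("div", class_="result-link")
        if result_link is None:
            continue
        link = result_link.find("a", href=True)
        if link is not None:
            return link["href"]  # stop at the first match
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# What the original inner loop effectively does: scan every <a> on the page.
print(soup.find_all("a")[0].get("href"))  # → //example.com
print(first_result_href(SAMPLE_HTML))     # → /questions/20702626/javac1-8-class-not-found
```

Note that class_="summary" matches any div whose class list includes summary. The original tag.get("class") == ['summary'] only matches a div whose class attribute is exactly that single class.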

【Discussion】:

    Tags: python html beautifulsoup html-parsing href


    【Answer 1】:

    Instead of nesting loops, write a CSS selector:

    for link in soup.select('div.summary div.result-link a'):
        print link.get('href')
    

    This is not only more readable, it also solves your problem. It prints:

    /questions/11977228/incorrect-answer-in-operator-overloading
    /questions/8347592/sizeof-operator-returns-incorrect-size
    /questions/23984762/c-incorrect-signature-for-assignment-operator
    ...
    /questions/24896659/incorrect-count-when-using-comparison-operator
    /questions/7035598/patter-checking-check-of-incorrect-number-of-operators-and-brackets
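The question only asks for the first href. Here is a Python 3 sketch of the same selector that also shows select_one, which returns just the first match (to my knowledge available in bs4 4.4.0 and later). SAMPLE_HTML is hypothetical markup standing in for the real search page. "html.parser" is passed explicitly so bs4 does not have to guess a parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real search page contains many more results.
SAMPLE_HTML = """
<a href="//example.com">site link</a>
<div class="summary"><div class="result-link">
  <a href="/questions/20702626/javac1-8-class-not-found">first result</a>
</div></div>
<div class="summary"><div class="result-link">
  <a href="/questions/8347592/sizeof-operator-returns-incorrect-size">second result</a>
</div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
selector = "div.summary div.result-link a"

# Every matching href, in document order; the site link outside the divs is skipped.
hrefs = [a.get("href") for a in soup.select(selector)]
print(hrefs)

# Only the first match: select_one returns a Tag, or None if nothing matches.
first = soup.select_one(selector)
first_href = first.get("href") if first is not None else None
print(first_href)  # → /questions/20702626/javac1-8-class-not-found
```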
    

    Side note: you may want to consider using the StackExchange API instead of your current web-scraping/HTML-parsing approach.
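Below is a rough Python 3 sketch of the API route. It assumes the Stack Exchange API v2.3 /search/advanced endpoint with its q, site, order and sort parameters, and a JSON response whose items each carry a link field. Verify these against the official API documentation. To stay offline, the sketch only builds the request URL and parses SAMPLE_RESPONSE, a hypothetical, trimmed response body. build_search_url and first_link are illustrative names:

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: Stack Exchange API v2.3 full-text search.
API_BASE = "https://api.stackexchange.com/2.3/search/advanced"


def build_search_url(query, site="stackoverflow"):
    """Build the request URL for a search (no request is sent here)."""
    params = {"q": query, "site": site, "order": "desc", "sort": "relevance"}
    return API_BASE + "?" + urlencode(params)


def first_link(payload_text):
    """Return the 'link' of the first item in a JSON response body, or None."""
    data = json.loads(payload_text)
    items = data.get("items", [])
    return items[0].get("link") if items else None


# Hypothetical, trimmed response body used only to illustrate the parsing step.
SAMPLE_RESPONSE = json.dumps({
    "items": [
        {"question_id": 11977228,
         "link": "https://stackoverflow.com/questions/11977228/incorrect-answer-in-operator-overloading"},
    ],
    "has_more": True,
})

print(build_search_url("incorrect operator"))
print(first_link(SAMPLE_RESPONSE))
```

Working with structured JSON avoids depending on the page's HTML class names, which can change without notice.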

    【Discussion】:
