Python 3: Extracting a span tag using bs4

【Question title】: Python 3 Extracting span tag using bs4
【Posted】: 2020-06-26 01:20:04
【Question description】:

I have a span tag from a page:

<span itemprop="name">
            DeWalt DCD778D2T-GB  18V 2.0Ah Li-Ion XR Brushless Cordless Combi Drill
        </span>

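How can I extract the text inside the span tag? I tried a few find methods, but I keep getting a 'NoneType' object error.

Below is the code I tried. Where am I going wrong?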

r=requests.get('https://www.screwfix.com/p/dewalt-dcd778d2t-gb-18v-2-0ah-li-ion-xr-brushless-cordless-combi-drill/268fx')

c=r.content
soup=BeautifulSoup(c,"html.parser")
ToolName1 = soup.find("span", {"itemprop" : "name"}).text

The error I get is:

AttributeError: 'NoneType' object has no attribute 'text'

【Question comments】:

Tags: python beautifulsoup

【Solution 1】:

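You are actually getting r.status_code 403 (Forbidden), so repr(soup) is an empty string and soup.find("span", {"itemprop" : "name"}) returns None. Your code then evaluates None.text, which is why you get AttributeError: 'NoneType' object has no attribute 'text'.

You need to send request headers for this URL; a User-Agent header alone is probably enough.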
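The failure described above can be reproduced without any network request. The following is only a minimal sketch: the helper name `extract_name` and both HTML strings are made up for illustration, and the "blocked" page is an assumption about what a 403 response body might look like, not the real response from the site.

```python
from bs4 import BeautifulSoup


def extract_name(html):
    """Return the stripped text of the first <span itemprop="name">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("span", {"itemprop": "name"})
    if tag is None:
        # find() returns None when nothing matches, so there is no .text to read
        return None
    return tag.text.strip()


# Hypothetical stand-in for a blocked response: no matching span at all
blocked_page = "<html><body>Access denied</body></html>"
# Hypothetical stand-in for the real product page
product_page = '<span itemprop="name">\n    Combi Drill\n</span>'

print(extract_name(blocked_page))
print(extract_name(product_page))
```

Checking the result of find() for None before reading .text turns the AttributeError into a value the caller can handle explicitly.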

import requests
from bs4 import BeautifulSoup

url = ('https://www.screwfix.com/p/dewalt-dcd778d2t-gb-18v-2-0ah-li-ion-xr-'
       'brushless-cordless-combi-drill/268fx')

headers = {'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWeb'
                          'Kit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.14'
                          '9 Safari/537.36')}

r = requests.get(url, headers=headers)
if r.status_code == 200:
    c = r.content
    soup = BeautifulSoup(c,"html.parser")
    ToolName1 = soup.find("span", {"itemprop" : "name"}).text
    print(ToolName1.strip())

Then you will get this:

DeWalt DCD778D2T-GB  18V 2.0Ah Li-Ion XR Brushless Cordless Combi Drill

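A status_code of 200 is the usual success case, but other status codes in the 2xx range (for example 201 or 204) also indicate success.

【Comments】:

Thank you, this also solved another problem of mine :)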
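As a rough illustration of that point, the check below accepts the whole 2xx range rather than only 200. The helper name `is_success` is hypothetical; it uses only the standard library's `http.HTTPStatus` and makes no request.

```python
from http import HTTPStatus


def is_success(status_code):
    """Return True for any 2xx status code, not only 200."""
    return 200 <= status_code < 300


# Compare a few well-known codes, including the 403 from this question
for code in (HTTPStatus.OK, HTTPStatus.CREATED,
             HTTPStatus.NO_CONTENT, HTTPStatus.FORBIDDEN):
    print(int(code), code.phrase, is_success(code))
```

With requests itself, `r.ok` (true when the status code is below 400) or `r.raise_for_status()` may be a looser or stricter alternative to comparing against 200, depending on how redirects and errors should be treated.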