Beautifulsoup doesn't work properly with all urls: answers

【Question title】: Beautifulsoup doesn't work properly with all urls
【Posted】: 2021-09-27 01:43:36
【Question】:

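Error message:

AttributeError: 'NoneType' object has no attribute 'get_text'

I'm following a web scraping tutorial, and everything works fine with this url, but when I change it to this url I get the error shown above.

The crawler function: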

def product_crawler():
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(id="productTitle").get_text()
    print(title)

I went through the related answers on Stack Overflow, such as changing html.parser to lxml, but none of them worked.

【Comments】:

Tags: python-3.x web-scraping beautifulsoup attributeerror

【Solution 1】:

Try adding the Accept-Language HTTP header:

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/B08DK5ZH44"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Accept-Language": "en-US,en;q=0.5",
}

page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
title = soup.find(id="productTitle").get_text(strip=True)
print(title)

Prints:

GoPro HERO9 Black - Waterproof Action Camera with Front LCD and Touch Rear Screens, 5K Ultra HD Video, 20MP Photos, 1080p Live Streaming, Webcam, Stabilization
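
The AttributeError in the question is raised because soup.find(...) returns None when no element matches, and None has no get_text method. A minimal sketch of a guard around that lookup follows. The helper name extract_title and its element_id parameter are hypothetical and not part of the original answer, and the sample HTML strings are made up for illustration.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_title(html: str, element_id: str = "productTitle") -> Optional[str]:
    """Return the stripped text of the element with the given id, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(id=element_id)  # find() returns None when nothing matches
    if node is None:
        return None
    return node.get_text(strip=True)


# Usage with illustrative HTML snippets (not real server responses)
print(extract_title('<span id="productTitle"> Example product </span>'))  # Example product
print(extract_title("<p>no product title here</p>"))  # None
```

When the helper returns None, inspecting page.status_code and the start of page.text may show what the server actually sent back instead of the expected product page.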

【Discussion】: