【Question title】: BeautifulSoup doesn't work properly with all URLs
【Posted】: 2021-09-27 01:43:36
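【Question description】:
Error message:
AttributeError: 'NoneType' object has no attribute 'get_text'
I am following a web scraping tutorial, and everything works fine on this url. When I change it to this url, I get the error mentioned above.
Crawler function: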
def product_crawler():
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(id="productTitle").get_text()
    print(title)
I checked all the answers on Stack Overflow, such as changing html.parser to lxml, but none of them worked.
【Discussion】:
Tags:
python-3.x
web-scraping
beautifulsoup
attributeerror
【Solution 1】:
Try adding the Accept-Language HTTP header:
import requests
from bs4 import BeautifulSoup
url = "https://www.amazon.com/dp/B08DK5ZH44"
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
"Accept-Language": "en-US,en;q=0.5",
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
title = soup.find(id="productTitle").get_text(strip=True)
print(title)
Prints:
GoPro HERO9 Black - Waterproof Action Camera with Front LCD and Touch Rear Screens, 5K Ultra HD Video, 20MP Photos, 1080p Live Streaming, Webcam, Stabilization
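The AttributeError in the question occurs because soup.find() returns None when no element matches, so the response for the failing URL most likely did not contain an element with id="productTitle". A minimal sketch of a guard against that case follows. The helper name extract_title and the inline HTML strings are assumptions made for illustration and are not part of the original question or answer; in real use the HTML would come from page.content.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_title(html: str) -> Optional[str]:
    """Return the text of the element with id="productTitle", or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(id="productTitle")  # find() returns None when nothing matches
    if node is None:
        return None
    return node.get_text(strip=True)


# Hypothetical inline HTML standing in for page.content from requests.get(...)
print(extract_title('<span id="productTitle"> Example product </span>'))
print(extract_title("<html><body>No title here</body></html>"))
```

When the helper returns None, inspecting page.status_code and the start of page.text can show whether the server sent a different page than the one expected.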