【Question title】: Beautifulsoup doesn't work properly with all urls
【Posted】: 2021-09-27 01:43:36
【Question】:

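Error message:

AttributeError: 'NoneType' object has no attribute 'get_text'

I am following a web scraping tutorial, and everything works fine with this url. When I change it to this url, I get the error shown above.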

The crawler function:

def product_crawler():
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(id="productTitle").get_text()
    print(title)

I checked all the answers on stackoverflow, such as changing html.parser to lxml, but none of them worked.

【Question comments】:

    Tags: python-3.x web-scraping beautifulsoup attributeerror


    【Solution 1】:

    Try adding the Accept-Language HTTP header:

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.amazon.com/dp/B08DK5ZH44"
    
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
        "Accept-Language": "en-US,en;q=0.5",
    }
    
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    title = soup.find(id="productTitle").get_text(strip=True)
    print(title)
    

    Prints:

    GoPro HERO9 Black - Waterproof Action Camera with Front LCD and Touch Rear Screens, 5K Ultra HD Video, 20MP Photos, 1080p Live Streaming, Webcam, Stabilization
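
The AttributeError in the question arises because `soup.find(...)` returns `None` when no element matches, which can happen if the server returns a different page than the one a browser shows. Below is a minimal sketch of checking for that before calling `get_text`. The helper name `extract_title` and the inline HTML strings are assumptions made for illustration; they are not part of the original answer, and the strings only stand in for `page.content`.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the text of the element with id="productTitle", or None if absent.

    Hypothetical helper for illustration; not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id="productTitle")
    if element is None:
        # find() returns None when nothing matches; calling get_text() on
        # that None is what raises the AttributeError from the question.
        return None
    return element.get_text(strip=True)


# Inline HTML stands in for page.content so the sketch runs without a network call.
with_title = '<html><body><span id="productTitle">  Example product  </span></body></html>'
without_title = "<html><body><p>Some other page</p></body></html>"

print(extract_title(with_title))
print(extract_title(without_title))
```

When the helper returns `None`, inspecting the response (for example its status code or the first part of `page.text`) may show what the server actually sent back.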
    

    【Comments】:
