
【Question Title】: Getting NoneType error with Beautiful Soup even though the object exists
【Posted】: 2021-06-07 10:23:18
【Question】:

I am trying to scrape the page https://www.cars.com/dealers/5374692/carvana-touchless-delivery-to-your-home/

This page has a button to view all vehicles, and I am trying to get the href of that tag.

So far I have done this with Selenium, but opening the webdriver every time takes too long, so I'd rather not use Selenium.

With BeautifulSoup I get a NoneType error. My code is:

import requests
from bs4 import BeautifulSoup
import re

base_url = 'https://www.cars.com/'

def request_page(url):
    session = requests.Session()
    my_headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"}
    response = session.get(url, headers=my_headers)
    soup = BeautifulSoup(re.sub("<!---->","", response.text), "lxml")
    return soup

def dealers_subpage(url):
    try:
        soup = request_page(url)
        descript = soup.find('dpp-update-inventory-link')
        print(descript.prettify())
        link = descript.find('a')['href']
        return base_url+str(link)
    except Exception as e:
        print(e,url)


dealers_subpage('https://www.cars.com/dealers/5374692/carvana-touchless-delivery-to-your-home/')

Running this code prints:

<dpp-update-inventory-link new-count="" party-id="74424458" used-count="100" zipcode="11763">
</dpp-update-inventory-link>

'NoneType' object is not subscriptable https://www.cars.com/dealers/5374692/carvana-touchless-delivery-to-your-home/

My question is: why doesn't it find the a tag that exists there?

Note: open the page in incognito/private mode, because in a normal window it redirects to a different page.

【Comments on the question】:

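I haven't looked at the site, but if it works with Selenium, the page probably relies on JavaScript. With only requests and BS, you won't get the same rendered page that you see in the browser.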
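soup.find() looks for an element with the tag dpp-update-inventory-link, which is not a standard HTML element. Also, after fetching the index file with wget, I couldn't find that string anywhere. That might explain the None returned by find().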
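A quick offline check suggests the custom tag itself is not the problem: BeautifulSoup matches any tag name, including non-standard custom elements, and the question's own output shows the element was found. The sketch below parses the element exactly as printed in the question (using the built-in "html.parser" instead of "lxml" so it runs without extra parsers) and shows that the None only appears at the inner find('a') step.

```python
from bs4 import BeautifulSoup

# The element exactly as printed in the question: it has no <a> child.
html = (
    '<dpp-update-inventory-link new-count="" party-id="74424458" '
    'used-count="100" zipcode="11763"></dpp-update-inventory-link>'
)

soup = BeautifulSoup(html, "html.parser")

# find() matches any tag name, including custom (non-standard) elements.
descript = soup.find("dpp-update-inventory-link")
print(descript is not None)   # True
print(descript["party-id"])   # 74424458

# The None comes one step later: there is no <a> inside the element,
# so descript.find("a")["href"] raises "'NoneType' object is not subscriptable".
print(descript.find("a"))     # None
```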
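@Jens Did you open the page in an incognito tab? If you search for the tag in the page inspector it shows up; alternatively, inspect the "View all vehicles" button.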

Tags: python web-scraping beautifulsoup nonetype

【Solution 1】:

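The page is loaded dynamically, so you can't get the a tag inside dpp-update-inventory-link. Even in your descript.prettify() output the a tag is missing, which means it is rendered by JavaScript in the browser; to get it from the live page you would have to use Selenium.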
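Because the a tag may simply not be present in the HTML that requests receives, it helps to check each find() result before subscripting it. The helper below is a hypothetical sketch (the name inventory_href is made up for illustration); it returns None instead of raising when the link is missing. The rendered_html string is also a hypothetical stand-in for what a browser might produce after JavaScript runs, not real cars.com markup.

```python
from typing import Optional

from bs4 import BeautifulSoup


def inventory_href(soup: BeautifulSoup) -> Optional[str]:
    """Return the href inside dpp-update-inventory-link, or None if it is absent."""
    descript = soup.find("dpp-update-inventory-link")
    if descript is None:
        return None
    link = descript.find("a")
    # With client-side rendering, the <a> is only added by JavaScript,
    # so the static HTML fetched with requests contains an empty element.
    if link is None:
        return None
    return link.get("href")


# Static HTML, as fetched with requests in the question.
static_html = (
    '<dpp-update-inventory-link party-id="74424458" zipcode="11763">'
    "</dpp-update-inventory-link>"
)
# Hypothetical rendered HTML with a placeholder href.
rendered_html = (
    '<dpp-update-inventory-link party-id="74424458" zipcode="11763">'
    '<a href="/example/path">View all vehicles</a>'
    "</dpp-update-inventory-link>"
)

print(inventory_href(BeautifulSoup(static_html, "html.parser")))    # None
print(inventory_href(BeautifulSoup(rendered_html, "html.parser")))  # /example/path
```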

For this particular link, though, you can build it yourself, because the link is constructed from attributes of descript such as party-id and zipcode:

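def dealers_subpage(url):
    soup = request_page(url)
    # The custom element is in the static HTML; only its inner <a> is added by JavaScript
    descript = soup.find('dpp-update-inventory-link')
    if descript is None:
        return None
    # Build the inventory URL from the element's attributes
    party_id = descript['party-id']
    zipcode = descript['zipcode']
    # base_url already ends with '/', so no extra slash before "for-sale"
    url = f"{base_url}for-sale/searchresults.action/?dlId={party_id}&zc={zipcode}&searchSource=CAPTIVE_BLENDED"
    return url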
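The same URL can also be assembled with the standard library's urllib.parse, which takes care of query-string escaping and avoids accidental double slashes when joining the base URL and path. This is a sketch under the assumption that the path "for-sale/searchresults.action/" and the query parameters dlId, zc and searchSource (all taken from the answer above) are still what cars.com expects; that has not been verified against the live site. The name build_inventory_url is hypothetical.

```python
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.cars.com/"


def build_inventory_url(html: str) -> str:
    """Build the dealer inventory URL from dpp-update-inventory-link's attributes."""
    soup = BeautifulSoup(html, "html.parser")
    descript = soup.find("dpp-update-inventory-link")
    if descript is None:
        raise ValueError("dpp-update-inventory-link not found in page")
    # Parameter names and values follow the answer's f-string (assumed, not verified).
    query = urlencode({
        "dlId": descript["party-id"],
        "zc": descript["zipcode"],
        "searchSource": "CAPTIVE_BLENDED",
    })
    # urljoin resolves the relative path against BASE_URL without doubling the slash.
    path = urljoin(BASE_URL, "for-sale/searchresults.action/")
    return f"{path}?{query}"


# The element as printed in the question.
html = (
    '<dpp-update-inventory-link new-count="" party-id="74424458" '
    'used-count="100" zipcode="11763"></dpp-update-inventory-link>'
)
print(build_inventory_url(html))
```

In real use, html would be response.text from the requests session in the question.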

【Comments】:

Although this isn't exactly what I asked, your solution works great. Thanks, man.