【Question title】: BeautifulSoup: getting child of div container
【Posted】: 2021-11-27 08:37:44
【Question】:

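I am trying to get the price and odometer reading of cars listed on a car sales website, so that I can monitor when a particular model is listed and when it disappears. A page may return one or more cars. I am new to both Python and BeautifulSoup, and have quite possibly bitten off more than I can chew.

I have managed to request the page and find the div containers, each of which holds the details of one car.

I can loop over the list of cars, but I cannot address/extract the nested tags for each car.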

# import libraries
from bs4 import BeautifulSoup
import requests
# Request to website and download HTML contents
url = 'https://www.carsales.com.au/cars/2011/mercedes-benz/s-class/s350-badge/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}

response = requests.get(url, headers=headers)
response_code = response.status_code

if response_code != 200:
    print(f"Error fetching page: {response_code}")
    exit()
else:
    content = response.content

soup = BeautifulSoup(content, 'html.parser')

# <div class="card-body">
SELECTOR_CAR = "card-body"

# <a class="js-encode-search" data-webm-clickvalue="sv-price" href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">$40,990* <span class="currency"></span></a>
SELECTOR_PRICE = ""

# <ul class="key-details">
#   <li class="key-details__value" data-type="Odometer">95,121 km</li>
SELECTOR_ODO = ""

# find all cars on page
# class is a Python reserved word; use class_ instead
cars = soup.find_all(class_ = SELECTOR_CAR)

# ----- my original version
formatted_cars = []     # array for car details

for car in cars:
    print("==========")
    data = {
        'title': car('js-encode-search'),
        'price': car('key-details__value')
    }
    formatted_cars.append(data)
    #car_soup = BeautifulSoup(car, 'html.parser')
    #print(car_card.prettify)
    #print(car_card)

print(formatted_cars)
# ----- end original

# ----- modified later
for car in cars:
    print("==========")
    for child in car.a.children:
        print(child)

    car_odo = car.li.contents
    print(car_odo)
# ----- modified later end

Result [from the modified 'for' version]:

python3 getCarsales_S350.py 
9 Mercedes-Benz S-Class S350 cars for sale in Australia
9
==========
2009 Mercedes-Benz S-Class S350 Auto MY08
['181,150 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['291,153 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['192,851 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['78,606 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['38,806 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['172,012 km']
==========
2010 Mercedes-Benz S-Class S350 L Auto MY10
['77,800 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['143,000 km']
==========
2011 Mercedes-Benz S-Class S350 Auto MY10
['95,121 km']

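...which works by accident rather than by design, as shown by the fact that the price cannot be obtained this way. The odometer and the title just happen to be the first elements of their kind.

Here is one car container: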

<div class="card-body">
    <div class="row">
        <div class="col">
            <h3>
                <a class="js-encode-search" data-webm-clickvalue="sv-title"
                    href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">2011
                    Mercedes-Benz S-Class S350 Auto MY10</a>
            </h3>
        </div>
        <div class="col-12 col-xl-5 text-right">
            <div class="item-price">
                <div class="price">
                    <a class="js-encode-search" data-webm-clickvalue="sv-price"
                        href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">$40,990*
                        <span class="currency"></span></a>
                </div>
                <div class="price-info-container">
                    <a class="price-info" data-target-url="/_details/api/v1/price-guide/carsales/OAG-AD-19752647"
                        data-toggle="lightbox" data-webm-clickvalue="sv-price-label">
                        Excl. Govt. Charges
                    </a>
                    <a class="additional-price-info"
                        data-target-url="/_details/api/v1/price-guide/carsales/OAG-AD-19752647"
                        data-toggle="lightbox"></a>
                </div>
            </div>
        </div>
    </div>
    <div class="row">
        <div class="col">
            <ul class="key-details">
                <li class="key-details__value" data-type="Odometer">95,121 km</li>
                <li class="key-details__value" data-type="Body Style">Sedan</li>
                <li class="key-details__value" data-type="Transmission">Automatic</li>
                <li class="key-details__value" data-type="Engine">6cyl 3.5L Petrol</li>
            </ul>
            <a class="xfacts-report" data-lightbox-height="650" data-lightbox-onclosed="onFactsPlusModalClosed"
                data-lightbox-width="900" data-opm-event="click-facts-driver-listings"
                data-opm-exp="facts-driver-listings" data-opm-trackon="click" data-seller-type="dealer"
                data-smart-buyer-network-id="OAG-AD-19752647"
                data-target-url="/smartbuyer/popup?networkId=OAG-AD-19752647&amp;sourcesystem=desktop.carsales-dealer.listing-carfacts.buy.textlink&amp;driver_crosssell=desktop.carsales-dealer.listing-carfacts.buy.textlink"
                data-toggle="lightbox" data-webm-clickvalue="get-carfacts-report">
                Pricing &amp; history on this car - FACTS+
            </a>
        </div>
        <div class="col-12 col-xl-4 text-right d-flex align-items-start badge-csn">
        </div>
    </div>
</div>

【Discussion】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:

    The selected answer is correct for one car. To get all cars, the for loop needs to look like this:

            formatted_cars = []     # array for car details
    
            for car in cars:
                print("==========")
                data = {
                    'title': ' '.join(car.select_one('h3 a').get_text(strip=True).split()),
                    'price': car.select_one('div.price a').get_text(strip=True),
                    'odo': car.select_one('ul.key-details li').get_text(strip=True)
                }
                #print(data)
                formatted_cars.append(data)
    
            print(formatted_cars)
    
    

    The selectors are called on car (the current container), not on soup (the whole page). (Hope that makes sense.)
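
The difference between searching from car and searching from soup can be sketched on a small stand-in page. This is only an illustration: the two-car HTML below is a trimmed, hypothetical version of the container in the question, and the first car's price is invented.

```python
from bs4 import BeautifulSoup

# Hypothetical two-car page trimmed from the question's container markup.
# The first car's price is invented for illustration.
html = """
<div class="card-body">
  <h3><a class="js-encode-search">2010 Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price"><a class="js-encode-search">$35,500*</a></div>
  <ul class="key-details"><li class="key-details__value">143,000 km</li></ul>
</div>
<div class="card-body">
  <h3><a class="js-encode-search">2011 Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price"><a class="js-encode-search">$40,990*</a></div>
  <ul class="key-details"><li class="key-details__value">95,121 km</li></ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
cars = soup.find_all(class_="card-body")

# Searching from `car` only looks inside that one container.
scoped = [car.select_one("div.price a").get_text(strip=True) for car in cars]

# Searching from `soup` restarts at the top of the page on every iteration,
# so the first car's price is returned each time.
unscoped = [soup.select_one("div.price a").get_text(strip=True) for car in cars]

print(scoped)
print(unscoped)
```

With this stand-in page, the scoped list holds one price per car, while the unscoped list repeats the first price.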

    【Discussion】:

      【Solution 2】:

      What happens

      Several tags have the class js-encode-search, and you are in effect calling find_all() for them.
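
As a rough sketch of what the shortcut calls in the question do: calling a tag like a function is the same as calling find_all() on it, and the first positional argument is matched against tag names, not classes. The HTML below is a trimmed, hypothetical copy of the container in the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's container (an assumption for illustration).
html = """
<div class="card-body">
  <h3><a class="js-encode-search" data-webm-clickvalue="sv-title">2011 Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price"><a class="js-encode-search" data-webm-clickvalue="sv-price">$40,990*</a></div>
</div>
"""
car = BeautifulSoup(html, "html.parser").find(class_="card-body")

# car(...) is shorthand for car.find_all(...). A bare string is a tag NAME,
# so this looks for <js-encode-search> elements and finds none.
by_name = car("js-encode-search")

# Passing class_ matches the CSS class instead; both links carry it.
by_class = car(class_="js-encode-search")

# car.a is shorthand for car.find("a"): only the FIRST <a>, i.e. the title.
first_link = car.a

print(len(by_name), len(by_class))
print(first_link.get_text(strip=True))
```

This would explain why car.a happened to return the title but never the price: the price link is the second a tag in the container.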

      How to fix

      Make your selector more specific; the title is in an <a> inside a parent <h3>:

      soup.select_one('h3 a')
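
Another way to be more specific, sketched here as an alternative rather than as the answer's method, is to select on the data-* attributes visible in the question's sample markup. This assumes the site keeps data-webm-clickvalue and data-type stable, which the thread does not confirm. The clean() helper is a hypothetical name introduced for this example.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's container (an assumption for illustration).
html = """
<div class="card-body">
  <h3><a class="js-encode-search" data-webm-clickvalue="sv-title">2011
      Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price"><a class="js-encode-search" data-webm-clickvalue="sv-price">$40,990*
      <span class="currency"></span></a></div>
  <ul class="key-details">
    <li class="key-details__value" data-type="Odometer">95,121 km</li>
    <li class="key-details__value" data-type="Body Style">Sedan</li>
  </ul>
</div>
"""
car = BeautifulSoup(html, "html.parser").select_one("div.card-body")


def clean(tag):
    """Collapse runs of whitespace (including newlines) in a tag's text."""
    return " ".join(tag.get_text().split())


# CSS attribute selectors pick each field by what it is, not by its position.
details = {
    "title": clean(car.select_one('a[data-webm-clickvalue="sv-title"]')),
    "price": clean(car.select_one('a[data-webm-clickvalue="sv-price"]')),
    "odo": clean(car.select_one('li[data-type="Odometer"]')),
    "body": clean(car.select_one('li[data-type="Body Style"]')),
}
print(details)
```

Selecting by attribute keeps working if the order of the li items changes, whereas 'ul.key-details li' always returns the first item.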
      

      Example

      soup = BeautifulSoup(content, 'html.parser')
      
      formatted_cars = []     # array for car details
      
      for car in cars:
          print("==========")
          data = {
              'title': ' '.join(soup.select_one('h3 a').get_text(strip=True).split()),
              'price': soup.select_one('div.price a').get_text(strip=True)
          }
          formatted_cars.append(data)
      
      print(formatted_cars)
      

      Output

      ==========
      [{'title': '2011 Mercedes-Benz S-Class S350 Auto MY10', 'price': '$40,990*'}]
      

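      【Discussion】:

      • Nice; it works, thank you... Sorry, I changed the code while experimenting further; I will update my post to show both attempts.
      • I was just wondering, but it is great that you updated your question and showed your effort. Happy to help.
      • I cheered too soon. This works for one car, as shown in the HTML example. But the actual page has 9 cars, and I want the data for each of them. The for loop successfully loops through
        the sections (one per car). I tried using car_soup = BeautifulSoup(car, 'html.parser') inside the for loop but got an error. I was hoping I could extract the relevant tags from it.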
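
On the last comment: each item returned by find_all() is already a parsed Tag, so it can be searched directly without building a new BeautifulSoup object. A minimal sketch, using a hypothetical two-car page trimmed from the question's markup:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical two-car page trimmed from the question's container markup.
html = """
<div class="card-body">
  <ul class="key-details"><li class="key-details__value">143,000 km</li></ul>
</div>
<div class="card-body">
  <ul class="key-details"><li class="key-details__value">95,121 km</li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
cars = soup.find_all(class_="card-body")

# Every element of the result is a Tag, which has select_one(), find(), etc.
all_tags = all(isinstance(car, Tag) for car in cars)

# Search each container directly.
direct = [car.select_one("li").get_text(strip=True) for car in cars]

# Re-parsing is possible if the Tag is first converted to a string,
# but it is extra work that gives the same result here.
reparsed = [
    BeautifulSoup(str(car), "html.parser").select_one("li").get_text(strip=True)
    for car in cars
]

print(all_tags)
print(direct)
print(reparsed)
```

The BeautifulSoup constructor expects markup (a string or file-like object), which is the likely reason passing the Tag itself failed.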