Posted: 2021-11-27 08:37:44
Question:
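I'm trying to get the price and odometer reading of cars listed on a car sales website, so I can monitor when a particular model is listed and when it disappears. A page may return one or more cars. I'm new to both Python and BeautifulSoup, and I may well have bitten off more than I can chew.
I managed to request the page and find the div containers, each of which holds one car's details.
I can iterate over the list of cars, but I can't address/extract the subsequent tags inside each car.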
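Since the end goal is to notice when a listing appears or disappears, one option is to key each car by a stable listing ID and compare the sets of IDs between runs. This is a minimal sketch under an assumption: that the ID is the last path segment of the listing URL (for example `OAG-AD-19752647` in the `href` values in the question's HTML). The other IDs below are made up for illustration, `listing_id` and `diff_listings` are hypothetical helper names, and saving the previous run's IDs (file, database) is left out.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlparse


def listing_id(href: str) -> str:
    """Assumed URL layout: the listing ID is the last path segment."""
    # "/cars/details/<slug>/OAG-AD-19752647/?Cr=8" -> "OAG-AD-19752647"
    return PurePosixPath(urlparse(href).path).name


def diff_listings(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (newly listed IDs, disappeared IDs) between two scrape runs."""
    return current - previous, previous - current


# Hypothetical IDs from two runs; only OAG-AD-19752647 comes from the question.
yesterday = {"OAG-AD-19752647", "OAG-AD-00000001"}
today = {
    listing_id("/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8"),
    "OAG-AD-00000002",
}
added, removed = diff_listings(yesterday, today)
print(added)    # {'OAG-AD-00000002'}
print(removed)  # {'OAG-AD-00000001'}
```

Comparing IDs rather than titles matters here because several listings share the same title (for example "2010 Mercedes-Benz S-Class S350 Auto MY10").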
```python
# import libraries
from bs4 import BeautifulSoup
import requests

# Request to website and download HTML contents
url = 'https://www.carsales.com.au/cars/2011/mercedes-benz/s-class/s350-badge/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
response = requests.get(url, headers=headers)
response_code = response.status_code
if response_code != 200:
    print(f"Error fetching page: {response_code}")
    exit()
else:
    content = response.content
    soup = BeautifulSoup(content, 'html.parser')

# <div class="card-body">
SELECTOR_CAR = "card-body"
# <a class="js-encode-search" data-webm-clickvalue="sv-price" href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">$40,990* <span class="currency"></span></a>
SELECTOR_PRICE = ""
# <ul class="key-details">
# <li class="key-details__value" data-type="Odometer">95,121 km</li>
SELECTOR_ODO = ""

# find all cars on page
# class is a Python reserved word; use class_ instead
cars = soup.find_all(class_=SELECTOR_CAR)

# ----- my original version
formatted_cars = []  # list for car details
for car in cars:
    print("==========")
    data = {
        'title': car('js-encode-search'),
        'price': car('key-details__value')
    }
    formatted_cars.append(data)
    #car_soup = BeautifulSoup(car, 'html.parser')
    #print(car_card.prettify)
    #print(car_card)
print(formatted_cars)
# ----- end original

# ----- modified later
for car in cars:
    print("==========")
    for child in car.a.children:
        print(child)
    car_odo = car.li.contents
    print(car_odo)
# ----- modified later end
```
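In the original loop, `car('js-encode-search')` returns an empty list. Calling a BeautifulSoup `Tag` like a function is shorthand for `tag.find_all(...)`, and a bare string argument is matched against tag *names*, so it searches for `<js-encode-search>` elements instead of the class. (The `'price'` entry also points at `key-details__value`, which is the odometer's class.) A minimal sketch on a trimmed card, abbreviated from the question's carsales markup:

```python
from bs4 import BeautifulSoup

# Trimmed card in the shape of the carsales markup from the question.
html = """
<div class="card-body">
  <h3><a class="js-encode-search" data-webm-clickvalue="sv-title" href="/x">2011 Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price"><a class="js-encode-search" data-webm-clickvalue="sv-price" href="/x">$40,990*</a></div>
  <ul class="key-details"><li class="key-details__value" data-type="Odometer">95,121 km</li></ul>
</div>
"""
car = BeautifulSoup(html, "html.parser").find(class_="card-body")

# car("js-encode-search") is car.find_all("js-encode-search"): the string is
# compared with tag names, so it looks for <js-encode-search> tags and finds none.
by_name = car("js-encode-search")
print(by_name)  # []

# To match the class attribute, pass class_= (or use a CSS selector).
links = car.find_all("a", class_="js-encode-search")
texts = [a.get_text(strip=True) for a in links]
print(texts)  # ['2011 Mercedes-Benz S-Class S350 Auto MY10', '$40,990*']

odometer = car.find("li", class_="key-details__value").get_text(strip=True)
print(odometer)  # 95,121 km
```

Both anchors share the class `js-encode-search`, so matching on class alone still relies on their order; the `data-*` attributes are a more specific handle.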
Result (from the modified `for` version):
```
python3 getCarsales_S350.py
9 Mercedes-Benz S-Class S350 cars for sale in Australia
9
==========
2009 Mercedes-Benz S-Class S350 Auto MY08
['181,150 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['291,153 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['192,851 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['78,606 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['38,806 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['172,012 km']
==========
2010 Mercedes-Benz S-Class S350 L Auto MY10
['77,800 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['143,000 km']
==========
2011 Mercedes-Benz S-Class S350 Auto MY10
['95,121 km']
```
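...but this works by coincidence rather than by targeting specific elements, as my failure to get the price shows: the odometer and the title just happen to be the first elements of their kind.
Here is one car container: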
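One way to stop depending on element order is to select each field by the attribute that identifies it (`data-webm-clickvalue="sv-price"` on the price link, `data-type="Odometer"` on the list item, both visible in the container HTML) and to scope every lookup to the current card. This is a sketch using BeautifulSoup's CSS `select`/`select_one`; the second card is invented for illustration (price block removed, list items reordered) to show that missing or reordered fields no longer shift values, and `text_of` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# First card trimmed from the question's markup; second card is made up.
HTML = """
<div class="card-body">
  <h3><a class="js-encode-search" data-webm-clickvalue="sv-title" href="/a">2011
    Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price"><a class="js-encode-search" data-webm-clickvalue="sv-price" href="/a">$40,990*
    <span class="currency"></span></a></div>
  <ul class="key-details">
    <li class="key-details__value" data-type="Odometer">95,121 km</li>
    <li class="key-details__value" data-type="Body Style">Sedan</li>
  </ul>
</div>
<div class="card-body">
  <h3><a class="js-encode-search" data-webm-clickvalue="sv-title" href="/b">2010 Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <ul class="key-details">
    <li class="key-details__value" data-type="Body Style">Sedan</li>
    <li class="key-details__value" data-type="Odometer">78,606 km</li>
  </ul>
</div>
"""


def text_of(tag):
    """Collapse internal whitespace; return None when the element is missing."""
    return " ".join(tag.get_text().split()) if tag is not None else None


soup = BeautifulSoup(HTML, "html.parser")
cars = []
for card in soup.select("div.card-body"):
    # Each select_one() searches only inside this card and matches on
    # attributes, so a missing price or reordered <li> items are handled.
    cars.append({
        "title": text_of(card.select_one('a[data-webm-clickvalue="sv-title"]')),
        "price": text_of(card.select_one('a[data-webm-clickvalue="sv-price"]')),
        "odometer": text_of(card.select_one('li.key-details__value[data-type="Odometer"]')),
    })

for car in cars:
    print(car)
```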
```html
<div class="card-body">
<div class="row">
<div class="col">
<h3>
<a class="js-encode-search" data-webm-clickvalue="sv-title"
href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">2011
Mercedes-Benz S-Class S350 Auto MY10</a>
</h3>
</div>
<div class="col-12 col-xl-5 text-right">
<div class="item-price">
<div class="price">
<a class="js-encode-search" data-webm-clickvalue="sv-price"
href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">$40,990*
<span class="currency"></span></a>
</div>
<div class="price-info-container">
<a class="price-info" data-target-url="/_details/api/v1/price-guide/carsales/OAG-AD-19752647"
data-toggle="lightbox" data-webm-clickvalue="sv-price-label">
Excl. Govt. Charges
</a>
<a class="additional-price-info"
data-target-url="/_details/api/v1/price-guide/carsales/OAG-AD-19752647"
data-toggle="lightbox"></a>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col">
<ul class="key-details">
<li class="key-details__value" data-type="Odometer">95,121 km</li>
<li class="key-details__value" data-type="Body Style">Sedan</li>
<li class="key-details__value" data-type="Transmission">Automatic</li>
<li class="key-details__value" data-type="Engine">6cyl 3.5L Petrol</li>
</ul>
<a class="xfacts-report" data-lightbox-height="650" data-lightbox-onclosed="onFactsPlusModalClosed"
data-lightbox-width="900" data-opm-event="click-facts-driver-listings"
data-opm-exp="facts-driver-listings" data-opm-trackon="click" data-seller-type="dealer"
data-smart-buyer-network-id="OAG-AD-19752647"
data-target-url="/smartbuyer/popup?networkId=OAG-AD-19752647&sourcesystem=desktop.carsales-dealer.listing-carfacts.buy.textlink&driver_crosssell=desktop.carsales-dealer.listing-carfacts.buy.textlink"
data-toggle="lightbox" data-webm-clickvalue="get-carfacts-report">
Pricing & history on this car - FACTS+
</a>
</div>
<div class="col-12 col-xl-4 text-right d-flex align-items-start badge-csn">
</div>
</div>
</div>
```
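For monitoring, the price and odometer strings in this container ("$40,990*", "95,121 km") are easier to compare once they are integers. A minimal sketch, assuming that stripping every non-digit is acceptable (it would mangle a price with cents such as "$40,990.50") and that the listing ID is the last path segment of the price link's `href`; `digits_to_int` and `parse_card` are hypothetical names, and the HTML is a trimmed copy of the container.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Trimmed copy of the card-body container from the question.
CARD = """
<div class="card-body">
  <a class="js-encode-search" data-webm-clickvalue="sv-price"
     href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">$40,990*
     <span class="currency"></span></a>
  <ul class="key-details">
    <li class="key-details__value" data-type="Odometer">95,121 km</li>
    <li class="key-details__value" data-type="Engine">6cyl 3.5L Petrol</li>
  </ul>
</div>
"""


def digits_to_int(text):
    """Keep only the digits, e.g. '$40,990*' -> 40990; None if there are none."""
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else None


def parse_card(card):
    price_link = card.select_one('a[data-webm-clickvalue="sv-price"]')
    odo = card.select_one('li[data-type="Odometer"]')
    return {
        # Assumption: last URL path segment is the listing ID.
        "id": PurePosixPath(urlparse(price_link["href"]).path).name if price_link else None,
        "price_aud": digits_to_int(price_link.get_text()) if price_link else None,
        "odometer_km": digits_to_int(odo.get_text()) if odo else None,
    }


card = BeautifulSoup(CARD, "html.parser").select_one("div.card-body")
record = parse_card(card)
print(record)  # {'id': 'OAG-AD-19752647', 'price_aud': 40990, 'odometer_km': 95121}
```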
Comments:
Tags: python web-scraping beautifulsoup