Posted: 2026-01-13 02:40:01
Question:
I am trying to capture the data shown in the bullet points on this page:
Link: https://www.redbook.com.au/cars/details/2019-honda-civic-50-years-edition-auto-my19/SPOT-ITM-524208/
The data needs to be extracted with XPath.
Data to extract:
4 Door Sedan
4 Cylinder, 1.8 Litre
Constantly Variable Transmission, Front Wheel Drive
Petrol - Unleaded ULP
6.4 L/100km
Tried this:
import requests
from lxml import html

cars = []
urls = ['https://www.redbook.com.au/cars/details/2019-honda-civic-50-years-edition-auto-my19/SPOT-ITM-524208/']
headers = {'User-Agent': 'Mozilla/5.0'}
for url in urls:
    car_data = {}
    page = requests.get(url, headers=headers)
    tree = html.fromstring(page.content)
    # Absolute XPath copied from the browser; it breaks as soon as the page layout changes
    nodes = tree.xpath('/html/body/div[1]/div[2]/div/div[1]/div[1]/div[4]/div/div')
    if nodes:
        # xpath() returns lxml elements, so store the element's text rather than the element object
        car_data["namings"] = nodes[0].text_content().strip()
    cars.append(car_data)
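One possible approach is to select each bullet with a relative XPath instead of a long absolute path, and to ask XPath for the text directly. The sketch below runs against a small inline HTML sample; the `div` with class `spec-summary` and its `<ul>/<li>` layout are hypothetical, since the real redbook.com.au markup is not confirmed here. Inspect the live page and adjust the selector to match it.

```python
from lxml import html

# Hypothetical markup: the container class "spec-summary" is an assumption,
# not the confirmed structure of the redbook.com.au page.
SAMPLE = """
<html><body>
  <div class="spec-summary">
    <ul>
      <li>4 Door Sedan</li>
      <li>4 Cylinder, 1.8 Litre</li>
      <li> Constantly Variable Transmission, Front Wheel Drive </li>
      <li>Petrol - Unleaded ULP</li>
      <li>6.4 L/100km</li>
    </ul>
  </div>
</body></html>
"""


def extract_bullets(page_source):
    """Return the whitespace-normalised text of every <li> in the assumed container."""
    tree = html.fromstring(page_source)
    # Relative XPath keyed on a class attribute rather than /html/body/div[1]/...
    items = tree.xpath('//div[contains(@class, "spec-summary")]//li')
    # normalize-space(.) trims and collapses whitespace and returns one string per <li>
    return [li.xpath('normalize-space(.)') for li in items]


bullets = extract_bullets(SAMPLE)
for line in bullets:
    print(line)
```

Against the live site you would pass `page.content` from the `requests.get(...)` call instead of `SAMPLE`. If the bullets are inserted by JavaScript after the page loads, they will not be present in `page.content` at all, and no XPath over that response will find them.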
Discussion:
Tags: python beautifulsoup request python-requests