【Posted】: 2022-01-16 20:42:21
【Question】:
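This code does not crash when I run it. The output file flyingmag.csv is populated, but not with what I want. I want to add div class="elementor-widget-container" > h2 and div class="elementor-widget-container" > h3 so that the aircraft manufacturer and aircraft model are included in the output. I would really like the records in a traditional Excel row format, and I want to scrape all aircraft manufacturers and models.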
import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.flyingmag.com/2019-buyers-single-engine-piston/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}

# newline='' keeps the csv module from writing blank lines between rows on Windows
with open('flyingmag.csv', "w", encoding="utf-8-sig", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Base_Price','Typically_Equipped_Price','Engine','Horsepower','Propeller','Seats','Length','Height','Wingspan','Wing_Area','Wing_Loading','Power_Loading','Max_Takeoff_Weight','Empty_Weight','Useful_Load','Fuel_Capacity','Max_Operating_Altitude','Max_Rate_of_Climb','Max_Cruise_Speed','Normal_Cruise_Speed','Never_Exceed_Speed','Stall_Speed-Flaps_Up','Stall_Speed-Landing_Configuration','Max_Range','Takeoff_Roll','Takeoff_Distance_Over_50_ft.','Landing_Roll','Landing_Distance_Over_50_ft'])
    while True:
        html = requests.get(url, headers=headers)
        soup = BeautifulSoup(html.text, 'html.parser')
        # Write the cell text of every table row as one CSV row
        for row in soup.select('table tbody tr'):
            writer.writerow([c.text if c.text else '' for c in row.select('td')])
            print(row)
        else:
            # for/else: runs once the rows are exhausted, so the page is fetched only once
            break
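One possible way to get the manufacturer and model into each record is to walk the h2, h3, and table-row elements in document order and carry the most recent heading values forward. The following is only a sketch under assumptions that the question does not confirm: that each h2 names a manufacturer, each h3 names a model, and each spec table row holds exactly two cells (a label and a value). `SAMPLE_HTML`, `parse_aircraft`, and `to_csv` are hypothetical names, and the sample markup is made up so the example runs without touching the network.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page may be structured differently.
SAMPLE_HTML = """
<div class="elementor-widget-container"><h2>Maker A</h2></div>
<div class="elementor-widget-container"><h3>Model 1</h3></div>
<table><tbody>
<tr><td>Engine</td><td>Engine X</td></tr>
<tr><td>Seats</td><td>4</td></tr>
</tbody></table>
<div class="elementor-widget-container"><h3>Model 2</h3></div>
<table><tbody>
<tr><td>Engine</td><td>Engine Y</td></tr>
<tr><td>Seats</td><td>2</td></tr>
</tbody></table>
"""

# One selector list, so headings and rows come back interleaved in document order.
SELECTOR = (
    "div.elementor-widget-container > h2, "
    "div.elementor-widget-container > h3, "
    "table tbody tr"
)


def parse_aircraft(html):
    """Return one dict per model, assuming two-cell (label, value) spec rows."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    maker = ""
    current = None
    for el in soup.select(SELECTOR):
        if el.name == "h2":
            # Remember the manufacturer for every model that follows
            maker = el.get_text(strip=True)
        elif el.name == "h3":
            # A new model starts a new record
            current = {"Manufacturer": maker, "Model": el.get_text(strip=True)}
            records.append(current)
        elif current is not None:
            cells = [td.get_text(strip=True) for td in el.select("td")]
            if len(cells) == 2:
                current[cells[0]] = cells[1]
    return records


def to_csv(records):
    """Serialize the records as CSV text, one row per aircraft."""
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_aircraft(SAMPLE_HTML)))
```

If the live page matches these assumptions, passing `html.text` from the `requests.get` call to `parse_aircraft` and writing the result to a file opened with `newline=''` should give one spreadsheet-style row per aircraft. If the tables are laid out differently, the row-handling branch would need to be adapted.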
【Discussion】:
- Stop spamming tags. There is no pandas in your code, and the page has no pagination.
Tags: python csv web-scraping beautifulsoup