[Posted]: 2021-11-15 18:45:20
[Question]:
I'm just learning web scraping and want to export the results from this site to a CSV file: https://www.avbuyer.com/aircraft/private-jets
However, I'm struggling to parse the next page.
Here is my code (written with help from Amen Aziz); it only gives me the first page.
I'm using Chrome, but I'm not sure whether that makes any difference.
I'm running Python 3.8.12.
Thanks in advance.
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.avbuyer.com/aircraft/private-jets', headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
postings = soup.find_all('div', class_='listing-item premium')

temp = []
for post in postings:
    link = post.find('a', class_='more-info').get('href')
    link_full = 'https://www.avbuyer.com' + link
    plane = post.find('h2', class_='item-title').text
    price = post.find('div', class_='price').text
    location = post.find('div', class_='list-item-location').text
    desc = post.find('div', class_='list-item-para').text
    try:
        tag = post.find('div', class_='list-viewing-date').text
    except AttributeError:
        tag = 'N/A'
    updated = post.find('div', class_='list-update').text
    t = post.find_all('div', class_='list-other-dtl')
    for i in t:
        data = [tup.text for tup in i.find_all('li')]
        years = data[0]
        s = data[1]
        total_time = data[2]
        temp.append([plane, price, location, years, s, total_time, desc, tag, updated, link_full])

df = pd.DataFrame(temp, columns=["plane", "price", "location", "Year", "S/N", "Totaltime",
                                 "Description", "Tag", "Last Updated", "link"])

# This finds and fetches the next page, but the new soup is never parsed again,
# so only the first page ends up in the DataFrame
next_page = soup.find('a', {'rel': 'next'}).get('href')
next_page_full = 'https://www.avbuyer.com' + next_page
url = next_page_full
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

df.to_csv('/Users/xxx/avbuyer.csv')
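The usual fix is to wrap the whole scrape in a loop that keeps following the `rel="next"` link until it disappears, instead of fetching the next page once at the end. A minimal sketch of that pagination step (the `next_page_url` helper name is my own; it uses the same `rel="next"` selector as the code above, demonstrated here on an inline HTML snippet rather than a live request):

```python
from bs4 import BeautifulSoup

BASE = 'https://www.avbuyer.com'

def next_page_url(soup, base=BASE):
    """Return the absolute URL of the next results page, or None on the last page."""
    next_link = soup.find('a', {'rel': 'next'})
    return base + next_link.get('href') if next_link else None

# Minimal pager-shaped snippet standing in for a real results page
html = '<a rel="next" href="/aircraft/private-jets/page-2">Next</a>'
soup = BeautifulSoup(html, 'html.parser')
print(next_page_url(soup))  # https://www.avbuyer.com/aircraft/private-jets/page-2
```

With that helper, the scraper becomes `url = start_url`, then `while url:` fetch the page, run the existing `for post in postings:` body to extend `temp`, and set `url = next_page_url(soup)`; the loop ends on the last page because the helper returns `None`, and the `DataFrame` is built once afterwards.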
[Discussion]:
Tags: python web-scraping beautifulsoup