【Posted】: 2020-01-16 14:17:33
【Question】:
This is my current code for scraping data on specific players from the site:
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
import lxml
import xlsxwriter
page = requests.get('https://www.futbin.com/players?page=1')
soup = BeautifulSoup(page.content, 'lxml')
pool = soup.find(id='repTb')
pnames = pool.find_all(class_='player_name_players_table')
pprice = pool.find_all(class_='ps4_color font-weight-bold')
prating = pool.select('span[class*="form rating ut20"]')
all_player_names = [name.getText() for name in pnames]
all_prices = [price.getText() for price in pprice]
all_pratings = [rating.getText() for rating in prating]
fut_data = pd.DataFrame({
    'Player': all_player_names,
    'Rating': all_pratings,
    'Price': all_prices,
})
writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')
fut_data.to_excel(writer, sheet_name='Futbin')
writer.save()
print(fut_data)
This works fine for the first page, but I need to go through 609 pages in total and pull the data from all of them.
Could you help me rework this code so it does that? I'm still a beginner and am learning through this project.
【Comments】:
- Put the code in a loop from 1 to 600. Rebuild the URL from the loop index, and don't forget to change the file name.
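The comment's suggestion can be sketched as below. This is a minimal sketch, not a tested solution: the helper names (`page_url`, `parse_players`, `scrape_all`) are hypothetical, the selectors are copied from the question's own snippet, and the 1-second delay between requests is an assumption about what the site tolerates. It collects all pages into one list of rows first, then writes a single Excel file at the end instead of overwriting the file on each page.

```python
import time
import requests
import pandas as pd
from bs4 import BeautifulSoup

# URL pattern taken from the question, with the page number left as a placeholder
BASE_URL = 'https://www.futbin.com/players?page={}'

def page_url(page_num):
    """Build the listing URL for a given page number."""
    return BASE_URL.format(page_num)

def parse_players(html):
    """Extract (name, rating, price) rows from one page's HTML,
    using the same selectors as the original snippet."""
    soup = BeautifulSoup(html, 'html.parser')  # 'lxml' also works if installed
    pool = soup.find(id='repTb')
    if pool is None:  # layout changed or empty page: skip instead of crashing
        return []
    names = [n.get_text(strip=True) for n in pool.find_all(class_='player_name_players_table')]
    prices = [p.get_text(strip=True) for p in pool.find_all(class_='ps4_color font-weight-bold')]
    ratings = [r.get_text(strip=True) for r in pool.select('span[class*="form rating ut20"]')]
    return list(zip(names, ratings, prices))

def scrape_all(last_page=609, delay=1.0):
    """Loop over every listing page and collect all rows into one DataFrame."""
    rows = []
    for page_num in range(1, last_page + 1):  # pages 1..last_page
        resp = requests.get(page_url(page_num))
        rows.extend(parse_players(resp.text))
        time.sleep(delay)  # pause between requests to be polite to the server
    return pd.DataFrame(rows, columns=['Player', 'Rating', 'Price'])

# To run the full scrape and export everything to a single Excel file:
# fut_data = scrape_all(609)
# fut_data.to_excel('file.xlsx', sheet_name='Futbin', engine='xlsxwriter')
```

Writing once at the end also avoids the commenter's file-name concern: there is only one output file, so nothing gets clobbered between pages.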
Tags: python pandas web-scraping beautifulsoup python-requests