【Posted】: 2018-09-02 19:01:26
【Problem description】:
I'm getting an assertion error saying that 20 columns were passed but the data passed in has 50 columns. I have a rough idea of what's causing it, but it's late and I'm not sure how to fix it: there really are 20 column headers, and the 50 comes from the number of rows. I suspect the loop is also involved. Any help would be appreciated, since I think this is something simple that I just can't see how to resolve.
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time

playerData = []
for i in range(6):
    initialURL = 'https://www.fangraphs.com/leaders.aspx?pos=all&stats=sta&lg=all&qual=0&type=8&season=2017&month=0&season1=2017&ind=0&team=0&rost=0&age=0&filter=&players=0&sort=7,d&page=' + str(i) + '_50'
    r = requests.get(initialURL)
    soup = BeautifulSoup(r.text, 'html.parser')
    statistics = soup.find("table", {"class": "rgMasterTable"})
    statistics.findAll('th')
    column_headers = [th.getText() for th in soup.findAll('th')]
    data = statistics.findAll('tr')[3:]
    pitcherStatistics = [[td.text.strip() for td in data[a].findAll('td')]
                         for a in range(len(data))]
    playerData.append(pitcherStatistics)

print(playerData)
df = pd.DataFrame(playerData, columns=column_headers)
df.to_csv("Starting Pitchers.csv", index=False)
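A minimal sketch of what is likely going on (hypothetical player names, not a rewrite of the scraper): `append` adds each page's entire list of rows as a single element, so the outer list has one "row" per page whose width is that page's row count (50), not the column count (20). Flattening with `extend` instead gives one row per player. The headers should probably also come from the stats table (`statistics.findAll('th')`) rather than every `th` on the page.

```python
import pandas as pd

# Reproduce the shape mismatch with tiny stand-in data (hypothetical names):
column_headers = ['Name', 'Team']            # 2 headers here; 20 in the question
page_rows = [['Kershaw', 'LAD'],             # one scraped page: 3 rows
             ['Sale', 'BOS'],
             ['Scherzer', 'WSH']]

wrong = []
wrong.append(page_rows)   # wrong == [[row, row, row]] -> 1 "row" of width 3
try:
    pd.DataFrame(wrong, columns=column_headers)
except ValueError as e:
    print(e)              # column-count mismatch, like the asker's "20 vs 50"

right = []
right.extend(page_rows)   # right == [row, row, row] -> 3 rows of width 2
df = pd.DataFrame(right, columns=column_headers)
print(df.shape)           # (3, 2): one row per player
```

Under this assumption, changing `playerData.append(pitcherStatistics)` to `playerData.extend(pitcherStatistics)` would make the 20 headers line up with 20 cells per row.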
【Comments】:
Tags: python python-3.x pandas for-loop web-scraping