【Posted】: 2019-03-21 02:09:19
【Question】:
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from webdriver_manager.chrome import ChromeDriverManager

# Open the per-game stats page in Chrome so BeautifulSoup can parse it
url = "https://stats.nba.com/players/traditional/?sort=PTS&dir=-1&Season=2018-19&SeasonType=Regular%20Season"
d = webdriver.Chrome(ChromeDriverManager().install())
d.get(url)
# Initialize the data frame that stores player data
data_df = pd.DataFrame(columns={'Player','Team','3PA','3P%','3PaTotal','Season'})
for yearCount in range(0,20):
    season = [18,19]
    seasonStr = str(season[0])+"/"+str(season[1])
    for pageCounter in range(0,11):
        # Scrape the table: header cells, then the cells of every row
        soup = BeautifulSoup(d.page_source, 'html.parser').find('table')
        headers, [_, *data] = [i.text for i in soup.find_all('th')], [[i.text for i in b.find_all('td')] for b in soup.find_all('tr')]
        final_data = [i for i in data if len(i) > 1]
        # Build one dictionary per row, keyed by header
        data_attrs = [dict(zip(headers, i)) for i in final_data]
        # Collect the stats used for the graph
        players = [i['PLAYER'] for i in data_attrs]
        teams = [i['TEAM'] for i in data_attrs]
        threePointAttempts = [i['3PA'] for i in data_attrs]
        threePointPercentage = [i['3P%'] for i in data_attrs]
        # Add the collected data to the data frame
        temp_df = pd.DataFrame({'Player': players,
                                'Team': teams,
                                '3PA': threePointAttempts,
                                '3P%': threePointPercentage,
                                '3PaTotal' : 0,
                                'Season' : seasonStr})
        data_df = data_df.append(temp_df, ignore_index=True)
        data_df = data_df[['Player','Team','3PA','3P%','3PaTotal','Season']]
        # Go to the next page
        nxt = d.find_element_by_class_name("stats-table-pagination__next")
        nxt.click()
    # Switch the Season dropdown to the next year to scrape
    dropDown = Select(d.find_element_by_name("Season"))
    dropDown.select_by_index(yearCount)
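A side note on the loop above: `season` is reset to `[18, 19]` on every pass, so every row would be labeled `18/19` no matter which season the dropdown shows. A minimal sketch of deriving the label from the loop counter instead. `season_label` is a hypothetical helper, and it assumes the dropdown lists seasons newest first, starting at 2018-19 (not verified against the site):

```python
def season_label(year_count, latest_start=2018):
    """Return a label such as '2018-19' for the season year_count steps back.

    Assumption: index 0 in the dropdown is the season starting in latest_start,
    and each later index is one season earlier.
    """
    start = latest_start - year_count
    # Two-digit end year, zero-padded so 2000 becomes '00'
    return f"{start}-{(start + 1) % 100:02d}"


if __name__ == "__main__":
    for year_count in range(3):
        print(year_count, season_label(year_count))
```

Inside the outer loop this would replace the two `season` / `seasonStr` lines with `seasonStr = season_label(yearCount)`.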
My error:
Traceback (most recent call last):
  File "C:/Users/brenn/PycharmProjects/NBAstats/venv/Lib/site-packages/Player 3-Point.py", line 44, in <module>
    headers, [_, *data] = [i.text for i in soup.find_all('th')], [[i.text for i in b.find_all('td')] for b in soup.find_all('tr')]
AttributeError: 'NoneType' object has no attribute 'find_all'
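I'm having trouble collecting data for past seasons from the NBA website. My code collects all player data for the current season (it iterates through every page without problems). However, when I try to collect earlier years by stepping through the dropdown, it fails. If I load a past season's URL directly, without navigating via the dropdown, the data is collected without any problem. In the Selenium Chrome tab the page does switch to the past year, but reading the data then fails.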
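One plausible reading of the symptom (not confirmed in this thread) is that the page source is read while the table is still being re-rendered after the season changes, so `find('table')` returns `None`. A minimal sketch of polling until the table is present, under that assumption. `wait_for` is a hypothetical helper, and the timeout and poll values are arbitrary:

```python
import time


def wait_for(fetch, timeout=10.0, poll=0.5):
    """Call fetch() repeatedly until it returns something other than None.

    Raises TimeoutError if nothing is returned before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(poll)


# Intended use with the code in the question (needs a live browser session):
# soup = wait_for(lambda: BeautifulSoup(d.page_source, 'html.parser').find('table'))
```

Selenium's own `WebDriverWait` covers the same idea for elements in the live page; the helper here only wraps the existing `page_source` parse so the rest of the loop stays unchanged.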
【Comments】:
- Show a sample of the expected result.
Tags: python pandas selenium web-scraping beautifulsoup