[Posted]: 2021-06-05 14:44:55
[Question]:
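I have some code for scraping data from fbref (data link: https://fbref.com/en/comps/9/stats/Premier-League-Stats). It used to work fine, but now some of the fields fail. I checked, and the ones that no longer work are "player", "nationality", "position", "squad", "age" and "birth_year". I also checked that these fields still have the same names on the web page as before. Any ideas on how to fix this?
Thanks a lot!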
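Checking that a field name still exists on the page is not quite the same as checking that it is still in a `<td>`, which is what `row.find("td", {"data-stat": f})` requires. A minimal diagnostic sketch, using only the standard library's `html.parser` (swapped in for BeautifulSoup so it runs without extra packages), records which tag carries each `data-stat` in the first body row of a table. `DataStatCollector` and `cell_tags` are hypothetical names, and the default `table_index=1` assumes the player table is the second `<tbody>`, as `get_tables()` below assumes.

```python
import re
from html.parser import HTMLParser

# Same trick as get_tables(): fbref ships some tables inside HTML comments,
# so the comment markers are removed before parsing.
COMMENT_MARKERS = re.compile("<!--|-->")


class DataStatCollector(HTMLParser):
    """Record (tag name, data-stat value) for every th/td cell inside <tbody> rows."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.tables = []  # one entry per <tbody>; each is a list of rows of (tag, stat)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tbody":
            self.in_tbody = True
            self.tables.append([])
        elif tag == "tr" and self.in_tbody:
            self.tables[-1].append([])
        elif tag in ("th", "td") and self.in_tbody and self.tables and self.tables[-1]:
            self.tables[-1][-1].append((tag, attrs.get("data-stat")))

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


def cell_tags(html, table_index=1):
    """Map each data-stat in the first body row of the chosen table to 'th' or 'td'."""
    parser = DataStatCollector()
    parser.feed(COMMENT_MARKERS.sub("", html))
    parser.close()
    if table_index >= len(parser.tables) or not parser.tables[table_index]:
        return {}
    first_row = parser.tables[table_index][0]
    return {stat: tag for tag, stat in first_row if stat}
```

For example, `cell_tags(requests.get(url).text)` would list every `data-stat` in the first player row; any name from `stats` that is absent, or mapped to `'th'`, is one that the `td`-only lookup in `get_frame()` cannot find.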
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
import sys, getopt
import csv


def get_tables(url):
    res = requests.get(url)
    ## The next two lines get around the issue with comments breaking the parsing.
    comm = re.compile("<!--|-->")
    soup = BeautifulSoup(comm.sub("", res.text), 'lxml')
    all_tables = soup.find_all("tbody")
    team_table = all_tables[0]    # first <tbody> on the page
    player_table = all_tables[1]  # second <tbody>: one row per player
    return player_table, team_table


def get_frame(features, player_table):
    pre_df_player = dict()
    features_wanted_player = features
    rows_player = player_table.find_all('tr')
    for row in rows_player:
        # Only rows with a row-header cell (scope="row") are treated as data rows.
        if row.find('th', {"scope": "row"}) is not None:
            for f in features_wanted_player:
                cell = row.find("td", {"data-stat": f})
                text = cell.text.strip()
                if text == '':
                    text = '0'
                # Every field except these text columns is converted to a number.
                if f not in ('player', 'nationality', 'position', 'squad', 'age', 'birth_year'):
                    text = float(text.replace(',', ''))
                if f in pre_df_player:
                    pre_df_player[f].append(text)
                else:
                    pre_df_player[f] = [text]
    df_player = pd.DataFrame.from_dict(pre_df_player)
    return df_player


stats = ["player", "nationality", "position", "squad", "age", "birth_year",
         "games", "games_starts", "minutes", "goals", "assists",
         "pens_made", "pens_att", "cards_yellow", "cards_red",
         "goals_per90", "assists_per90", "goals_assists_per90",
         "goals_pens_per90", "goals_assists_pens_per90",
         "xg", "npxg", "xa", "xg_per90", "xa_per90", "xg_xa_per90",
         "npxg_per90", "npxg_xa_per90"]


def frame_for_category(category, top, end, features):
    url = top + category + end
    player_table, team_table = get_tables(url)
    df_player = get_frame(features, player_table)
    return df_player


top = 'https://fbref.com/en/comps/9/'
end = '/Premier-League-Stats'
df1 = frame_for_category('stats', top, end, stats)
df1  # displayed when run in a notebook
```
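Given the `nonetype` tag, one plausible cause is that `row.find("td", {"data-stat": f})` returns `None` for some fields, for example if a cell is now rendered as a `<th>` (such as the player name becoming the row header) or is missing, so `cell.text` raises `AttributeError`. This is an assumption, not a confirmed diagnosis. The sketch below searches both `th` and `td` and records `None` for a missing cell instead of crashing; `TEXT_FIELDS`, `find_cell` and `get_frame_dict` are hypothetical names, and it returns a plain dict that can be passed to `pd.DataFrame.from_dict` as in the original `get_frame()`.

```python
from bs4 import BeautifulSoup

# Columns kept as text; every other data-stat is parsed as a number,
# mirroring the check in the original get_frame().
TEXT_FIELDS = {"player", "nationality", "position", "squad", "age", "birth_year"}


def find_cell(row, stat):
    """Return the cell for a data-stat whether it is a <th> or a <td>, else None."""
    return row.find(["th", "td"], {"data-stat": stat})


def get_frame_dict(features, player_table):
    """Collect one list per feature; a missing cell is stored as None instead of raising."""
    data = {f: [] for f in features}
    for row in player_table.find_all("tr"):
        if row.find("th", {"scope": "row"}) is None:
            continue  # not a data row (e.g. a repeated header row)
        for f in features:
            cell = find_cell(row, f)
            if cell is None:
                data[f].append(None)  # makes a vanished column visible as NaN later
                continue
            text = cell.get_text().strip() or "0"  # original code also mapped '' to '0'
            data[f].append(text if f in TEXT_FIELDS else float(text.replace(",", "")))
    return data
```

Printing `get_frame_dict(stats, player_table)` and looking for columns full of `None` would show which `data-stat` names the current page no longer exposes in a row.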
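[Discussion]:
- Sorry, I hadn't posted all of the code. I've updated it now. Thanks!
- Could you be more specific (show us the error message, tell us which line fails...)?
- FYI, the correct term is "scraping". "Scrapping" means throwing something away as junk.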
Tags: python web-scraping beautifulsoup nonetype