【Question Title】: AttributeError: 'NoneType' object has no attribute 'text' - BeautifulSoup
【Posted】: 2021-06-05 14:44:55
【Question Description】:

I have some code for scraping information from fbref (data link: https://fbref.com/en/comps/9/stats/Premier-League-Stats). It used to run fine, but now I'm having problems with some of the fields (I checked: the ones that no longer work are "player", "nationality", "position", "squad", "age", "birth_year"). I also checked that these fields still have the same names in the page source as before. Any ideas / help on fixing this?

Thanks a lot!


import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
import sys, getopt
import csv

def get_tables(url):
    res = requests.get(url)
    ## The next two lines get around the issue with comments breaking the parsing.
    comm = re.compile("<!--|-->")
    soup = BeautifulSoup(comm.sub("",res.text),'lxml')
    all_tables = soup.findAll("tbody")
    team_table = all_tables[0]
    player_table = all_tables[1]
    return player_table, team_table

def get_frame(features, player_table):
    pre_df_player = dict()
    features_wanted_player = features
    rows_player = player_table.find_all('tr')
    for row in rows_player:
        if(row.find('th',{"scope":"row"}) != None):
    
            for f in features_wanted_player:
                cell = row.find("td",{"data-stat": f})
                a = cell.text.strip().encode()
                text=a.decode("utf-8")
                if(text == ''):
                    text = '0'
                if((f!='player')&(f!='nationality')&(f!='position')&(f!='squad')&(f!='age')&(f!='birth_year')):
                    text = float(text.replace(',',''))
                if f in pre_df_player:
                    pre_df_player[f].append(text)
                else:
                    pre_df_player[f] = [text]
    df_player = pd.DataFrame.from_dict(pre_df_player)
    return df_player

stats = ["player","nationality","position","squad","age","birth_year","games","games_starts","minutes","goals","assists","pens_made","pens_att","cards_yellow","cards_red","goals_per90","assists_per90","goals_assists_per90","goals_pens_per90","goals_assists_pens_per90","xg","npxg","xa","xg_per90","xa_per90","xg_xa_per90","npxg_per90","npxg_xa_per90"]

def frame_for_category(category,top,end,features):
    url = (top + category + end)
    player_table, team_table = get_tables(url)
    df_player = get_frame(features, player_table)
    return df_player

top='https://fbref.com/en/comps/9/'
end='/Premier-League-Stats'
df1 = frame_for_category('stats',top,end,stats)

df1
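The traceback in the title comes from the `cell.text` call in `get_frame`: `row.find("td", {"data-stat": f})` returns `None` when a row has no `<td>` with the requested `data-stat`, and `None` has no `.text` attribute. A minimal sketch of that failure mode and a guard against it, using a made-up two-row HTML fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second row has no "player" cell.
html = """
<tbody>
  <tr><th scope="row">1</th><td data-stat="player">Alice</td></tr>
  <tr><th scope="row">2</th><td data-stat="goals">3</td></tr>
</tbody>
"""
soup = BeautifulSoup(html, "html.parser")

values = []
for row in soup.find_all("tr"):
    cell = row.find("td", {"data-stat": "player"})
    # Guard: find() returns None when the cell is absent,
    # so check before touching .text.
    values.append(cell.text.strip() if cell is not None else "0")

print(values)  # ['Alice', '0']
```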

【Question Discussion】:

  • Sorry, I hadn't posted all of the code. I've updated it now. Thanks!
  • Could you be more specific (show us the error, tell us which line doesn't work...)?
  • FYI, the correct term is "scraping". "Scrapping" means throwing something away like garbage.

Tags: python web-scraping beautifulsoup nonetype


【Solution 1】:

If you only care about the player stats, change player_table = all_tables[1] to player_table = all_tables[2], because right now you are feeding the team table into the get_frame function.

I tried it, and it works fine after that change.

【Discussion】:

  • Thank you so much! It finally works with this change. For future cases, what does that number refer to?
  • Well, all_tables = soup.findAll("tbody") gives you a list of the "tbody" elements it found, so the number in this case is just an index into that list, pointing at a particular "tbody" element.
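To make the indexing concrete: findAll returns matches in document order, so all_tables[2] just means "the third tbody on the page", which silently breaks whenever the site inserts a new table above it. A sketch with an invented three-table fragment; the table ids here are hypothetical stand-ins (on fbref the standard stats table carries an id such as stats_standard, which a lookup by id can target instead of a position):

```python
from bs4 import BeautifulSoup

# Invented page with three tables; the ids are illustrative only.
html = """
<table id="squads"><tbody><tr><td>Team A</td></tr></tbody></table>
<table id="keepers"><tbody><tr><td>GK stats</td></tr></tbody></table>
<table id="stats_standard"><tbody><tr><td>Player stats</td></tr></tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Positional lookup: fragile if a new table appears earlier in the page.
all_tables = soup.find_all("tbody")
print(all_tables[2].text.strip())  # Player stats

# Lookup by id: survives tables being added or reordered.
player_tbody = soup.find("table", {"id": "stats_standard"}).find("tbody")
print(player_tbody.text.strip())  # Player stats
```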
【Solution 2】:

I would suggest loading the table with pandas' read_html. Under Share & Export --> Embed this Table there is a direct link to this table.

import pandas as pd
df = pd.read_html("https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F9%2Fstats%2FPremier-League-Stats&div=div_stats_standard", header=1)

This outputs a list of dataframes; the table can be accessed via df[0]. Output of df[0].head():

Rk Player Nation Pos Squad Age Born MP Starts Min 90s Gls Ast G-PK PK PKatt CrdY CrdR Gls.1 Ast.1 G+A G-PK.1 G+A-PK xG npxG xA npxG+xA xG.1 xA.1 xG+xA npxG.1 npxG+xA.1 Matches
0 1 Patrick van Aanholt nl NED DF Crystal Palace 30-190 1990 16 15 1324 14.7 0 1 0 0 0 1 0 0 0.07 0.07 0 0.07 1.2 1.2 0.8 2 0.08 0.05 0.13 0.08 0.13 Matches
1 2 Tammy Abraham eng ENG FW Chelsea 23-156 1997 20 12 1021 11.3 6 1 6 0 0 0 0 0.53 0.09 0.62 0.53 0.62 5.6 5.6 0.9 6.5 0.49 0.08 0.57 0.49 0.57 Matches
2 3 Che Adams eng ENG FW Southampton 24-237 1996 26 22 1985 22.1 5 4 5 0 0 1 0 0.23 0.18 0.41 0.23 0.41 5.5 5.5 4.3 9.9 0.25 0.2 0.45 0.25 0.45 Matches
3 4 Tosin Adarabioyo eng ENG DF Fulham 23-164 1997 23 23 2070 23 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0.1 1.1 0.04 0.01 0.05 0.04 0.05 Matches
4 5 Adrián es ESP GK Liverpool 34-063 1987 3 3 270 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Matches
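One caveat with the embedded fbref tables: the header row is repeated periodically inside the table body, and with read_html those repeats show up as ordinary data rows (e.g. a row whose Rk column literally contains "Rk"). A hedged clean-up sketch, using a toy frame in place of the real download:

```python
import pandas as pd

# Toy stand-in for df[0]: the third row is a repeated header row.
df0 = pd.DataFrame({
    "Rk": ["1", "2", "Rk", "3"],
    "Player": ["Patrick van Aanholt", "Tammy Abraham", "Player", "Che Adams"],
})

# Drop the repeated header rows, then coerce Rk back to integers.
clean = df0[df0["Rk"] != "Rk"].copy()
clean["Rk"] = clean["Rk"].astype(int)

print(clean["Player"].tolist())
# ['Patrick van Aanholt', 'Tammy Abraham', 'Che Adams']
```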

【Discussion】:

  • Good point! Thanks a lot for the quick answer