【Question title】: Web scraping in BeautifulSoup is returning an empty list
【Posted】: 2020-10-30 17:16:31
【Question】:

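I'm trying to scrape a table from Basketball Reference, and it returns an empty list. I'd appreciate help debugging this or an explanation of why it happens. The page has many tables, but I specifically need the Miscellaneous Stats section. Thanks in advance!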

from bs4 import BeautifulSoup
import requests
import time
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

url = 'https://www.basketball-reference.com/leagues/NBA_2020.html#all_misc_stats'
res = requests.get(url)

soup = BeautifulSoup(res.content,'lxml')

soup.find('div', {'id':'div_misc_stats'})

【Comments】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:

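    There is nothing wrong with how you parse the soup; the specific element you are looking for needs JavaScript to render. If you can, you are better off looking for another source for this data.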

    If you really need this data, you will probably want to render the page first (see this for some inspiration).

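    From my cursory analysis, no external network call appears to be made to fetch the data before it is rendered, so it is probably embedded elsewhere in the page as XML, JSON, or similar, although I did not find it in my search. If this is more than a one-off task, it may be worth checking for that before investing in a computationally more expensive approach.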
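
One inexpensive way to check whether the data is already embedded in the downloaded page is to search the raw HTML for the target id and see whether it sits in live markup or only inside an HTML comment. The following is a minimal sketch using only the standard library. `SAMPLE_HTML`, `CommentCollector`, and `locate_marker` are hypothetical names for illustration, and the sample markup is an assumed miniature of the page, not its real source.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page; in practice this would be
# the text of the HTTP response.
SAMPLE_HTML = """
<div id="all_misc_stats">
  <h2>Miscellaneous Stats</h2>
  <!--
  <div id="div_misc_stats"><table id="misc_stats"><tr><td>x</td></tr></table></div>
  -->
</div>
"""


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def locate_marker(html, marker):
    """Return 'absent', 'in comment' (at least one comment contains the
    marker), or 'in markup' (present, but in no comment)."""
    if marker not in html:
        return "absent"
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    if any(marker in comment for comment in collector.comments):
        return "in comment"
    return "in markup"


print(locate_marker(SAMPLE_HTML, 'id="div_misc_stats"'))
print(locate_marker(SAMPLE_HTML, 'id="all_misc_stats"'))
```

If the marker only shows up inside a comment, a normal tag search will not find it, but the markup can still be recovered from the comment text without rendering the page.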

    【Comments】:

      【Solution 2】:

      The data is inside an HTML comment <!-- ... -->. You can load it into a DataFrame with this script:

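      from io import StringIO

      import requests
      import pandas as pd
      from bs4 import BeautifulSoup, Comment


      url = 'https://www.basketball-reference.com/leagues/NBA_2020.html'

      soup = BeautifulSoup(requests.get(url).content, 'html.parser')

      # Find the section heading, then the first HTML comment after it;
      # that comment holds the table markup.
      # (:-soup-contains replaces the deprecated :contains; string= replaces text=.)
      heading = soup.select_one('h2:-soup-contains("Miscellaneous Stats")')
      table = heading.find_next(string=lambda t: isinstance(t, Comment))

      # Parse the commented-out markup and drop the top level of the two-row header.
      df = pd.read_html(StringIO(str(table)))[0].droplevel(0, axis=1)
      print(df)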
      

      Prints:

            Rk                    Team   Age     W     L  PW  PL    MOV   SOS    SRS   ORtg   DRtg  ...    TS%   eFG%  TOV%  ORB%  FT/FGA   eFG%  TOV%  DRB%  FT/FGA                       Arena  Attend.  Attend./G
      0    1.0        Milwaukee Bucks*  29.2  53.0  12.0  52  13  11.29 -0.85  10.44  112.6  101.9  ...  0.583  0.553  12.8  20.7   0.196  0.486  12.2  81.7   0.172                Fiserv Forum   549036      17711
      1    2.0     Los Angeles Lakers*  29.6  49.0  14.0  45  18   7.41  0.34   7.75  113.0  105.6  ...  0.577  0.548  13.2  24.6   0.196  0.509  13.8  78.4   0.202              STAPLES Center   588907      18997
      2    3.0   Los Angeles Clippers*  27.4  44.0  20.0  44  20   6.52  0.22   6.74  113.6  107.2  ...  0.574  0.532  12.7  24.0   0.232  0.503  12.3  77.3   0.210              STAPLES Center   610176      19068
      3    4.0        Toronto Raptors*  26.6  46.0  18.0  44  20   6.45 -0.57   5.88  111.6  105.2  ...  0.574  0.536  12.8  21.6   0.205  0.502  14.6  76.1   0.200            Scotiabank Arena   633456      19796
      4    5.0        Dallas Mavericks  26.2  40.0  27.0  45  22   6.04 -0.21   5.84  116.7  110.6  ...  0.581  0.548  11.3  23.5   0.198  0.519  10.9  77.4   0.172    American Airlines Center   682096      20062
      5    6.0         Boston Celtics*  25.3  43.0  21.0  44  20   6.17 -0.48   5.69  112.9  106.8  ...  0.567  0.529  12.0  23.9   0.204  0.510  13.6  77.5   0.212                   TD Garden   610864      19090
      6    7.0        Houston Rockets*  29.1  40.0  24.0  39  25   3.75  0.03   3.78  113.8  110.2  ...  0.578  0.539  12.6  22.4   0.226  0.528  13.5  75.6   0.194               Toyota Center   578458      18077
      7    8.0              Utah Jazz*  27.5  41.0  23.0  38  26   3.17  0.03   3.20  112.6  109.4  ...  0.587  0.552  13.6  21.2   0.208  0.514  10.9  79.0   0.180     Vivint Smart Home Arena   567486      18306
      8    9.0         Denver Nuggets*  25.6  43.0  22.0  39  26   2.95  0.06   3.02  112.5  109.5  ...  0.564  0.532  12.3  24.7   0.178  0.526  13.0  77.0   0.194                Pepsi Center   633153      19186
      9   10.0  Oklahoma City Thunder*  25.6  40.0  24.0  37  27   2.45  0.34   2.79  111.6  109.1  ...  0.577  0.534  12.3  19.2   0.233  0.520  12.4  76.8   0.164     Chesapeake Energy Arena   600699      18203
      10  11.0             Miami Heat*  25.9  41.0  24.0  39  26   3.23 -0.65   2.58  112.7  109.4  ...  0.587  0.549  13.5  20.5   0.231  0.522  12.3  79.7   0.208      AmericanAirlines Arena   629771      19680
      11  12.0     Philadelphia 76ers*  26.4  39.0  26.0  37  28   2.22  0.01   2.22  110.4  108.2  ...  0.562  0.530  12.7  23.7   0.189  0.522  12.7  80.4   0.211          Wells Fargo Center   639491      20629
      12  13.0         Indiana Pacers*  25.6  39.0  26.0  37  28   1.94 -0.33   1.61  110.3  108.3  ...  0.565  0.533  11.9  20.3   0.170  0.513  12.8  77.1   0.193     Bankers Life Fieldhouse   529002      16531
      13  14.0    New Orleans Pelicans  25.4  28.0  36.0  30  34  -0.83  1.13   0.30  110.8  111.6  ...  0.567  0.538  13.7  24.3   0.183  0.531  12.3  78.1   0.207        Smoothie King Center   528172      16505
      14  15.0           Orlando Magic  26.0  30.0  35.0  30  35  -0.97  0.12  -0.85  108.0  109.0  ...  0.540  0.503  11.4  22.4   0.191  0.535  13.5  79.0   0.170                Amway Center   529870      17093
      15  16.0       Memphis Grizzlies  24.0  32.0  33.0  30  35  -1.08  0.02  -1.05  109.4  110.4  ...  0.561  0.530  13.2  23.2   0.178  0.520  12.6  77.6   0.213                 FedEx Forum   523297      15857
      16  17.0            Phoenix Suns  24.7  26.0  39.0  30  35  -1.37  0.32  -1.05  110.5  111.8  ...  0.572  0.528  13.3  22.2   0.226  0.543  14.0  78.3   0.221  Talking Stick Resort Arena   550633      15606
      17  18.0  Portland Trail Blazers  27.5  29.0  37.0  30  36  -1.61  0.49  -1.11  112.5  114.1  ...  0.566  0.530  11.5  22.0   0.191  0.523  11.0  75.0   0.204                 Moda Center   628303      19634
      18  19.0           Brooklyn Nets  26.5  30.0  34.0  31  33  -0.64 -0.54  -1.18  108.1  108.7  ...  0.550  0.515  13.4  23.5   0.199  0.507  10.9  77.8   0.181             Barclays Center   524907      16403
      19  20.0       San Antonio Spurs  27.9  27.0  36.0  28  35  -1.76  0.57  -1.21  111.9  113.7  ...  0.569  0.529  11.0  19.5   0.206  0.542  11.5  79.2   0.194                 AT&T Center   550515      18351
      20  21.0        Sacramento Kings  27.1  28.0  36.0  28  36  -1.92  0.48  -1.44  109.7  111.6  ...  0.563  0.531  13.0  21.8   0.178  0.540  13.6  78.5   0.222             Golden 1 Center   520663      16796
      21  22.0  Minnesota Timberwolves  24.8  19.0  45.0  24  40  -4.30  0.51  -3.78  108.1  112.2  ...  0.551  0.514  13.0  22.1   0.209  0.541  13.2  77.2   0.218               Target Center   482112      15066
      22  23.0           Chicago Bulls  24.4  22.0  43.0  26  39  -3.08 -0.73  -3.81  106.7  109.8  ...  0.547  0.515  13.7  22.8   0.175  0.546  16.3  75.6   0.239               United Center   639352      18804
      23  24.0         Detroit Pistons  25.9  20.0  46.0  26  40  -3.56 -0.66  -4.22  109.0  112.7  ...  0.561  0.529  13.8  22.6   0.194  0.541  12.7  75.9   0.186        Little Caesars Arena   509469      15294
      24  25.0      Washington Wizards  25.4  24.0  40.0  24  40  -4.05 -0.81  -4.86  111.9  115.8  ...  0.568  0.528  12.1  22.0   0.214  0.560  14.0  74.9   0.230           Capital One Arena   532702      16647
      25  26.0         New York Knicks  24.5  21.0  45.0  20  46  -6.45 -0.09  -6.55  106.5  113.0  ...  0.531  0.501  12.6  25.8   0.182  0.541  12.4  78.3   0.224  Madison Square Garden (IV)   620789      18812
      26  27.0       Charlotte Hornets  24.3  23.0  42.0  19  46  -6.75 -0.12  -6.88  106.3  113.3  ...  0.539  0.504  13.3  23.9   0.188  0.546  13.1  74.4   0.159             Spectrum Center   478591      15428
      27  28.0     Cleveland Cavaliers  25.0  19.0  46.0  18  47  -7.89  0.33  -7.55  107.5  115.4  ...  0.553  0.522  14.6  24.6   0.172  0.560  11.7  77.4   0.164         Quicken Loans Arena   643008      17861
      28  29.0           Atlanta Hawks  24.1  20.0  47.0  18  49  -7.97  0.40  -7.57  107.2  114.8  ...  0.554  0.515  13.8  21.6   0.204  0.543  12.7  74.9   0.233            State Farm Arena   545453      16043
      29  30.0   Golden State Warriors  24.4  15.0  50.0  16  49  -8.71  0.79  -7.92  105.2  113.8  ...  0.540  0.497  13.2  21.5   0.212  0.553  13.7  76.4   0.193                Chase Center   614176      18064
      30   NaN          League Average  26.2   NaN   NaN  32  32   0.00  0.00   0.00  110.4  110.4  ...  0.564  0.528  12.8  22.6   0.199  0.528  12.8  77.4   0.199                         NaN   575820      17788
      
      [31 rows x 28 columns]
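
The comment technique can be tried offline on a small document before pointing it at the real site. The sketch below assumes a hypothetical miniature of the page layout (`SAMPLE_HTML`, with made-up structure but team names and ages taken from the output above) and reads the rows with BeautifulSoup alone instead of pandas, so it is an illustration of the idea rather than the script above.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical miniature of the page: the table exists only inside a comment.
SAMPLE_HTML = """
<div id="all_misc_stats">
  <h2>Miscellaneous Stats</h2>
  <!--
  <table id="misc_stats">
    <tr><th>Team</th><th>Age</th></tr>
    <tr><td>Milwaukee Bucks*</td><td>29.2</td></tr>
    <tr><td>Los Angeles Lakers*</td><td>29.6</td></tr>
  </table>
  -->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A normal tag search cannot see the commented-out table.
direct_hit = soup.find("table", id="misc_stats")

# Comments are string nodes of type Comment; pick the one holding the table.
comment = soup.find(string=lambda s: isinstance(s, Comment) and "misc_stats" in s)

# Parse the comment body as its own document to get real tags back.
inner = BeautifulSoup(comment, "html.parser")
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in inner.find_all("tr")
]

print(direct_hit)
print(rows)
```

Here `direct_hit` is `None`, which mirrors the empty result in the question, while `rows` holds the header and data rows recovered from the comment.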
      

      【Comments】:

        【Solution 3】:

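        The site you are scraping is dynamic, so the first request does not give you all of the data. You need to wait a few seconds for the JavaScript to render before the full page content is available. You can use Selenium for this: read the documentation, download the driver for Chrome or Firefox, and then use it. Here is code that gets you to the table: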

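        from io import StringIO
        import time

        import pandas as pd
        from selenium import webdriver
        from selenium.webdriver.chrome.service import Service


        # Path to the downloaded ChromeDriver binary.
        chromedriver = "driver/chromedriver"
        # Selenium 4 takes the driver path through a Service object.
        driver = webdriver.Chrome(service=Service(chromedriver))

        url = 'https://www.basketball-reference.com/leagues/NBA_2020.html#all_misc_stats'

        try:
            driver.get(url)
            # Crude fixed wait to give the page's JavaScript time to run.
            time.sleep(15)
            source = driver.page_source
        finally:
            # Always close the browser, even if loading fails.
            driver.quit()

        tables = pd.read_html(StringIO(source))

        # Print the table whose 26th column (second header level) is 'Arena'.
        for table in tables:
            try:
                if 'Arena' in table.columns[25][1]:
                    print(table)
            except (IndexError, TypeError):
                # Table has fewer than 26 columns or a non-string header.
                pass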
        

        Prints:

                           Rk                    Team                Age  ...                       Arena             Attend.           Attend./G
        0                 1.0        Milwaukee Bucks*               29.2  ...                Fiserv Forum              549036               17711
        1                 2.0     Los Angeles Lakers*               29.6  ...              STAPLES Center              588907               18997
        2                 3.0   Los Angeles Clippers*               27.4  ...              STAPLES Center              610176               19068
        3                 4.0        Toronto Raptors*               26.6  ...            Scotiabank Arena              633456               19796
        4                 5.0        Dallas Mavericks               26.2  ...    American Airlines Center              682096               20062
        5                 6.0         Boston Celtics*               25.3  ...                   TD Garden              610864               19090
        6                 7.0        Houston Rockets*               29.1  ...               Toyota Center              578458               18077
        7                 8.0              Utah Jazz*               27.5  ...     Vivint Smart Home Arena              567486               18306
        8                 9.0         Denver Nuggets*               25.6  ...                Pepsi Center              633153               19186
        9                10.0  Oklahoma City Thunder*               25.6  ...     Chesapeake Energy Arena              600699               18203
        10               11.0             Miami Heat*               25.9  ...      AmericanAirlines Arena              629771               19680
        11               12.0     Philadelphia 76ers*               26.4  ...          Wells Fargo Center              639491               20629
        12               13.0         Indiana Pacers*               25.6  ...     Bankers Life Fieldhouse              529002               16531
        13               14.0    New Orleans Pelicans               25.4  ...        Smoothie King Center              528172               16505
        14               15.0           Orlando Magic               26.0  ...                Amway Center              529870               17093
        15               16.0       Memphis Grizzlies               24.0  ...                 FedEx Forum              523297               15857
        16               17.0            Phoenix Suns               24.7  ...  Talking Stick Resort Arena              550633               15606
        17               18.0  Portland Trail Blazers               27.5  ...                 Moda Center              628303               19634
        18               19.0           Brooklyn Nets               26.5  ...             Barclays Center              524907               16403
        19               20.0       San Antonio Spurs               27.9  ...                 AT&T Center              550515               18351
        20               21.0        Sacramento Kings               27.1  ...             Golden 1 Center              520663               16796
        21               22.0  Minnesota Timberwolves               24.8  ...               Target Center              482112               15066
        22               23.0           Chicago Bulls               24.4  ...               United Center              639352               18804
        23               24.0         Detroit Pistons               25.9  ...        Little Caesars Arena              509469               15294
        24               25.0      Washington Wizards               25.4  ...           Capital One Arena              532702               16647
        25               26.0         New York Knicks               24.5  ...  Madison Square Garden (IV)              620789               18812
        26               27.0       Charlotte Hornets               24.3  ...             Spectrum Center              478591               15428
        27               28.0     Cleveland Cavaliers               25.0  ...         Quicken Loans Arena              643008               17861
        28               29.0           Atlanta Hawks               24.1  ...            State Farm Arena              545453               16043
        29               30.0   Golden State Warriors               24.4  ...                Chase Center              614176               18064
        30                NaN          League Average               26.2  ...                         NaN              575820               17788
        
        [31 rows x 28 columns]
        

        【Comments】:
