BeautifulSoup find_all by table and id class returning no results? Answers

【Question title】: BeautifulSoup find_all by table and id class returning no results?
【Posted】: 2019-12-29 11:56:19
【Question】:

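I'm trying to scrape scoring data from Pro Football Reference. After running into JavaScript issues, I switched to Selenium to get the initial soup object. I'm trying to find a specific table on the site and then iterate over its rows.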

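The code works if I just use find_all('table')[#], but # changes depending on which box score I'm looking at, so it isn't reliable. I therefore want to use the id='player_offense' attribute to identify the same table across games, but when I use it, it returns nothing. What am I missing here?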

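from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import os
from bs4 import BeautifulSoup

# Path to chromedriver
chrome_path = os.path.expanduser('~/Documents/chromedriver.exe')
# Selenium 4 style: pass the driver path through a Service object
driver = webdriver.Chrome(service=Service(chrome_path))

driver.get('https://www.pro-football-reference.com/boxscores/201709070nwe.htm')
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# Doesn't work: returns None
soup.find('table', id='player_offense')
# Works, but the index changes from one box score to another
table = soup.find_all('table')[3]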
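The symptom can be reproduced offline. This sketch assumes, as Solution 1 below explains, that the player_offense table is delivered inside an HTML comment; the markup is a simplified, hypothetical stand-in for the real page, and the built-in "html.parser" is used so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one table is live HTML, the other is
# wrapped in an HTML comment.
html = """
<div>
  <table id="game_info"><tr><td>Kickoff</td></tr></table>
  <!--
  <table id="player_offense"><tr><td>Player A</td></tr></table>
  -->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The commented-out table is only comment text, not a Tag, so an id lookup misses it...
print(soup.find("table", id="player_offense"))  # None

# ...and find_all("table") only counts the live tables.
print(len(soup.find_all("table")))  # 1
```

This is also why positional indexing is fragile: the index only counts tables that happen to be live markup in whatever page source you parsed.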

【Comments】:

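Is soup.table even a valid attribute?
Have you tried soup.find('table', id='player_offense')?
It should be soup.find('table', id='player_offense'), not soup.table.find('table', id='player_offense'); that's the mistake.
Hi, sorry, I am actually searching with soup.find('table', id='player_offense'). I must have changed the code and forgotten to change it back. I've edited the question.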
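On the first comment: soup.table is a valid attribute. It is shorthand for soup.find('table') and returns only the first table in the document, so soup.table.find('table', id=...) searches among that first table's descendants and cannot reach a sibling table. A small illustration with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two sibling tables.
html = """
<table id="first"><tr><td>a</td></tr></table>
<table id="player_offense"><tr><td>b</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# soup.table is shorthand for soup.find("table"): the FIRST table only.
print(soup.table["id"])  # first

# Searching *inside* the first table cannot reach a sibling table.
print(soup.table.find("table", id="player_offense"))  # None

# Searching the whole document does find it.
print(soup.find("table", id="player_offense")["id"])  # player_offense
```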

Tags: python pandas selenium beautifulsoup

【Solution 1】:

The data is inside HTML comments. Find the relevant comment, then extract the table from it.

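import requests
from io import StringIO
from bs4 import BeautifulSoup as bs
from bs4 import Comment
import pandas as pd

r = requests.get('https://www.pro-football-reference.com/boxscores/201709070nwe.htm')
soup = bs(r.content, "lxml")
# Collect every HTML comment node in the page
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    # The player_offense table is shipped as markup inside one of these comments
    if 'id="player_offense"' in comment:
        # read_html returns a list of DataFrames; wrapping the markup in StringIO
        # avoids the literal-HTML deprecation warning in newer pandas versions
        print(pd.read_html(StringIO(str(comment)))[0])
        break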
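Since the asker wanted to iterate over the rows, here is a pandas-free sketch of the same idea. The helper name find_commented_table is hypothetical (not part of BeautifulSoup); it looks for the table in the live tree first, then re-parses any comment that mentions the id. The sample markup stands in for the page fetched with requests.

```python
from bs4 import BeautifulSoup, Comment


def find_commented_table(soup, table_id):
    """Hypothetical helper: return the <table> with the given id, looking in
    live HTML first and then inside HTML comments. Returns None if absent."""
    # Fast path: the table is already part of the parse tree.
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Slow path: re-parse each comment that mentions the id.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if f'id="{table_id}"' in comment:
            inner = BeautifulSoup(comment, "html.parser")
            table = inner.find("table", id=table_id)
            if table is not None:
                return table
    return None


# Hypothetical sample page; in practice `soup` would come from requests + BeautifulSoup.
html = """
<div>
  <!--
  <table id="player_offense">
    <tr><th>Player</th><th>Yds</th></tr>
    <tr><td>Player A</td><td>100</td></tr>
    <tr><td>Player B</td><td>42</td></tr>
  </table>
  -->
</div>
"""
soup = BeautifulSoup(html, "html.parser")
table = find_commented_table(soup, "player_offense")

# Iterate over the rows, extracting the text of every header/data cell.
for row in table.find_all("tr"):
    print([cell.get_text(strip=True) for cell in row.find_all(["th", "td"])])
```

With the sample markup this prints ['Player', 'Yds'], ['Player A', '100'] and ['Player B', '42'], one list per line. Checking the live tree first means the same helper keeps working if a page serves the table as ordinary HTML.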

【Discussion】:

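Thanks! I knew it was in the comments, but I thought that had something to do with JavaScript. Good to know I don't need Selenium.
You don't need pandas either, but it's great for quickly turning an extracted table into something readable.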

【Solution 2】:

This also works.

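from io import StringIO
from requests_html import HTMLSession
import pandas as pd

with HTMLSession() as s:
    r = s.get('https://www.pro-football-reference.com/boxscores/201709070nwe.htm')

    # Run the page's JavaScript in headless Chromium (downloaded on first use);
    # after rendering, table#player_offense can be found with a CSS selector
    r.html.render()
    table = r.html.find('table#player_offense', first=True)
    df = pd.read_html(StringIO(table.html))[0]
    print(df)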
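If launching a headless browser just to expose the table feels heavy, a commonly used lighter alternative (a different technique from rendering, not part of this answer) is to delete the comment markers from the raw HTML text before parsing. A minimal sketch, assuming the raw text would come from something like requests.get(url).text; the sample markup here is hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample of a page whose tables are shipped inside HTML comments.
raw_html = """
<div>
  <!--
  <table id="player_offense"><tr><th>Player</th></tr><tr><td>Player A</td></tr></table>
  -->
</div>
"""

# Delete the comment markers so the commented-out markup becomes ordinary HTML.
# This is a blunt text substitution: it also "uncomments" every other comment on the page.
uncommented = re.sub(r"<!--|-->", "", raw_html)

soup = BeautifulSoup(uncommented, "html.parser")
table = soup.select_one("table#player_offense")
print(table.find("td").get_text())  # Player A
```

The trade-off: no browser or JavaScript is involved, but anything else that was commented out (scripts, notes) also becomes live markup, so keep selectors specific.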

【Discussion】: