Python 3: extract HTML data from a sports site (answer)

【Question title】: Python 3 extract html data from sports site
【Posted】: 2021-01-25 14:08:54
【Question description】:

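I have been trying to extract data from a sports site, but so far without success. I want to extract the values 35, Shots on Goal, and 23 from the markup below, and every attempt has failed.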

<div class="statTextGroup">
   <div class="statText statText--homeValue">35</div>
   <div class="statText statText--titleValue">Shots on Goal</div>
   <div class="statText statText--awayValue">23</div>
</div>

from bs4 import BeautifulSoup
import requests

result = requests.get("https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0")
src = result.content

soup = BeautifulSoup(src, 'html.parser')

stats = soup.find("div", {"class": "tab-statistics-0-statistic"})
print(stats)

This is the code I have been using. When I run it, it prints None. Can anyone help me get the data printed?

The full page can be found here: https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0

【Comments on the question】:

d.scoreboard.com/uk/x/feed/d_st_lvbns58C_en-uk_1 will return the information you are looking for.
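The suggestion in this comment could be tried with something like the sketch below. It is not a confirmed recipe. The `https://` scheme, the idea that the match id from the page URL (`lvbns58C`) can be substituted into the feed path, and the helper names `build_feed_url` and `fetch_feed` are all assumptions. The feed's response format is not described in this thread, and the site may reject requests that lack headers a browser would send, so the sketch only downloads the raw body.

```python
import requests

# Assumption: the feed is served over HTTPS from the host named in the comment,
# and the match id from the page URL slots into the path as shown there.
FEED_TEMPLATE = "https://d.scoreboard.com/uk/x/feed/d_st_{match_id}_en-uk_1"


def build_feed_url(match_id: str) -> str:
    """Return the statistics feed URL for a match id such as 'lvbns58C'."""
    return FEED_TEMPLATE.format(match_id=match_id)


def fetch_feed(match_id: str, timeout: float = 10.0) -> str:
    """Download the raw feed body (hypothetical helper; format not verified)."""
    response = requests.get(build_feed_url(match_id), timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text
```

Calling `fetch_feed("lvbns58C")` and printing the result is a reasonable first step to see what the feed actually contains before writing any parsing code.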

Tags: python html web-scraping beautifulsoup

【Solution 1】:

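The page is rendered by JavaScript, so the HTML returned by requests does not contain the statistics yet, which is why find() returns None. One possible option is to load the page with selenium and then parse it with BeautifulSoup: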

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# initialize selenium driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# positional driver path is the Selenium 3 style; Selenium 4 passes it via a Service object
wd = webdriver.Chrome('<<PATH_TO_SELENIUMDRIVER>>', options=chrome_options)

# load page via selenium
wd.get("https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0")

# wait up to 30 seconds until the element with id 'statistics-content' is present
table = WebDriverWait(wd, 30).until(EC.presence_of_element_located((By.ID, 'statistics-content')))

# parse content of the table
soup = BeautifulSoup(table.get_attribute('innerHTML'), 'html.parser')

print(soup)

# close selenium driver
wd.quit()
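Once the rendered HTML is available, the three values from the question still have to be picked out of it. The sketch below shows one way to do that with BeautifulSoup's CSS selectors, using the markup quoted in the question as sample input. The function name `extract_stats` is hypothetical, and the live page may nest or name its elements differently from the sample.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; on the live page it exists only after JavaScript runs.
SAMPLE_HTML = """
<div class="statTextGroup">
   <div class="statText statText--homeValue">35</div>
   <div class="statText statText--titleValue">Shots on Goal</div>
   <div class="statText statText--awayValue">23</div>
</div>
"""


def extract_stats(html):
    """Return (home, title, away) tuples for every complete statTextGroup block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for group in soup.select("div.statTextGroup"):
        cells = (
            group.select_one(".statText--homeValue"),
            group.select_one(".statText--titleValue"),
            group.select_one(".statText--awayValue"),
        )
        # Skip groups that are missing any of the three cells.
        if all(cell is not None for cell in cells):
            rows.append(tuple(cell.get_text(strip=True) for cell in cells))
    return rows


print(extract_stats(SAMPLE_HTML))  # [('35', 'Shots on Goal', '23')]
```

With the Selenium code above, passing `table.get_attribute('innerHTML')` to `extract_stats` instead of printing the whole soup should give one tuple per statistic, provided the rendered markup matches the sample.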

【Comments】: