Parsing using BeautifulSoup inner tags: answers

【Question title】: Parsing using BeautifulSoup inner tags
【Posted】: 2020-11-29 13:54:47
【Question description】:

Still learning how to use BeautifulSoup. I am trying to pull some information from an NFL website using Python 3 and BeautifulSoup. I parse the page with the lxml parser:

soup = BeautifulSoup(source, 'lxml')

Then I find all of the matchup information:

matchups = soup.findAll("div", {"class": "cmg_game_data cmg_matchup_game_box"})

At this point, each matchup in the matchups list contains a lot of data, like this:

<div class="cmg_game_data cmg_matchup_game_box" data-away-conference="American Football Conference" data-away-team-city-search="Houston" data-away-team-fullname-search="Houston" data-away-team-nickname-search="Texans" data-away-team-shortname-search="HOU" data-competition-type="Week 1" data-conference="American Football Conference" data-event-id="80767" data-following="false" data-game-date="2020-09-10 20:20:00" data-game-odd="-10" data-game-total="54.5" data-handicap-difference="0" data-home-conference="American Football Conference" data-home-team-city-search="Kansas City" data-home-team-fullname-search="Kansas City" data-home-team-nickname-search="Chiefs" data-home-team-shortname-search="KC" data-index="0" data-last-update="2020-05-07T22:50:26.5700000" data-link="/sport/football/nfl/matchup/201993" data-sdi-event-id="/sport/football/competition:80767" data-top-twenty-five="false">

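I would like to specifically grab these inner classes (tags? attributes?), such as data-away-conference and data-game-odd. How do I parse the next level down to extract these items? I tried: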

for matchup in matchups:
    awayconference = matchup.find("data-away-conference")

But this returns None.

【Question discussion】:

Tags: python beautifulsoup

【Solution 1】:

Use [] to access a tag's attributes:

from bs4 import BeautifulSoup


txt = '''
    <div class="cmg_game_data cmg_matchup_game_box" data-away-conference="American Football Conference" data-away-team-city-search="Houston" data-away-team-fullname-search="Houston" data-away-team-nickname-search="Texans" data-away-team-shortname-search="HOU" data-competition-type="Week 1" data-conference="American Football Conference" data-event-id="80767" data-following="false" data-game-date="2020-09-10 20:20:00" data-game-odd="-10" data-game-total="54.5" data-handicap-difference="0" data-home-conference="American Football Conference" data-home-team-city-search="Kansas City" data-home-team-fullname-search="Kansas City" data-home-team-nickname-search="Chiefs" data-home-team-shortname-search="KC" data-index="0" data-last-update="2020-05-07T22:50:26.5700000" data-link="/sport/football/nfl/matchup/201993" data-sdi-event-id="/sport/football/competition:80767" data-top-twenty-five="false"></div>
'''

soup = BeautifulSoup(txt, 'html.parser')

# find_all is the current method name; findAll is a legacy alias for it
matchups = soup.find_all("div", {"class": "cmg_game_data cmg_matchup_game_box"})

for matchup in matchups:
    awayconference = matchup["data-away-conference"]  # or you can use matchup.get("data-away-conference")
    print(awayconference)

Prints:

American Football Conference
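
The original attempt returned None because find("data-away-conference") looks for a child tag with that name, while data-away-conference is an attribute of the div itself. As a minimal sketch of the related options (the shortened HTML snippet and the attribute name data-not-present are made up for illustration), .get() avoids a KeyError when an attribute is missing, and .attrs exposes every attribute as a dict:

```python
from bs4 import BeautifulSoup

# Shortened, illustrative version of the markup from the question
html = '''
<div class="cmg_game_data cmg_matchup_game_box"
     data-away-conference="American Football Conference"
     data-game-odd="-10"
     data-game-total="54.5"></div>
'''

soup = BeautifulSoup(html, "html.parser")

# A CSS selector matches both classes regardless of their order
matchup = soup.select_one("div.cmg_game_data.cmg_matchup_game_box")

# matchup["name"] raises KeyError if the attribute is absent;
# .get() returns None or the supplied default instead
game_odd = matchup.get("data-game-odd")
missing = matchup.get("data-not-present", "n/a")  # hypothetical attribute

# .attrs is a dict of all attributes; keep only the data-* ones
data_attrs = {k: v for k, v in matchup.attrs.items() if k.startswith("data-")}

print(game_odd)
print(missing)
print(data_attrs["data-game-total"])
```

Attribute values always come back as strings, so something like float(data_attrs["data-game-total"]) would be needed before doing arithmetic on them.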

【Discussion】: