【Question title】: Parsing using BeautifulSoup inner tags
【Posted】: 2020-11-29 13:54:47
【Question】:

I'm still learning how to use BeautifulSoup. I'm trying to scrape some information from an NFL website using Python 3 and BeautifulSoup. I parse the page with lxml:

soup = BeautifulSoup(source, 'lxml')

Then I find all of the matchups:

matchups = soup.find_all("div", {"class": "cmg_game_data cmg_matchup_game_box"})
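An aside on the class filter above: BeautifulSoup treats class as a multi-valued attribute, and a search string containing a space, such as "cmg_game_data cmg_matchup_game_box", only matches when the element's class attribute is exactly that string, in that order. If a page might list the classes in a different order or add extra classes, a CSS selector passed to select() requires both classes regardless of order. The minimal sketch below assumes a small, hypothetical snippet (not the real NFL markup) and uses html.parser so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: div 2 lists the same classes in reverse order,
# div 3 adds an extra class, and div 4 has only one of the two classes.
html = """
<div class="cmg_game_data cmg_matchup_game_box" data-event-id="1"></div>
<div class="cmg_matchup_game_box cmg_game_data" data-event-id="2"></div>
<div class="cmg_game_data cmg_matchup_game_box extra" data-event-id="3"></div>
<div class="cmg_game_data" data-event-id="4"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact-string match against the whole class attribute value.
exact = soup.find_all("div", {"class": "cmg_game_data cmg_matchup_game_box"})

# CSS selector: the div must carry both classes, in any order.
both = soup.select("div.cmg_game_data.cmg_matchup_game_box")

print([d["data-event-id"] for d in exact])
print([d["data-event-id"] for d in both])
```

If every matchup div uses exactly the class string shown in the sample, both approaches should find the same elements; select() is simply more tolerant of variations.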

At this point, each matchup in the matchups list contains a lot of data, like this:

<div class="cmg_game_data cmg_matchup_game_box" data-away-conference="American Football Conference" data-away-team-city-search="Houston" data-away-team-fullname-search="Houston" data-away-team-nickname-search="Texans" data-away-team-shortname-search="HOU" data-competition-type="Week 1" data-conference="American Football Conference" data-event-id="80767" data-following="false" data-game-date="2020-09-10 20:20:00" data-game-odd="-10" data-game-total="54.5" data-handicap-difference="0" data-home-conference="American Football Conference" data-home-team-city-search="Kansas City" data-home-team-fullname-search="Kansas City" data-home-team-nickname-search="Chiefs" data-home-team-shortname-search="KC" data-index="0" data-last-update="2020-05-07T22:50:26.5700000" data-link="/sport/football/nfl/matchup/201993" data-sdi-event-id="/sport/football/competition:80767" data-top-twenty-five="false">

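I specifically want to get these inner classes (tags? attributes?), such as data-away-conference and data-game-odd. How do I parse the next level down to extract these items? I tried: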

for matchup in matchups:
    awayconference = matchup.find("data-away-conference")

But this returns None.

【Question discussion】:

    Tags: python beautifulsoup


    【Solution 1】:

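    matchup.find("data-away-conference") searches for a descendant tag named data-away-conference; there is no such tag, so it returns None. data-away-conference is an attribute of the div itself, and a tag's attributes are accessed with [] (or .get()):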

    from bs4 import BeautifulSoup
    
    
    txt = '''
        <div class="cmg_game_data cmg_matchup_game_box" data-away-conference="American Football Conference" data-away-team-city-search="Houston" data-away-team-fullname-search="Houston" data-away-team-nickname-search="Texans" data-away-team-shortname-search="HOU" data-competition-type="Week 1" data-conference="American Football Conference" data-event-id="80767" data-following="false" data-game-date="2020-09-10 20:20:00" data-game-odd="-10" data-game-total="54.5" data-handicap-difference="0" data-home-conference="American Football Conference" data-home-team-city-search="Kansas City" data-home-team-fullname-search="Kansas City" data-home-team-nickname-search="Chiefs" data-home-team-shortname-search="KC" data-index="0" data-last-update="2020-05-07T22:50:26.5700000" data-link="/sport/football/nfl/matchup/201993" data-sdi-event-id="/sport/football/competition:80767" data-top-twenty-five="false"></div>
    '''
    
    soup = BeautifulSoup(txt, 'html.parser')
    
    matchups = soup.find_all("div", {"class": "cmg_game_data cmg_matchup_game_box"})
    
    for matchup in matchups:
        awayconference = matchup["data-away-conference"]  # or you can use matchup.get("data-away-conference")
        print(awayconference)
    

    Prints:

    American Football Conference
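The same approach extends to the other data-* attributes the question mentions, such as data-game-odd. The sketch below is an illustration on a shortened copy of the matchup div (only a few of its attributes kept). It reads individual attributes with .get(), which returns None instead of raising KeyError when an attribute is missing, and uses the tag's .attrs dict to collect every data-* attribute at once. Converting data-game-odd to float is an assumption about how you might use the value; BeautifulSoup itself returns these attribute values as strings.

```python
from bs4 import BeautifulSoup

# Shortened copy of the matchup div from the question (most attributes dropped).
txt = '''
<div class="cmg_game_data cmg_matchup_game_box"
     data-away-conference="American Football Conference"
     data-away-team-nickname-search="Texans"
     data-home-team-nickname-search="Chiefs"
     data-game-odd="-10"
     data-game-total="54.5"></div>
'''

soup = BeautifulSoup(txt, "html.parser")
matchups = soup.find_all("div", {"class": "cmg_game_data cmg_matchup_game_box"})

games = []
for matchup in matchups:
    # .get() returns None instead of raising KeyError if the attribute is absent.
    odd = matchup.get("data-game-odd")
    games.append({
        "away_conference": matchup.get("data-away-conference"),
        "away": matchup.get("data-away-team-nickname-search"),
        "home": matchup.get("data-home-team-nickname-search"),
        # Attribute values are strings, so convert the spread to a number.
        "odd": float(odd) if odd is not None else None,
    })

# .attrs is a plain dict of all attributes; keep only the data-* ones.
all_data = [
    {k: v for k, v in m.attrs.items() if k.startswith("data-")} for m in matchups
]

print(games)
print(all_data[0]["data-game-total"])
```

Using .get() rather than [] is a choice for pages where some matchup boxes might lack a given attribute; with [] a missing attribute would stop the loop with a KeyError.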
    

    【Discussion】:
