【Question title】:Using BeautifulSoup to get <a> tags and the attributes of those tags
【Posted】:2021-03-10 19:13:34
【Question description】:

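I've just started using BeautifulSoup and ran into a problem getting the attributes of tags nested inside other tags. I'm practicing on whitehouse.gov/briefing-room/. All I want to do for now is get every link on this page and append it to an empty list. Here is my current code: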

    import requests
    from bs4 import BeautifulSoup

    result = requests.get("https://www.whitehouse.gov/briefing-room/")

    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    urls = []

    for h2_tags in soup.find_all('h2'):
        a_tag = h2_tags.find('a')
        urls.append(a_tag.attr['href']) # This is where I get the NoneType error

This code fails with a NoneType error at the marked line.

【Comments】:

    Tags: python beautifulsoup python-requests


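    【Solution 1】:

    The problem is that some <h2> tags do not contain an <a> tag, so find('a') returns None for them. You either have to check for that case, or use a CSS selector to select all <a> tags under <h2>: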

    import requests
    from bs4 import BeautifulSoup
    
    
    result = requests.get("https://www.whitehouse.gov/briefing-room/")
    
    src = result.content
    soup = BeautifulSoup(src, 'lxml')
    
    urls = []
    
    for a_tag in soup.select('h2 a'):    # <-- select <A> tags that are under <H2> tags
        urls.append(a_tag.attrs['href'])
    
    print(*urls, sep='\n')
    

    Prints:

    https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/10/statement-by-nsc-spokesperson-emily-horne-on-national-security-advisor-jake-sullivan-leading-the-first-virtual-meeting-of-the-u-s-israel-strategic-consultative-group/
    https://www.whitehouse.gov/briefing-room/press-briefings/2021/03/09/press-briefing-by-press-secretary-jen-psaki-and-deputy-director-of-the-national-economic-council-bharat-ramamurti-march-9-2021/
    https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/readout-of-the-white-houses-meeting-with-climate-finance-leaders/
    https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/readout-of-vice-president-kamala-harris-call-with-prime-minister-erna-solberg-of-norway/
    https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/nomination-sent-to-the-senate-3/
    https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/president-biden-announces-key-hire-for-the-office-of-management-and-budget/
    https://www.whitehouse.gov/briefing-room/speeches-remarks/2021/03/09/remarks-by-president-biden-during-tour-of-w-s-jenks-son/
    https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/president-joseph-r-biden-jr-approves-louisiana-disaster-declaration/
    https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/statement-by-president-joe-biden-on-the-house-taking-up-the-pro-act/
    https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/white-house-announces-additional-staff/
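
The other option mentioned above, checking for an <h2> that has no <a>, could look roughly like the sketch below. It is only an illustration: `SAMPLE_HTML`, the example.com URLs, and the helper name `collect_h2_links` are made up for this example, and the built-in `html.parser` is used instead of `lxml` so the snippet runs without a network request or an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the second <h2> has no <a>.
SAMPLE_HTML = """
<h2><a href="https://example.com/first">First</a></h2>
<h2>Heading without a link</h2>
<h2><a href="https://example.com/second">Second</a></h2>
"""


def collect_h2_links(html):
    """Return the href of the first <a> inside each <h2>, skipping <h2> tags without one."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for h2_tag in soup.find_all("h2"):
        a_tag = h2_tag.find("a")  # None when this <h2> contains no <a>
        if a_tag is None:
            continue
        href = a_tag.get("href")  # .get() returns None rather than raising if href is absent
        if href is not None:
            urls.append(href)
    return urls


urls = collect_h2_links(SAMPLE_HTML)
print(*urls, sep="\n")
```

Unlike `soup.select('h2 a')`, this loop takes only the first <a> in each <h2>. Note also that the question's line reads `a_tag.attr`, while the attribute dictionary is `a_tag.attrs`; as far as I know, BeautifulSoup treats an unknown name such as `attr` as a lookup for a child tag called `attr`, which gives None, so that spelling would fail even for an <h2> that does contain a link.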
    

    【Comments】:
