【Question title】: Unable to get the anchor tag using BeautifulSoup
【Posted】: 2020-09-04 23:18:52
【Question description】:

I want to get the names and links from the list of anchor tags inside a section, but I am unable to get them.

URL: https://www.snopes.com/collections/new-coronavirus-collection/

category=[]
url=[]
for ul in soup.findAll('a',{"class":"collected-list"}):
    if ul is not None:
        category.append(ul.get_text())
    else:
        category.append("")
    links = ul.findAll('a')
    if links is not None:
        for a in links:
            url.append(a['href'])

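Earlier I was able to get the list and the URLs, but the site structure has since changed and my code no longer works. The expected output looks like this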

【Question discussion】:

    Tags: html python-3.x beautifulsoup


    【Solution 1】:

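    It looks like the a tags of interest now have the class collected-item rather than collected-list (which is now the class of a section). You can search for all a tags with the class collected-item, then find the h5 tag with the class title under the same anchor to get the title text. With a little string manipulation, that title appears to yield the category you described in your expected output.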

    from bs4 import BeautifulSoup
    import requests
    
    source = requests.get('https://www.snopes.com/collections/new-coronavirus-collection/').text
    soup = BeautifulSoup(source, 'lxml')
    
    category = []
    url = []
    
    # Each collection entry is an <a class="collected-item"> wrapping an <h5 class="title">.
    for a in soup.find_all('a', {"class": "collected-item"}):
        h5 = a.find('h5', {"class": "title"})
        if h5 is None:
            continue  # skip anchors that have no title heading
        title = h5.get_text()
        # Strip the shared prefix so only the category name remains.
        title_short = title.replace("The Coronavirus Collection: ", "")
        category.append(title_short)
        url.append(a['href'])
    
    for c, u in zip(category, url):
        print(c, u)
    
    Origins and Spread https://www.snopes.com/collections/coronavirus-origins-treatments/?collection-id=238235
    Prevention and Treatments https://www.snopes.com/collections/coronavirus-collection-prevention-treatments/?collection-id=238235
    Prevention and Treatments II https://www.snopes.com/collections/coronavirus-collection-prevention-treatments-2/?collection-id=238235
    International Response https://www.snopes.com/collections/coronavirus-international-rumors/?collection-id=238235
    US Government Response https://www.snopes.com/collections/coronavirus-government-role/?collection-id=238235
    Trump and the Pandemic https://www.snopes.com/collections/coronavirus-collection-trump/?collection-id=238235
    Trump and the Pandemic II https://www.snopes.com/collections/coronavirus-collection-trump-2/?collection-id=238235
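
Since the live page may change again, the same lookup can be tried offline against a small fragment. The following is only a sketch: `SAMPLE_HTML` is a hypothetical fragment written to match the structure described above (a section with class collected-list holding a tags with class collected-item), not a copy of the real page, and the example.com URLs, `PREFIX`, and `extract_items` are names invented for illustration. It uses CSS selectors through `select` and `select_one`, and falls back to an empty category when an anchor has no h5 title.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: an assumption about the page layout, not a copy of it.
SAMPLE_HTML = """
<section class="collected-list">
  <a class="collected-item" href="https://example.com/origins/">
    <h5 class="title">The Coronavirus Collection: Origins and Spread</h5>
  </a>
  <a class="collected-item" href="https://example.com/no-title/">
    <p>No heading in this one</p>
  </a>
  <a class="other-link" href="https://example.com/ignored/">
    <h5 class="title">Unrelated link</h5>
  </a>
</section>
"""

PREFIX = "The Coronavirus Collection: "


def extract_items(html):
    """Return (category, url) pairs for every a.collected-item in html."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for anchor in soup.select("section.collected-list a.collected-item"):
        heading = anchor.select_one("h5.title")
        # Fall back to an empty category when the heading is missing.
        title = heading.get_text(strip=True) if heading else ""
        if title.startswith(PREFIX):
            title = title[len(PREFIX):]
        items.append((title, anchor.get("href", "")))
    return items


if __name__ == "__main__":
    for category, url in extract_items(SAMPLE_HTML):
        print(category, url)
```

Returning pairs keeps each category next to its URL, so the two cannot drift out of step the way two parallel lists can when one entry is skipped.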
    

    【Discussion】: