How to skip a tag when using Beautifulsoup find_all? Answers

[Question title]: How to skip a tag when using Beautifulsoup find_all?
[Posted]: 2022-11-23 04:14:25
[Question description]:

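I want to edit an HTML document and parse some text using Beautifulsoup. I am interested in <span> tags, but only in those that are NOT inside a <table> element. I want to skip all tables when finding <span> elements.

I tried finding all <span> elements first and then filtering out the ones that have a <table> among their parents. Here is the code, but it is too slow.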

for tag in soup.find_all('span'):
    ancestor_tables = [x for x in tag.find_all_previous(name='table')]
    if len(ancestor_tables) > 0:
        continue

    text = tag.text

Is there a more efficient option? Is it possible to "hide"/skip tags when searching for <span> with the find_all method?

[Question comments]:

Tags: python html beautifulsoup

[Solution 1]:

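You can use .find_parent(), which only walks up the ancestors of the current tag and returns None when no matching ancestor exists: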

for tag in soup.find_all("span"):
    if tag.find_parent("table"):
        continue
    # we are not inside <table>
    # ...
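
To make the difference concrete, here is a minimal, self-contained sketch of the same idea. The sample HTML and the helper name spans_outside_tables are made up for illustration and are not part of the original question. It also shows why the attempt in the question filters too much: find_all_previous() returns every <table> that appears earlier in the document, not only the ones that enclose the tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this illustration.
HTML = """
<div>
  <span>outside 1</span>
  <table>
    <tr><td><span>inside table</span></td></tr>
  </table>
  <p><span>outside 2</span></p>
</div>
"""


def spans_outside_tables(soup):
    """Return every <span> that has no <table> among its ancestors."""
    return [
        tag
        for tag in soup.find_all("span")
        # find_parent() checks ancestors only and returns None
        # when there is no enclosing <table>.
        if tag.find_parent("table") is None
    ]


soup = BeautifulSoup(HTML, "html.parser")
texts = [tag.get_text() for tag in spans_outside_tables(soup)]
print(texts)

# The last <span> comes after the table but is not inside it.
last_span = soup.find_all("span")[-1]
# find_all_previous() still finds the earlier table, so the
# question's filter would wrongly skip this span.
print(len(last_span.find_all_previous("table")))
# find_parent() correctly reports that no table encloses it.
print(last_span.find_parent("table"))
```

Because find_parent() stops at the document root, its cost per tag depends on nesting depth rather than on how much of the document precedes the tag, which is likely why it is faster on large documents.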

[Comments]: