How to find an element based on its text while ignoring child tags in BeautifulSoup: answers

【Question title】: How to find element based on text ignore child tags in beautifulsoup
【Posted】: 2018-10-31 17:19:03
【Question description】:

I am looking for a way to find an element by its inner text using Python and BeautifulSoup. For example:

<div> <b>Ignore this text</b>Find based on this text </div>

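How can I find this div? Thanks for your help!

【Question comments】:

Tags: python python-3.x beautifulsoup

【Solution 1】:

You can use .find with the text argument to locate the text node, then call findParent to get the element that contains it.

For example: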

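from bs4 import BeautifulSoup
s="""<div> <b>Ignore this text</b>Find based on this text </div>"""
soup = BeautifulSoup(s, 'html.parser')
# Exact match on the text node, including its trailing space
t = soup.find(text="Find based on this text ")
# The parent of the matched text node is the <div>
print(t.findParent())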

Output:

<div> <b>Ignore this text</b>Find based on this text </div>
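
Note that the exact match above only succeeds because the search string includes the trailing space of the text node. A minimal sketch of a whitespace-tolerant variant follows; it assumes Beautiful Soup 4.4.0 or later, where the `text` argument is also available under the name `string`, and it passes a function as the filter instead of a literal string. The `lambda` is illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

s = """<div> <b>Ignore this text</b>Find based on this text </div>"""
soup = BeautifulSoup(s, "html.parser")

# Compare after stripping, so leading/trailing whitespace in the
# document does not have to be reproduced in the search string.
t = soup.find(
    string=lambda text: text is not None
    and text.strip() == "Find based on this text"
)

# .parent is the element that directly contains the matched text node.
print(t.parent)
```

Because the filter is applied to each text node separately, the text inside `<b>` never interferes with the match on the div's own text.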

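【Comments】:

If the text occurs more than once, is it possible to find every div that contains it?
Sure. You can use soup.find_all(text="")
Can I tell find_all to look only at div tags?
Note that findParent() is BS3 syntax. In bs4 it is find_parent(), or you can simply use the .parent attribute. find_parent() is usually used when you need a specific parent (ancestor) tag, for example find_parent('a').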
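
The follow-up questions above (several matches, and only div tags) can be combined in one pass. This is a sketch rather than a definitive answer: the sample markup and the helper name `divs_with_text` are made up for illustration, and it assumes Beautiful Soup 4.4.0 or later for the `string` argument.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: two matching divs, one <p> with the same
# text, and one unrelated div.
html = """
<div> <b>Ignore this text</b>Find based on this text </div>
<p>Find based on this text</p>
<div><i>Other</i> Find based on this text</div>
<div>Unrelated</div>
"""

def divs_with_text(soup, needle):
    """Return <div> tags that have a direct text node containing `needle`."""
    pattern = re.compile(re.escape(needle))
    results = []
    for node in soup.find_all(string=pattern):
        parent = node.parent
        # Keep only divs, and compare by identity to avoid duplicates.
        if parent.name == "div" and not any(parent is r for r in results):
            results.append(parent)
    return results

soup = BeautifulSoup(html, "html.parser")
matches = divs_with_text(soup, "Find based on this text")
for div in matches:
    print(div)
```

Filtering on `node.parent` rather than calling `find_parent('div')` keeps the match limited to divs whose own text contains the string; `find_parent('div')` would also accept text nested deeper inside a div.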

【Solution 2】:

Try this. It is only an example, but it works.

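from bs4 import BeautifulSoup
html="""
<div> <b>Ignore this text</b>Find based on this text </div>
"""

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')

s = soup.find('div')

# Remove every <b> child from the tree (this modifies the parsed document)
for child in s.find_all('b'):
    child.decompose()

# Only the div's remaining text is left
print(s.get_text())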

Output:

 Find based on this text

【Comments】:

I think the question is asking for a text search like the one here: stackoverflow.com/a/8936235/986546
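
For a text search that leaves the tree untouched, one option is to compare each tag's own text, meaning only its direct text nodes, against the target. The sketch below is one possible approach and not the method from either answer; the helper names `own_text` and `is_target` are hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString

html = "<div> <b>Ignore this text</b>Find based on this text </div>"
soup = BeautifulSoup(html, "html.parser")

def own_text(tag):
    """Join only the tag's direct text nodes, skipping child tags."""
    return "".join(
        child for child in tag.children if isinstance(child, NavigableString)
    ).strip()

def is_target(tag):
    # Passed to find(): called once per tag in the document.
    return tag.name == "div" and own_text(tag) == "Find based on this text"

div = soup.find(is_target)
print(div)            # the full div, with <b> still in place
print(own_text(div))  # the div's text without the <b> content
```

Unlike decompose(), this does not remove the `<b>` element, so the same soup can still be used for other queries afterwards.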