BeautifulSoup find and find_all not working as expected

【Question title】: BeautifulSoup find and find_all not working as expected
【Posted】: 2014-08-30 19:18:25
【Question】:

I have just started using BeautifulSoup and have run into a problem. I set up the HTML snippet below and made a BeautifulSoup object from it:

from bs4 import BeautifulSoup

html_snippet = '<p class="course"><span class="text84">Ae 100. Research in Aerospace. </span><span class="text85">Units to be arranged in accordance with work accomplished. </span><span class="text83">Open to suitably qualified undergraduates and first-year graduate students under the direction of the staff. Credit is based on the satisfactory completion of a substantive research report, which must be approved by the Ae 100 adviser and by the option representative. </span> </p>'
subject = BeautifulSoup(html_snippet)

I have tried several find and find_all operations, shown below, but all I get back is nothing or an empty list:

subject.find(text='A')
subject.find(text='Research')
subject.next_element.find('A')
subject.find_all(text='A')

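Earlier, when I created a BeautifulSoup object from an HTML file on my computer, find and find_all both worked fine. However, when I pulled html_snippet from a web page that I read online via urllib2, I ran into this problem.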

Can anyone point out where the problem is?

【Comments】:

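You don't have any nodes whose text is exactly equal to "A" or "Research". You have one node whose first word starts with A (e.g. Ae), and that same node merely contains the word Research...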
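One way to check this comment's point is to list the text nodes the parser actually produced. The sketch below is only an illustration: it uses a shortened version of the question's snippet and assumes the built-in "html.parser" backend; the names `soup` and `text_nodes` are made up for the example.

```python
from bs4 import BeautifulSoup

# Shortened version of the snippet from the question (assumption: two spans are enough).
html = (
    '<p class="course">'
    '<span class="text84">Ae 100. Research in Aerospace. </span>'
    '<span class="text85">Units to be arranged. </span>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

# .strings yields every text node in document order.
text_nodes = [str(s) for s in soup.strings]
print(text_nodes)

# Neither "A" nor "Research" is a complete text node on its own.
print("A" in text_nodes, "Research" in text_nodes)
```

Each text node is a whole sentence (trailing space included), which is why a plain-string filter such as 'A' finds nothing.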

Tags: python beautifulsoup

【Solution 1】:

Pass the argument like this:

import re
subject.find(text=re.compile('A'))

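By default, the text filter matches against the entire body of a text node, so a plain string only matches a node whose text is exactly that string. Passing in a regular expression lets you match fragments.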
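A minimal sketch of that difference, under a few assumptions: the HTML is a shortened version of the question's snippet, the parser is the built-in "html.parser", and the keyword is spelled `string=`, which newer Beautiful Soup releases (4.4.0 and later) use for the same filter that this thread calls `text=`.

```python
import re
from bs4 import BeautifulSoup

# Shortened version of the snippet from the question (assumption).
html = (
    '<p class="course">'
    '<span class="text84">Ae 100. Research in Aerospace. </span>'
    '<span class="text85">Units to be arranged. </span>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node, so this finds nothing.
exact_fragment = soup.find(string="Research")

# The same kind of filter succeeds when it equals the full node text.
exact_whole = soup.find(string="Units to be arranged. ")

# A compiled regex is applied with search(), so a fragment is enough.
regex_fragment = soup.find(string=re.compile("Research"))

print(exact_fragment)
print(repr(exact_whole))
print(repr(regex_fragment))
```

Note that the match returned is the whole text node, not just the fragment; use `.parent` on the result if you need the enclosing tag.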

Edit: to match only bodies that begin with A, you can use the following:

subject.find(text=re.compile('^A'))

To match only bodies that contain a word beginning with A, you can use:

subject.find_all(text=re.compile(r'\bA'))
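To compare the three patterns used in this answer without involving BeautifulSoup at all, here is a small sketch using only the standard `re` module. The sample strings and the `matching` helper are hypothetical, chosen just to make the differences visible.

```python
import re

# Hypothetical sample strings standing in for text nodes.
samples = [
    "Ae 100. Research in Aerospace.",
    "Research in Aerospace.",
    "NASA report",
    "no capitals here",
]

def matching(pattern, strings):
    """Return the strings in which the compiled pattern is found via search()."""
    return [s for s in strings if pattern.search(s)]

anywhere = matching(re.compile("A"), samples)       # capital A anywhere
at_start = matching(re.compile("^A"), samples)      # string begins with A
word_start = matching(re.compile(r"\bA"), samples)  # some word begins with A

print(anywhere)
print(at_start)
print(word_start)
```

"NASA report" contains a capital A but has no word starting with one, so only the first pattern picks it up; "Research in Aerospace." is caught by the word-boundary pattern but not by the `^A` anchor.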

It's hard to be more specific without knowing exactly what you are looking for; let me know if I have misunderstood your request.

【Comments】:

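You might want a word boundary in there (at least I'm guessing that's what the OP really wants)
@JonClements I only see Ae in his example text. Maybe he meant to match at the beginning of the body. I'll edit.
Probably for the best: the word "A" and a capital letter "A" appearing somewhere in a line of text are very different :) Either way, this must be the right answer, so +1 from me
I was looking for matching words in the text, and this worked for me.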