Python: How to find text of first anchor tag using BeautifulSoup (answers)

【Question title】: Python: How to find text of first anchor tag using BeautifulSoup
【Posted】: 2016-08-22 02:52:40
【Question】:

I have an HTML structure like this:

<p class="title">
  <a href="abc.com">
   Story
  </a> 
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>

I want to extract the text of the first anchor tag, i.e. Story.

This is how I extract the text from the anchor tags using BeautifulSoup:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
soup.prettify()
for link in soup.find_all(class_='title'):
    print(link.findNext('a').text)

And the output:

Story

Comments

But I only want the text of the first anchor tag, i.e. Story. How can I do this with BeautifulSoup in Python?

【Comments】:

Tags: python beautifulsoup

【Solution 1】:

You can access the first a tag like this:

print(link.a.text)

To strip the surrounding whitespace:

link.a.text.strip()
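
Putting the two snippets above together with the loop from the question gives the following minimal sketch. It assumes the question's HTML is stored in a variable named html and that the bs4 package is installed; the list name first_texts is only illustrative.

```python
from bs4 import BeautifulSoup

# HTML copied from the question
html = """
<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

first_texts = []
for link in soup.find_all(class_="title"):
    # link.a is the first <a> found inside this element
    raw = link.a.text  # still contains the newlines and indentation around "Story"
    first_texts.append(raw.strip())

print(first_texts)  # ['Story']
```

Without strip(), the text would keep the line breaks and spaces that surround Story in the markup.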

【Discussion】:

【Solution 2】:

You can do this by chaining the find() call with attribute access and using the get_text() method:

soup.find("p", class_="title").a.get_text(strip=True)

Here .a is equivalent to .find("a") in BeautifulSoup.

Alternatively, use a CSS selector:

soup.select_one("p.title > a").get_text(strip=True)
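
To see why the > in the selector matters, here is a small sketch run against the question's HTML (the variable names are illustrative). The descendant combinator (a space) also matches the nested comments link, while the child combinator (>) matches only the anchor that sits directly under the p element.

```python
from bs4 import BeautifulSoup

# HTML copied from the question
html = """
<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: every <a> anywhere inside p.title
all_links = [a.get_text(strip=True) for a in soup.select("p.title a")]

# Child combinator: only <a> elements that are direct children of p.title
direct_links = [a.get_text(strip=True) for a in soup.select("p.title > a")]

# select_one returns just the first match of the selector
first = soup.select_one("p.title > a").get_text(strip=True)

print(all_links)     # ['Story', 'comments']
print(direct_links)  # ['Story']
print(first)         # Story
```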

【Discussion】:

I got this error: AttributeError: 'NoneType' object has no attribute 'get_text'
@ShoaibAkhtar Then your actual HTML is different from what you posted.
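
That error means the expression in front of .get_text evaluated to None: either select_one() found no match, or the matched p element contained no a tag. A defensive sketch that checks for None first could look like this; the helper name first_title_link_text and the second test string are hypothetical and not part of the original thread.

```python
from bs4 import BeautifulSoup


def first_title_link_text(html):
    """Return the text of the first <a> in the first <p class="title">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    paragraph = soup.find("p", class_="title")
    # find() and the .a shortcut both return None when nothing matches
    if paragraph is None or paragraph.a is None:
        return None
    return paragraph.a.get_text(strip=True)


# Matching markup: prints Story
print(first_title_link_text('<p class="title"><a href="abc.com"> Story </a></p>'))

# Markup without a p.title element: prints None instead of raising AttributeError
print(first_title_link_text('<div class="headline"><a href="abc.com">Story</a></div>'))
```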

【Solution 3】:

If you only want the text of the first anchor, you don't need to search by class.

There is no need to mention class="title" at all.

In [9]: html = """
<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>
"""
In [10]: soup = BeautifulSoup(html, "html.parser")
In [11]: soup.a.text.strip()
Out[11]: u'Story'

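【Discussion】:

Suppose the HTML structure above is repeated many times. How do I then find the first anchor tag inside every tag whose class is 'title'?
My answer will always find the first anchor tag in the document, regardless of class. If you want the first anchor inside a particular element, look at the other answers.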
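
For the repeated case raised in the comment, one possible sketch is to loop over every p.title element and take its first anchor, as the other answers do. The two-block HTML below is made up for illustration (the def.com and uvw.com links and the "Another story" text are assumptions, not from the question).

```python
from bs4 import BeautifulSoup

# Hypothetical page: the question's structure repeated twice
html = """
<p class="title">
  <a href="abc.com">Story</a>
  <span class="domain"><a href="xyz.com">comments</a></span>
</p>
<p class="title">
  <a href="def.com">Another story</a>
  <span class="domain"><a href="uvw.com">comments</a></span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# For each p.title, p.a is the first <a> inside it; skip paragraphs with no link
titles = [
    p.a.get_text(strip=True)
    for p in soup.find_all("p", class_="title")
    if p.a is not None
]

print(titles)  # ['Story', 'Another story']
```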