Proper way of traversing a beautifulsoup4 object: answers

【Question title】: Proper way of traversing a beautifulsoup4 object
【Posted】: 2015-06-16 22:23:10
【Question】:

I have some sloppy HTML like this:

<span>STATS</span>
<table> ... </table>
<span>Page 1 of 5</span>

And some Beautiful Soup code that tries to walk it:

table = soup.find('span', text='STATS').nextSibling('table')[0]
pagespan = table.nextSibling('span')

It throws the exception TypeError: 'NavigableString' object is not callable

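What is the best way to do this? None of the elements has a DOM ID, a CSS class, or anything else useful or uniquely identifying. There are a whole bunch of nested table elements inside the table, but I don't want those. I only want the things at the same DOM level.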

Thanks.

【Comments】:

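Just remove the [0] from the first line of code.
That doesn't work. Exception: AttributeError: 'ResultSet' object has no attribute 'nextSibling'. I think I need to access the first element, since it is the first table found. Am I wrong?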
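
A note on the two errors quoted in this thread, as far as I can tell: `nextSibling` (spelled `next_sibling` in bs4) is an attribute holding the adjacent node, not a method, and that node can be a whitespace string rather than a tag. Calling a string gives the TypeError; calling a Tag is shorthand for `find_all()`, which returns a ResultSet. A minimal sketch, assuming the real page has newlines between the elements; the `html` string and the table contents are made up for illustration, and `string=` is the newer name of the `text=` argument:

```python
from bs4 import BeautifulSoup, NavigableString

# Assumed markup: the question's elements, separated by newlines.
html = "<span>STATS</span>\n<table><tr><td>1</td></tr></table>\n<span>Page 1 of 5</span>"
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span", string="STATS")

# .next_sibling is an attribute; here it is the newline text node, not the table.
node = span.next_sibling
print(isinstance(node, NavigableString))  # True

# Calling that string is what raises the TypeError from the question.
try:
    node("table")
except TypeError as exc:
    print("TypeError:", exc)

# Calling a Tag (or the soup) is shorthand for find_all(), so the result
# is a ResultSet, which has no sibling attributes of its own.
result = soup("table")
print(type(result).__name__)  # ResultSet
```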

Tags: python python-3.x beautifulsoup

【Solution 1】:

The following code works fine for me:

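from bs4 import BeautifulSoup

html = "<span>STATS</span><table> ... </table><span>Page 1 of 5</span>"
# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(html, 'html.parser')

# Note: newer Beautiful Soup releases call the text= argument string=.
table = soup.find('span', text='STATS').find_next_sibling('table')
pagespan = table.find_next_sibling('span')
print(pagespan.text)  # print() call, since the question is tagged python-3.x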
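
Regarding the nested tables mentioned in the question: `find_next_sibling()` only considers nodes that share the same parent, so tables nested inside the first table should not get in the way. A small sketch under assumed markup; the `id` attributes and the wrapping `div` are hypothetical and are there only so the output is readable (the real page has no IDs):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer table containing a nested table, followed by
# a second same-level table and the pager span.
html = """
<div>
<span>STATS</span>
<table id="outer"><tr><td><table id="inner"></table></td></tr></table>
<table id="second"></table>
<span>Page 1 of 5</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span", string="STATS")

# find_next_sibling() skips whitespace text nodes and returns the first
# matching tag with the same parent, or None if there is no match.
table = span.find_next_sibling("table")

# find_next_siblings() returns every later same-level match; the nested
# table is a descendant of "outer", not a sibling, so it is left out.
sibling_ids = [t.get("id") for t in span.find_next_siblings("table")]

pagespan = table.find_next_sibling("span")
print(table["id"], sibling_ids, pagespan.get_text())
```

Since `find_next_sibling()` returns None when nothing matches, checking the result before chaining a second call is probably worthwhile on messy pages.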

【Comments】: