Is there a clean way to get the n-th column of an html table using BeautifulSoup?

【Question title】: Is there a clean way to get the n-th column of an html table using BeautifulSoup?
【Posted】: 2011-07-28 18:52:54
【Question】:

Assume we are looking at the first table in a page, so:

table = BeautifulSoup(...).table

The rows can be scanned with a clean for loop:

for row in table:
    f(row)

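But getting a single column gets messy.

My question: is there an elegant way to extract a single column, either by its position or by its "name" (i.e. the text that appears in the first row of that column)?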

【Comments】:

Tags: python beautifulsoup html-table

【Solution 1】:

lxml is many times faster than BeautifulSoup, so you might want to use it instead.

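from lxml.html import parse

# cssselect() needs the separate cssselect package to be installed
doc = parse('http://python.org').getroot()
for row in doc.cssselect('table > tr'):
    # :nth-child() is 1-based, so this picks the third cell of each row
    for cell in row.cssselect('td:nth-child(3)'):
        print(cell.text_content())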

Or, instead of the nested loops, with a list comprehension:

rows = doc.cssselect('table > tr')
# cssselect() is a method of each row element, not of the list of rows
cells = [cell.text_content() for row in rows for cell in row.cssselect('td:nth-child(3)')]
print(cells)
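
Since the question asked about BeautifulSoup specifically, here is a minimal sketch of the same idea without lxml. It assumes the bs4 package (BeautifulSoup 4) and a simple table with no colspan or rowspan. The helper name get_column and the sample HTML are made up for illustration and are not part of any library. A column can be picked either by its 0-based position or by the text of its cell in the first row.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table, made up purely for illustration.
HTML = """
<table>
  <tr><th>Fruit</th><th>Color</th><th>Price</th></tr>
  <tr><td>apple</td><td>red</td><td>1.20</td></tr>
  <tr><td>banana</td><td>yellow</td><td>0.50</td></tr>
</table>
"""


def get_column(table, key):
    """Return the cell texts of one column (hypothetical helper).

    key is either a 0-based position (int) or the text of the column's
    cell in the first row (str). Lookup by name drops the first row from
    the result; lookup by position keeps every row.
    """
    # Collect the direct th/td children of every row; skip rows without cells.
    rows = [tr.find_all(["th", "td"], recursive=False)
            for tr in table.find_all("tr")]
    rows = [cells for cells in rows if cells]
    if not rows:
        return []
    if isinstance(key, str):
        headers = [cell.get_text(strip=True) for cell in rows[0]]
        index = headers.index(key)  # raises ValueError for an unknown name
        rows = rows[1:]
    else:
        index = key
    # Rows that are too short to have this column are skipped.
    return [cells[index].get_text(strip=True)
            for cells in rows if len(cells) > index]


table = BeautifulSoup(HTML, "html.parser").table
print(get_column(table, 1))        # second column, by position
print(get_column(table, "Price"))  # column whose first-row text is "Price"
```

Going through find_all("tr") rather than iterating over the table directly avoids the whitespace text nodes between rows and still finds rows wrapped in a tbody. Cells that span several columns would shift the positions, so a real page may need extra handling.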

【Comments】: