Python & Beautiful Soup - Searching result strings

【Question title】: Python & Beautiful Soup - Searching result strings
【Posted】: 2013-04-03 19:32:02
【Question】:

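I am using Beautiful Soup to parse an HTML table.

Python version 3.2
Beautiful Soup version 4.1.3

I am running into a problem when I use the findAll method to find the columns within a row. I get an error saying that the list object has no attribute findAll. I found this approach in another Stack Exchange post, where it was not a problem. (BeautifulSoup HTML table parsing)

I realize that findAll is a method of BeautifulSoup objects, not of Python lists. The odd part is that findAll works when I look for the rows within the table I picked from the list of tables (I only need the second table on the page), but fails when I try to find the columns within the list of rows.

Here is my code: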

from urllib.request import URLopener
from bs4 import BeautifulSoup

opener = URLopener() #Open the URL Connection
page = opener.open("http://www.labormarketinfo.edd.ca.gov/majorer/countymajorer.asp?CountyCode=000001") #Open the page
soup = BeautifulSoup(page)

table = soup.findAll('table')[1] #Get the 2nd table (index 1)
rows = table.findAll('tr') #findAll works here
cols = rows.findAll('td') #findAll fails here
print(cols)

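【Comments on the question】:

Tags: python-3.x beautifulsoup

【Solution 1】:

findAll() returns a list of results. You need to either loop over them or pick one of them, and then call that element's own findAll() method to get the elements it contains: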

table = soup.findAll('table')[1]
rows = table.findAll('tr')
for row in rows:
    cols = row.findAll('td')  # call findAll on each row, not on the list of rows
    print(cols)

Or pick a single row:

table = soup.findAll('table')[1]
rows = table.findAll('tr')
cols = rows[0].findAll('td')  # columns of the *first* row.
print(cols)
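
Both variants can be tried without a network connection. The following is a minimal sketch, assuming a small inline HTML string (the HTML constant and its contents are made up for illustration, since the real page markup is not shown in the question) and the built-in "html.parser" backend. It uses the find_all spelling, reproduces the AttributeError from the question, and then applies both fixes:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two tables, the second one is wanted.
HTML = """
<table><tr><td>ignored</td></tr></table>
<table>
  <tr><td>Employer A</td><td>City A</td></tr>
  <tr><td>Employer B</td><td>City B</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find_all("table")[1]   # a single Tag: the second table
rows = table.find_all("tr")         # a list-like collection of <tr> Tags

# The collection itself has no find_all, so this raises AttributeError.
try:
    rows.find_all("td")
    error_raised = False
except AttributeError:
    error_raised = True

# Fix 1: pick one row; an individual Tag does have find_all.
first_row_cols = rows[0].find_all("td")

# Fix 2: loop over the rows and query each one.
cols_per_row = [len(row.find_all("td")) for row in rows]

print(error_raised, cols_per_row)
```

The difference between table and rows is the whole issue: indexing with [1] turns the list of tables into one Tag, while rows is still a list until it is indexed or iterated.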

Note that findAll is deprecated; you should use find_all() instead.
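
As a rough illustration of the find_all() spelling applied to the whole task, the sketch below collects the text of every cell, grouped by row. The helper name table_cells, its table_index parameter, and the SAMPLE string are all hypothetical and not part of the original question or answer:

```python
from bs4 import BeautifulSoup


def table_cells(html, table_index=1):
    """Return the text of every <td> in one table, grouped by row.

    table_index=1 mirrors the question, which wants the second table.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    return [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in table.find_all("tr")
    ]


# Made-up sample markup standing in for the real page.
SAMPLE = (
    "<table><tr><td>x</td></tr></table>"
    "<table><tr><td> a </td><td>b</td></tr>"
    "<tr><td>c</td><td>d</td></tr></table>"
)

cells = table_cells(SAMPLE)
print(cells)
```

Returning plain strings instead of Tag objects is a design choice for this sketch; keep the Tags if attributes or nested markup are needed later.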

【Comments】: