Difficulty with Scraping Table (Python, BeautifulSoup): Answers

【Question title】: Difficulty with Scraping Table (Python, BeautifulSoup)
【Posted】: 2018-07-14 20:15:22
【Question description】:

I'm struggling to scrape the table from this website:

http://www.espn.com/mlb/lines

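Specifically, I'm trying to scrape the "Run Line" column of the "Westgate" row for every game listed in the table.

I'm not sure what I'm doing wrong, since I'm just trying to drill down to the text in the table which, by my limited understanding of web scraping, would be the second table within the "oddrow" rows I've selected.

I've tried searching for my problem, but I had trouble applying any of the suggested solutions to my specific scenario.

Thanks in advance for your help.

Here is my code so far: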

url='http://www.espn.com/mlb/lines'
driver = webdriver.Chrome() 
driver.get(url)
time.sleep(5)
content=driver.page_source

soup=BeautifulSoup(content,'lxml')

driver.quit()

table=soup.find('table',{'class':'tablehead'})
table_row=table.find_all('tr',{'class':'oddrow'})
table_data=table_row.find_all('table',{'class':'tablehead'})[1] #trying to 
#just scrape the second table only within this row, ie the Westgate and Runline table

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-397-fea09cb40cb2> in <module>()
----> 1 table_data=table_row.find_all('table',{'class':'tablehead'})

~\Anaconda3\lib\site-packages\bs4\element.py in __getattr__(self, key)
   1805     def __getattr__(self, key):
   1806         raise AttributeError(
-> 1807             "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
   1808         )

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

【Question comments】:

Tags: python python-3.x beautifulsoup

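【Solution 1】:

I believe the code below gives the output you want. There may be a better way to do this, but I used a nested loop that increments i until it reaches 3, since the value you want sits at that position in each row; I then increment the oddrow index, which returns the Run Line column from the Westgate row inside the loop: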

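from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.espn.com/mlb/lines'
driver = webdriver.Chrome()
driver.get(url)
content = driver.page_source
driver.quit()  # the HTML is already captured, so the browser can close

soup = BeautifulSoup(content, 'lxml')
# Collect the oddrow <tr> elements once instead of re-querying on every pass.
rows = soup.find_all('tr', {'class': 'oddrow'})

oddrowindex = 0
# 70 is the hard-coded upper limit; len(rows) guards against an IndexError.
while oddrowindex < min(70, len(rows)):
    i = 0
    table_row = rows[oddrowindex]
    for td in table_row:  # iterates over the row's direct children
        if i == 3:
            print(td.text)  # the child at index 3 holds the Run Line value
        i = i + 1
        # The index advances once per child, so the outer loop skips
        # ahead by the number of children in each row.
        oddrowindex = oddrowindex + 1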

Sample output:

【Comments】:

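Hi, I'm digging into your solution but can't quite understand it. I have two questions... 1). How are you able to skip the oddrows we don't want, such as WilliamHill and CG Technology? 2). I only see two table classes under each oddrow; in this case, is the table the 'text-align:center' one? Also, how does the i variable know that you are referencing exactly those tables?
The i variable is incremented for every td in the tr with class oddrow; it goes up by one for each td it reads, and when it reaches 3 (the Run Line cell) the contents are printed.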
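For anyone following the thread, here is a minimal sketch of the same idea without Selenium. It runs on a small inline HTML string, so the markup, the odds values, the second sportsbook name, and the helper name `run_lines` are all made up for illustration and do not reflect the real ESPN page. It shows two things: why the original `AttributeError` occurs (`find_all()` returns a `ResultSet`, which has to be looped over or indexed), and how a cell can be picked by position instead of counting with `i`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the lines table,
# not the structure of the real page.
HTML = """
<table class="tablehead">
  <tr class="oddrow"><td>Westgate</td><td>-110</td><td>8.5</td><td>-1.5 (+150)</td></tr>
  <tr class="evenrow"><td>Caesars</td><td>-112</td><td>8.5</td><td>-1.5 (+145)</td></tr>
  <tr class="oddrow"><td>Westgate</td><td>+120</td><td>9.0</td><td>+1.5 (-170)</td></tr>
</table>
"""


def run_lines(html, book="Westgate", column=3):
    """Return the text of cell `column` for each oddrow whose first cell is `book`."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # find_all() returns a ResultSet (a list of Tags), so loop over it
    # instead of calling find_all() on the ResultSet itself.
    for row in soup.find_all("tr", {"class": "oddrow"}):
        cells = row.find_all("td")
        # Select the cell by position; skip rows that are too short
        # or that belong to a different sportsbook.
        if len(cells) > column and cells[0].get_text(strip=True) == book:
            results.append(cells[column].get_text(strip=True))
    return results


print(run_lines(HTML))  # ['-1.5 (+150)', '+1.5 (-170)']
```

Filtering on the text of the first cell is one possible way to skip the rows for other sportsbooks; whether it works on the live page depends on how that page is actually laid out, which this sketch does not verify.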