Scraping a table with Beautiful Soup: answers

【Question title】: Scraping a table with Beautiful Soup
【Posted】: 2017-12-23 23:36:53
【Question】:

I am trying to get the price table (buy, price, and contracts available) from this site: https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices.

Here is my (admittedly very preliminary) code, which for now is structured only to find the table:

from bs4 import BeautifulSoup
import requests
from lxml import html
import json, re

url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices"

# Download the page and parse the HTML with the lxml parser.
ret = requests.get(url).text

soup = BeautifulSoup(ret, "lxml")

# soup.find() returns None when nothing matches; it does not raise.
table = soup.find('table')
if table is None:
    print('No tables found, exiting')
else:
    print(table)

The code finds and parses a table; however, it is the wrong one (the data table on a different tab: https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#data).

How can I fix this so that the code identifies the correct table?

【Comments】:

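Which table do you want? Your best bet is to use soup.find_all('table') and iterate over the list it returns. While iterating, look only for a specific element that the table you want contains.
@TerryA I ran that code, and it did not pick up the desired table, only the one on the first tab.
Which table do you want from the first link you provided?
@TerryA This one: i.stack.imgur.com/21y42.png
Odd, when I try requests.get(url) I seem to get the error requests.exceptions.ConnectionError: ('Connection aborted.', error(54, 'Connection reset by peer'))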
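
The find_all suggestion from the comments can be sketched roughly as follows. This is only an illustration: it assumes the tables are already present in the downloaded HTML, and the sample markup, the header names, and the helper find_table_with_headers are all made up for the example rather than taken from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page that contains several tables.
SAMPLE_HTML = """
<table id="data">
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>2017-07-01</td><td>0.12</td></tr>
</table>
<table id="prices">
  <tr><th>Price</th><th>Shares</th></tr>
  <tr><td>0.10</td><td>250</td></tr>
</table>
"""


def find_table_with_headers(soup, wanted):
    """Return the first <table> whose <th> cells include every name in `wanted`."""
    wanted = {name.lower() for name in wanted}
    for table in soup.find_all("table"):
        headers = {th.get_text(strip=True).lower() for th in table.find_all("th")}
        if wanted <= headers:  # every wanted header is present
            return table
    return None  # no table matched


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_with_headers(soup, ["Price", "Shares"])
print(table["id"] if table is not None else "No matching table")
```

Matching on header text tends to be sturdier than relying on table order, but it only helps when the wanted table is actually in the response body.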

Tags: python web-scraping beautifulsoup

【Solution 1】:

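As @downshift mentioned in the comments, the table is generated by JavaScript through an XHR request, so it is not part of the HTML that requests downloads from the page URL.
You can therefore either use Selenium or make a request directly to the site's API.

Using the second option: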

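from bs4 import BeautifulSoup
import requests

# Endpoint the page calls via XHR to load the price table.
url = "https://www.predictit.org/PrivateData/GetPriceListAjax?contractId=7069"
ret = requests.get(url).text
soup = BeautifulSoup(ret, "lxml")
table = soup.find('table')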
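
Once the table tag is in hand, its rows still need to be pulled out. A minimal sketch of that step is below; the sample fragment and its column names are assumptions standing in for whatever the endpoint actually returns, and table_to_rows is a hypothetical helper, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real response's columns and values may differ.
SAMPLE_FRAGMENT = """
<table>
  <tr><th>Price</th><th>Shares</th></tr>
  <tr><td>0.10</td><td>250</td></tr>
  <tr><td>0.11</td><td>1,000</td></tr>
</table>
"""


def table_to_rows(table):
    """Convert a <table> tag into a list of rows, each a list of cell strings."""
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells alike, stripped of surrounding whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


soup = BeautifulSoup(SAMPLE_FRAGMENT, "html.parser")
table = soup.find("table")
# soup.find() returns None when no table exists, so guard before parsing.
rows = table_to_rows(table) if table is not None else []
for row in rows:
    print(row)
```

The first row holds the headers, so rows[1:] would be the data if the real table follows the same layout.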

【Comments】:

Thanks for your help!