Missing html in beautifulsoup scrape

【Question title】: Missing html in beautifulsoup scrape
【Posted】: 2019-12-10 10:27:42
【Question description】:

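I am trying to scrape the data table from: https://www.pjm.com/planning/services-requests/interconnection-queues.aspx

I want to automate this data extraction instead of manually downloading the xls file every day. I looked through their documentation, and there is no easy way to automate this other than scraping. Looking at the page source, the data appears to be stored in an "nggrid" table. Specifically, the data is in.

I wrote a baseline query in Python with beautifulsoup to see what the initial output looks like.

Here is my initial code: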

from bs4 import BeautifulSoup
import requests

page_link = 'https://www.pjm.com/planning/services-requests/interconnection-queues.aspx'

page_response = requests.get(page_link, timeout=5)

page_content = BeautifulSoup(page_response.content, "html.parser")

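The data stored in page_content does not give me the same information as the page source. Where I expected the element and its various child elements, I get only the following opening and closing tags with nothing in between: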

<pjm-nggrid></pjm-nggrid>

Does anyone know how to access the data in the nggrid?

【Question comments】:

Tags: javascript python html web-scraping beautifulsoup

【Solution 1】:

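The data is loaded asynchronously via JavaScript. You may need to change the 'api-subscription-key'; you can see the key in the Chrome/Firefox developer tools.

Note: this downloads the whole dataset (~10 MB). You can change 'rowCount' and 'startRow' to load only part of the data.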

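import json
import requests

# API endpoint that the page's JavaScript loads the data from
url = 'https://services.pjm.com/PJMPlanningApi/api//Queue/GetFilteredQueues?'

# Request body; adjust 'rowCount' and 'startRow' to load only part of the data
payload = {
    'filters': [],
    'rowCount': 0,
    'startRow': 1
}

# Headers as seen in the browser's developer tools
headers = {
    'Origin': 'https://www.pjm.com',
    'api-subscription-key': 'E29477D0-70E0-4825-89B0-43F460BF9AB4'
}

# POST the payload as JSON and decode the JSON response
json_data = requests.post(url, headers=headers, json=payload).json()

print(json.dumps(json_data, indent=4))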

Prints:

{
    "items": [
        {
            "requestType": "GI",
            "queueNumber": "A01",
            "projectName": null,
            "commercialName": "Ironwood",
            "stateProvinceName": "PA",
            "countyName": "Lebanon",
            "projectStatus": "In Service",
            "transmissionOwner": "ME",
            "mw": 720.0,
            "mwe": 720.0,
            "mwc": 673.0,
            "mweInservice": 673.0,

...and so on.
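
The 'rowCount' and 'startRow' fields suggest the data can be pulled in slices rather than in one ~10 MB response. Below is a minimal sketch of that idea, assuming 'startRow' is 1-based and a non-zero 'rowCount' caps the number of rows returned per call; neither is confirmed in this thread. The names `build_payload`, `iter_items`, and `fetch_page` are made up for illustration and are not part of any PJM API.

```python
from typing import Any, Callable, Dict, Iterator

API_URL = "https://services.pjm.com/PJMPlanningApi/api//Queue/GetFilteredQueues?"


def build_payload(start_row: int, row_count: int) -> Dict[str, Any]:
    """Build the same JSON body as in the answer, for one slice of rows."""
    return {"filters": [], "rowCount": row_count, "startRow": start_row}


def iter_items(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    page_size: int = 500,
) -> Iterator[Dict[str, Any]]:
    """Yield items slice by slice.

    ASSUMPTION: startRow is 1-based and rowCount limits the slice size.
    """
    start_row = 1
    while True:
        items = fetch_page(build_payload(start_row, page_size)).get("items", [])
        yield from items
        # A short or empty slice means the data is exhausted. A longer slice
        # means the server ignored rowCount, so stop instead of looping.
        if len(items) != page_size:
            break
        start_row += page_size


# Possible usage with requests, reusing the headers from the answer:
#   import requests
#   fetch = lambda p: requests.post(API_URL, headers=headers, json=p).json()
#   rows = list(iter_items(fetch, page_size=500))
```

Passing the HTTP call in as `fetch_page` keeps the paging loop separate from the network code, so it can be tried against a stub before hitting the real service.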

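【Comments】:

This is perfect! I'm new to web scraping and to these tools, so this is very helpful. Going forward, what have you found to be the best way to find the right URL for scraping this kind of data? I followed the steps in the link, and they seem to match the information you provided.
@DaveWilson Well, every page is a bit different in that respect. But opening the Firefox/Chrome developer tools and looking at where (and how) the page makes its requests is a good start.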

【Solution 2】:

How about automatically downloading and parsing the xls/xml file instead of scraping?
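
A rough sketch of the parsing half of that suggestion, using only the standard library. The layout of the real export file is not shown in this thread, so the two-level XML structure below is an assumption and the sample is invented for illustration (its values are borrowed from the JSON output in Solution 1). `records_from_xml` and `EXPORT_URL` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def records_from_xml(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Convert <root><row><field>value</field>...</row>...</root> into dicts.

    ASSUMPTION: one child element per record and one grandchild per field.
    Adjust this to whatever the real export actually contains.
    """
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in row} for row in root]


# Invented sample, NOT the real PJM export format.
SAMPLE = """<queues>
  <queue>
    <queueNumber>A01</queueNumber>
    <projectName/>
    <stateProvinceName>PA</stateProvinceName>
  </queue>
</queues>"""

print(records_from_xml(SAMPLE))

# Fetching the file could look roughly like this (EXPORT_URL is a
# placeholder; the actual download link is not given in this thread):
#   import requests
#   xml_text = requests.get(EXPORT_URL, timeout=30).text
```

Empty elements such as `<projectName/>` come back as `None`, which mirrors the `null` values in the JSON response.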

【Comments】: