Not able to replicate AJAX using Python Requests

【Question Title】: Not able to replicate AJAX using Python Requests
【Posted】: 2020-12-30 18:23:27
【Question】:

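I am trying to replicate an AJAX request from a web page (https://droughtmonitor.unl.edu/Data/DataTables.aspx). The AJAX request is triggered when a value is selected from the dropdown lists.

I am sending the request below with Python, but I do not get the response that the browser shows in its "Network" tab.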

import bs4
import requests
import lxml

ses = requests.Session()
ses.get('https://droughtmonitor.unl.edu/Data/DataTables.aspx')
headers_dict = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
    }
url = 'https://droughtmonitor.unl.edu/Ajax2018.aspx/ReturnTabularDMAreaPercent_urban'
req_data = {'area':'00064', 'statstype':'1'}
resp = ses.post(url,data = req_data,headers = headers_dict)
    
soup = bs4.BeautifulSoup(resp.content,'lxml')
print(soup)

【Comments】:

What is your variable state? I can't run your code because state is not defined.
@wuerfelfreak Updated.

Tags: python python-3.x ajax web-scraping python-requests

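【Solution 1】:

You need to add a couple of things to the request to get an answer from the server.

You need to convert the dict to JSON so that it is passed as a JSON string rather than as a dict (Requests form-encodes a dict passed through data=).
You also need to declare the type of the request data by setting the request header Content-Type: application/json; charset=utf-8.

With these changes I was able to request the correct data.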

import json

import bs4
import requests

ses = requests.Session()
# Load the page first so the session picks up any cookies the site sets.
ses.get('https://droughtmonitor.unl.edu/Data/DataTables.aspx')
headers_dict = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
                'Content-Type': 'application/json; charset=utf-8'}
url = 'https://droughtmonitor.unl.edu/Ajax2018.aspx/ReturnTabularDMAreaPercent_urban'
# Serialize the payload to a JSON string; a plain dict would be form-encoded.
req_data = json.dumps({'area': '00037', 'statstype': '1'})
resp = ses.post(url, data=req_data, headers=headers_dict)

soup = bs4.BeautifulSoup(resp.content, 'lxml')
print(soup)

Quite a tricky problem, I must say.
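
To illustrate why the two changes matter, here is a minimal offline sketch using only the standard library. It compares the form-encoded body that a dict passed through data= turns into with the JSON string produced by json.dumps. The use of urlencode here is my own stand-in for the form encoding, not something taken from the answer, and no request is sent.

```python
import json
from urllib.parse import urlencode

payload = {'area': '00037', 'statstype': '1'}

# Form encoding: the shape of the body when a dict is passed via data=.
form_body = urlencode(payload)

# JSON encoding: the shape of the body after json.dumps.
json_body = json.dumps(payload)

# str() of a dict uses single quotes, so it is not valid JSON.
repr_body = str(payload)
try:
    json.loads(repr_body)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(form_body)      # area=00037&statstype=1
print(json_body)      # {"area": "00037", "statstype": "1"}
print(repr_is_json)   # False
```

The endpoint presumably expects the second form, which would explain why the original request got no useful reply.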

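【Comments】:

There is no need to turn the Python dict into JSON by wrapping it in quotes. Requests will take care of that for you, and if it doesn't, you should use json.dumps(pythondict).
Oh, yes. I forgot that req_data is not constant. I added json.dumps but left the rest unchanged so as not to encroach on your answer. @EricTruett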

【Solution 2】:

From the Requests documentation:

Instead of encoding the dict yourself, you can also pass it directly using the json parameter (added in version 2.4.2) and it will be encoded automatically:

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, json=payload)

Then, to get the output, call r.json() and you will get the data you are looking for.
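
Applied to the endpoint from the question, this might look like the sketch below. It builds the request without sending it so the effect of json= can be inspected offline. The payload values are copied from Solution 1. Note that json= sets Content-Type to application/json without the charset suffix; whether the server accepts that, and whether it also needs the user-agent header or the session cookies from the first GET, is an assumption I have not verified.

```python
import json

import requests

url = 'https://droughtmonitor.unl.edu/Ajax2018.aspx/ReturnTabularDMAreaPercent_urban'
payload = {'area': '00037', 'statstype': '1'}

# Prepare the request without sending it: json= serializes the dict
# and sets the Content-Type header automatically.
prepared = requests.Request('POST', url, json=payload).prepare()

print(prepared.method)
print(prepared.headers['Content-Type'])
print(json.loads(prepared.body))

# To actually send it (needs network access; server behaviour is unverified):
# resp = requests.post(url, json=payload)
# data = resp.json()
```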
