所以.. 在request.post() 方法的情况下,data 参数是字典,它表示 html 帖子表单的键值对。为了找到它,您可以在浏览器中使用 DevTools(Chrome 和 Mozilla 中的 shift-ctr-I),打开 Network 选项卡并提交您需要检查的表单 - 在您的情况下,表单表示为单个 @987654323 @ 元素(风格化为“下载 PDF”按钮)。在您点击此输入后,浏览器将向服务器发出格式正确的 POST 请求,您可以在“网络”选项卡上看到该请求的正确 html 标头和键值 - 只需 grub 并在您的 python 脚本中形成两个字典:第一个带有标题,第二个带有表单后的值。
前面发布的网址示例
# http headers
headers =
{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,de;q=0.6',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Content-Length': '1017',
'Content-Type': 'application/x-www-form-urlencoded',
'Cookie': 'ASP.NET_SessionId=okonm4wfhg5ddup5e0wkp0ur; BIGipServerfdic_Forward_prod_80=172495532.20480.0000; _ga=GA1.2.77529009.1621351450; _gid=GA1.2.1620156842.1621351450',
'Host': 'cdr.ffiec.gov',
'Origin': 'https://cdr.ffiec.gov',
'Referer': 'https://cdr.ffiec.gov/Public/ViewFacsimileDirect.aspx?ds=call&idType=fdiccert&id=8426&date=03312021',
'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
'sec-ch-ua-mobile': '?0',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-User': '?1',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36' }
# Form data
post_form_data =
{ "__EVENTTARGET": "",
"__EVENTARGUMENT": "",
"__VIEWSTATE": "/wEPDwULLTE0NTY3MjMzNTQPFggeHVZpZXdQREZGYWNzaW1pbGVfU3VibWlzc2lvbklEApTmYR4UVmlld1BERkZhY3NpbWlsZU1vZGULKX1DZHIuUGRkLlVJLkNvbnRyb2xzLlVJSGVscGVyK1ZpZXdGYWNzaW1pbGVNb2RlLCBDZHIuUGRkLlVJLlByb2Nlc3NlcywgVmVyc2lvbj03LjEuMTMzLjAsIEN1bHR1cmU9bmV1dHJhbCwgUHVibGljS2V5VG9rZW49bnVsbAAeBkZJTmFtZQV4MVNUIFNVTU1JVCBCQU5LICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgHg5GRElDQ2VydE51bWJlcgUEODQyNhYCZg9kFgICAQ9kFgICBw9kFgYCAQ9kFgQCAQ8PFgYeBFRleHRkHghDc3NDbGFzcwUJZG9jaGVhZGVyHgRfIVNCAgJkZAIDDw8WBh8EZB8FBQZoZWFkZXIfBgICZGQCAw9kFgICAQ8UKwACZBQrAAUUKwAIaAUFUHJpbnRoaGRoZ2QUKwAIZwUNRG93bmxvYWQgWEJSTGdoZGhnZBQrAAhnBQxEb3dubG9hZCBQREZnaGRoZ2QUKwAIZwUMRG93bmxvYWQgU0RGZ2hkaGdkFCsACGcFEURvd25sb2FkIFRheG9ub215Z2hkaGhkZAIFDw8WAh4HVmlzaWJsZWhkZGTtXpFTz1TYX73fKLF2ros5Z2CvJ/pDUy88F6s57Qs97Q==",
"__VIEWSTATEGENERATOR": "A250BEAE",
"ctl00$MainContentHolder$viewTabStrip$Download_PDF_2": "Download PDF" }
# url to submit the form
url = 'https://cdr.ffiec.gov/Public/ViewPDFFacsimile.aspx?ds=call&idType=fdiccert&id=8426&date=03312021'
# making request
resp = requests.post(url, headers=headers, data=post_form_data)
# writing the file from response content
with open('file_name.pdf', 'wb') as file:
file.write(resp.content)
查找具有特定fileid 和date 的文档:
此信息在 url 参数中给出:... /ViewPDFFacsimile.aspx?ds=call&idType=fdiccert&id=8426&date=03312021'
您还可以在“网络”选项卡上找到它(在 Chrome 中它称为“查询字符串参数”)。要在请求中传递它,请使用 request.post() 方法的 params 参数。
url_params = {
"ds": "call",
"idType": "fdiccert",
"id": "8426",
"date": "03312021" }
request.post(url, headers=headers, data=post_form_data, params=params)