HTTP Digest/Basic Auth with Python Requests module: answers

【Question title】: HTTP Digest/Basic Auth with Python Requests module
【Posted】: 2014-06-08 20:37:54
【Question】:

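My goal is to parse HTML/XML data from a password-protected page and then, based on that data (a timestamp), send XML commands to another device. The page I am trying to access is served by a web server built into an IP device. Also, if this would be easier in another language, please let me know. I have very little programming experience (one C programming class).

I have tried requests with both Basic and Digest authentication. I still cannot get past authentication, which keeps me from getting any further.

Here is my attempt: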

import requests
from requests.auth import HTTPDigestAuth

url='http://myUsername:myPassword@example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'))        
r.status_code

print(r.headers) 
print(r.status_code)

Output:

401 
CaseInsensitiveDict({'Content-Length': '0', 'WWW-Authenticate': 'Digest realm="the realm of device", nonce="23cde09025c589f05f153b81306928c8", qop="auth"', 'Server': 'Device server name'})

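I also tried requests' Basic auth and got the same output. I have tried both with and without user:pass@ in the URL. When I enter that URL in my browser, though, it works.

I thought requests handled the header data for Digest/Basic auth, but maybe I need to include the headers myself?

I used Live HTTP Headers (Firefox) and got this: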

GET /cgi/metadata.cgi?template=html HTTP/1.1
Host: [Device IP]
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Digest realm="Device Realm", nonce="a2333eec4cce86f78016343c48382d21", qop="auth"
Server: Device Server
Content-Length: 0

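【Comments】:

Are you sure your code contains two requests.get() calls (the second one sends no authentication info)? Post the code you actually tried. You can replace the device IP with example.com and change the user/password to an equivalent set of symbols (if your password contains a space, include one; if it contains ", include the quote; and so on).
This is the exact code I tried, except for the space between "digest-" and "auth" in the URL; the formatting got mangled when I copied it. I have tried the URL with and without the trailing part (digest-auth/auth/user/pass/). I only included it because that is what the example in the requests docs shows, and I also replaced /user/pass/ with the actual credentials. Am I missing a step? Does this need to be a .post instead of a get? I tried that too, to no avail.

Tags: xml http python-3.x python-requests digest-authentication

【Solution 1】:

The two requests are independent: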

r = requests.get(url, auth=HTTPDigestAuth('user', 'pass')) 
response = requests.get(url) #XXX <-- DROP IT

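The second request sends no credentials, so it is no surprise that it receives a 401 Unauthorized HTTP response status.

To fix it:

Use the same URL as in the browser. Remove digest-auth/auth/user/pass from the end; it is only an example from the requests docs.
Print r.status_code instead of response.status_code to see whether it succeeded.

Why do you put the username/password in both the URL and the auth parameter? Remove the username/password from the URL. To see the request and response headers that are sent, you can enable logging/debugging: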

import logging
import requests
from requests.auth import HTTPDigestAuth

# these two lines enable debugging at httplib level (requests->urllib3->httplib)
# you will see the REQUEST, including HEADERS and DATA, 
# and RESPONSE with HEADERS but without DATA.
# the only thing missing will be the response.body which is not logged.
try:
    import httplib
except ImportError:
    import http.client as httplib

httplib.HTTPConnection.debuglevel = 1

logging.basicConfig(level=logging.DEBUG) # you need to initialize logging, 
                      # otherwise you will not see anything from requests

# make request
url = 'https://example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'),
                 timeout=10)
print(r.status_code)
print(r.headers)
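
The advice above to remove the username/password from the URL can be sketched as a small helper built on the standard library's urllib.parse. The name split_credentials is hypothetical, not part of requests; it assumes the credentials, if present, use the user:pass@host form.

```python
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, username, password).

    username and password are None when the URL carries none.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals need their brackets restored.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )
    return clean, parts.username, parts.password


clean_url, user, password = split_credentials(
    "http://myUsername:myPassword@example.com/cgi/metadata.cgi?template=html"
)
print(clean_url)  # http://example.com/cgi/metadata.cgi?template=html
```

The cleaned URL and the extracted credentials can then be passed separately, as in requests.get(clean_url, auth=HTTPDigestAuth(user, password)).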

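【Comments】:

I removed the second request and the extra part at the end of the URL. Now when I run the code it seems to just run indefinitely in the shell and never reaches the print() lines. I have updated the code to show what I am trying now. I have also tried with and without user:pass@.
If it hangs, enable logging to see where it gets stuck.

【Solution 2】: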

import requests
from requests.auth import HTTPDigestAuth

url='https://example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'), verify=False,  stream=True)        


print(r.headers) 
print(r.status_code)

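Adding stream=True fixed it, because the page streams XML/HTML data. My next question: how do I store/parse a constant stream of data?

I tried reading it through r.content, but that seems to run indefinitely (the same problem I had before).

【Comments】: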
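
One likely reason r.content never returns is that it waits for the whole body, and a stream that never ends has no end to wait for. A common alternative is to read the response incrementally. The sketch below assumes the device sends newline-separated text, which the thread does not confirm; the names iter_lines_from_chunks and stream_device are illustrative, and max_lines is just a way to stop the demo.

```python
def iter_lines_from_chunks(chunks, encoding="utf-8"):
    """Yield complete text lines from an iterable of byte chunks.

    Chunks may split a line anywhere; partial data is buffered
    until its terminating newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.rstrip(b"\r").decode(encoding, errors="replace")
    if buffer:
        # The stream ended without a final newline.
        yield buffer.decode(encoding, errors="replace")


def stream_device(url, username, password, max_lines=10):
    """Print the first max_lines lines of a streaming response."""
    # Imported here so the helper above is usable without requests.
    import requests
    from requests.auth import HTTPDigestAuth

    with requests.get(url, auth=HTTPDigestAuth(username, password),
                      stream=True, timeout=10) as r:
        r.raise_for_status()
        # chunk_size=None yields data as it arrives when stream=True.
        chunks = r.iter_content(chunk_size=None)
        for count, line in enumerate(iter_lines_from_chunks(chunks), 1):
            print(line)
            if count >= max_lines:
                break


# Offline demonstration with hand-made chunks instead of a live device.
demo = list(iter_lines_from_chunks(
    [b"<a>1</", b"a>\r\n<b>", b"2</b>\n", b"tail"]
))
print(demo)  # ['<a>1</a>', '<b>2</b>', 'tail']
```

requests also provides Response.iter_lines(), which does similar buffering; the manual helper is shown only so the logic can be run without a device. If the device uses a self-signed certificate, the verify=False from the code above may still be needed.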