How to pass proxy authentication (requires digest auth) by using the Python requests module: answers

【Question title】: How to pass proxy-authentication (requires digest auth) by using python requests module
【Posted】: 2012-11-10 11:35:28
【Question description】:

I was using the Mechanize module before, and now I am trying to use the Requests module.
(Python mechanize doesn't work when HTTPS and Proxy Authentication required)

I have to go through a proxy server when I access the internet.
The proxy server requires authentication. I wrote the following code.

import requests
from requests.auth import HTTPProxyAuth

proxies = {"http":"192.168.20.130:8080"}
auth = HTTPProxyAuth("username", "password")

r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)

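The code above works well when the proxy server requires basic authentication.
Now I want to know what I have to do when the proxy server requires digest authentication.
HTTPProxyAuth does not seem to be effective for digest authentication (r.status_code returns 407).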
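
A 407 response carries a Proxy-Authenticate header that names the scheme the proxy expects. The following is a minimal sketch, using only the standard library, of how such a header value could be split into its scheme and parameters. The sample header string and the helper name parse_proxy_challenge are assumptions made for illustration, and the sketch handles a single challenge per header value only.

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_proxy_challenge(header_value):
    """Split one Proxy-Authenticate value into (scheme, params)."""
    scheme, _, rest = header_value.strip().partition(" ")
    # parse_http_list splits on commas outside quotes;
    # parse_keqv_list turns key="value" items into a dict.
    params = parse_keqv_list(parse_http_list(rest)) if rest else {}
    return scheme.lower(), params


# Hypothetical header value, as a digest proxy might send it.
sample = 'Digest realm="proxy", nonce="abc123", qop="auth", algorithm=MD5'
scheme, params = parse_proxy_challenge(sample)
print(scheme, params["realm"], params["nonce"])
```

If the scheme printed here is digest, HTTPProxyAuth (which sends Basic credentials) cannot satisfy the proxy, which matches the 407 described above.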

【Question discussion】:

Tags: python digest-authentication python-requests proxy-server

【Solution 1】:

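I wrote a Python module (available here) that can authenticate through an HTTP proxy using the digest scheme. It works when connecting to HTTPS websites (through monkey patching) and also allows authenticating with the website itself. It should work with the latest requests library on both Python 2 and 3.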

The following example fetches the web page https://httpbin.org/ip through the HTTP proxy 1.2.3.4:8080, which requires HTTP digest authentication with username user1 and password password1:

import requests
from requests_digest_proxy import HTTPProxyDigestAuth

s = requests.Session()
s.proxies = {
        'http': 'http://1.2.3.4:8080/',
        'https': 'http://1.2.3.4:8080/'
}
s.auth = HTTPProxyDigestAuth('user1', 'password1')

print(s.get('https://httpbin.org/ip').text)

If the website requires some kind of HTTP authentication, it can be specified to the HTTPProxyDigestAuth constructor like this:

# HTTP Basic authentication for website
s.auth = HTTPProxyDigestAuth(('user1', 'password1'),
        auth=requests.auth.HTTPBasicAuth('user1', 'password0'))
print(s.get('https://httpbin.org/basic-auth/user1/password0').text)

# HTTP Digest authentication for website
s.auth = HTTPProxyDigestAuth(('user1', 'password1'),
        auth=requests.auth.HTTPDigestAuth('user1', 'password0'))
print(s.get('https://httpbin.org/digest-auth/auth/user1/password0').text)

【Discussion】:

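I tried using this module and it gave me a 407 error. Please let me know if this still works.
@Yash.S.Narang The script still works for me with the latest Python and requests versions. You can still get a 407 if the proxy credentials are wrong, so please check them.
Thank you very much for sharing your Python module!
This works great, and it also works when connecting to HTTPS sites, where requests-toolbelt fails. Thanks a lot!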

【Solution 2】:

This worked for me. To be honest, I do not know how secure it is to put user:password in the URL in this solution:

import requests
import os

http_proxyf = 'http://user:password@proxyip:port'
os.environ["http_proxy"] = http_proxyf
os.environ["https_proxy"] = http_proxyf

sess = requests.Session()
# maybe need sess.trust_env = True
print(sess.get('https://some.org').text)

【Discussion】:

This only works for Basic Auth, not for Digest Auth, which is what the OP asked about.
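
On the security question raised in this answer: with Basic authentication the credentials are only base64-encoded, not hashed or encrypted, so anyone who can read the Proxy-Authorization header can recover them. A minimal sketch using the standard library (the helper name basic_proxy_header is an assumption for illustration; it is not part of requests):

```python
import base64


def basic_proxy_header(username, password):
    """Build the value of a Basic Proxy-Authorization header."""
    raw = f"{username}:{password}".encode("latin1")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_proxy_header("user", "password")
print(header)

# Decoding the token gives the credentials straight back.
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("latin1")
print(decoded)
```

Digest authentication avoids sending the password in this reversible form, which is one reason a proxy may insist on it.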

【Solution 3】:

This snippet works for both types of requests (http and https). Tested on the current version of requests (2.23.0).

import re
import requests
from requests.utils import get_auth_from_url
from requests.auth import HTTPDigestAuth
from requests.utils import parse_dict_header
from urllib3.util import parse_url

def get_proxy_autorization_header(proxy, method):
    username, password = get_auth_from_url(proxy)
    auth = HTTPProxyDigestAuth(username, password)
    proxy_url = parse_url(proxy)
    proxy_response = requests.request(method, proxy_url, auth=auth)
    return proxy_response.request.headers['Proxy-Authorization']


class HTTPSAdapterWithProxyDigestAuth(requests.adapters.HTTPAdapter):
    def proxy_headers(self, proxy):
        headers = {}
        proxy_auth_header = get_proxy_autorization_header(proxy, 'CONNECT')
        headers['Proxy-Authorization'] = proxy_auth_header
        return headers


class HTTPAdapterWithProxyDigestAuth(requests.adapters.HTTPAdapter):
    def proxy_headers(self, proxy):
        return {}

    def add_headers(self, request, **kwargs):
        proxy = kwargs['proxies'].get('http', '')
        if proxy:
            proxy_auth_header = get_proxy_autorization_header(proxy, request.method)
            request.headers['Proxy-Authorization'] = proxy_auth_header



class HTTPProxyDigestAuth(requests.auth.HTTPDigestAuth):

    def init_per_thread_state(self):
        # Ensure state is initialized just once per-thread
        if not hasattr(self._thread_local, 'init'):
            self._thread_local.init = True
            self._thread_local.last_nonce = ''
            self._thread_local.nonce_count = 0
            self._thread_local.chal = {}
            self._thread_local.pos = None
            self._thread_local.num_407_calls = None

    def handle_407(self, r, **kwargs):
        """
        Takes the given response and tries digest-auth, if needed.
        :rtype: requests.Response
        """

        # If response is not 407, do not auth
        if r.status_code != 407:
            self._thread_local.num_407_calls = 1
            return r

        s_auth = r.headers.get('proxy-authenticate', '')

        if 'digest' in s_auth.lower() and self._thread_local.num_407_calls < 2:
            self._thread_local.num_407_calls += 1
            pat = re.compile(r'digest ', flags=re.IGNORECASE)
            self._thread_local.chal = requests.utils.parse_dict_header(
                    pat.sub('', s_auth, count=1))

            # Consume content and release the original connection
            # to allow our new request to reuse the same one.
            r.content
            r.close()
            prep = r.request.copy()
            requests.cookies.extract_cookies_to_jar(prep._cookies, r.request, r.raw)
            prep.prepare_cookies(prep._cookies)

            prep.headers['Proxy-Authorization'] = self.build_digest_header(prep.method, prep.url)
            _r = r.connection.send(prep, **kwargs)
            _r.history.append(r)
            _r.request = prep

            return _r

        self._thread_local.num_407_calls = 1
        return r

    def __call__(self, r):
        # Initialize per-thread state, if needed
        self.init_per_thread_state()
        # If we have a saved nonce, skip the 407
        if self._thread_local.last_nonce:
            r.headers['Proxy-Authorization'] = self.build_digest_header(r.method, r.url)

        r.register_hook('response', self.handle_407)
        self._thread_local.num_407_calls = 1

        return r


session = requests.Session()
session.proxies = {
    'http': 'http://username:password@proxyhost:proxyport',
    'https':  'http://username:password@proxyhost:proxyport'
}
session.trust_env = False

session.mount('http://', HTTPAdapterWithProxyDigestAuth())
session.mount('https://', HTTPSAdapterWithProxyDigestAuth())

response_http = session.get("http://ww3.safestyle-windows.co.uk/the-secret-door/")
print(response_http.status_code)

response_https = session.get("https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests")
print(response_https.status_code)

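In general, the problem of proxy authorization when connecting over HTTPS also affects other types of authentication (NTLM, Kerberos). Despite a large number of issues (since 2013, and maybe there are earlier ones that I did not find):

In requests: Digest Proxy Auth, NTLM Proxy Auth, Kerberos Proxy Auth

In urllib3: NTLM Proxy Auth, NTLM Proxy Auth

and many others, the problem is still not resolved.

The root of the problem is in the function _tunnel of the module httplib (Python 2) / http.client (Python 3). If the connection attempt is unsuccessful, it raises an OSError without returning the response code (407 in our case) and the other data needed to build the authorization header. Lukasa gave an explanation here. As long as the maintainers of urllib3 (or requests) have no solution, we can only use various workarounds (for example, use @Tey''s approach or do something like this). In my workaround, we prepare the necessary authorization data in advance by sending a request to the proxy server and processing the response received.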
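
To make clear what "authorization data" has to be prepared, here is a minimal sketch of how a Digest response value is computed for the MD5 algorithm with qop=auth, as described in RFC 2617. It uses only hashlib and the example values from that RFC; the function names are assumptions for illustration, and this is not the code path requests uses internally (there, build_digest_header does the work).

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, password, realm, nonce,
                    method, uri, nc, cnonce, qop="auth"):
    """Compute the 'response' field for MD5 digest auth with qop=auth."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Example values taken from RFC 2617, section 3.5.
response = digest_response(
    username="Mufasa",
    password="Circle Of Life",
    realm="testrealm@host.com",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    method="GET",
    uri="/dir/index.html",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)
```

The nonce and realm come from the proxy's challenge, which is why the client needs to see the 407 response before it can build a valid Proxy-Authorization header.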

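【Discussion】:

If you are putting together an application that needs to run in two environments that may require either Basic or Digest proxy authentication, is there a pre-emptive way to determine whether you need to create/use Digest-based or Basic-based authentication?
If this were authentication on a website, there would be a way. You send a request and receive a response with a 401 code and an indication of which type of authentication is used. Having received this, you can build a header with the authentication data as needed and send a new request with that header. But if we are talking about proxy authentication in requests, this will not work. If we send a request with incorrect data, we will receive an OSError, and the error text contains no information about the authentication type.
Thanks! ...that was my feeling/fear. Is it just me, or is dealing with proxies, especially when they use authentication (the most common scenario in corporate proxy environments), a real problem in requests? As an alternative idea, at least for Windows machines, is there any simpler way to piggyback on whatever is configured in the operating system and use that?
Yes, I think this really is a problem with requests. The root of this problem is in Python's http library. As for a simple way to send an HTTP request through a proxy on Windows: I managed to do this with curl for a proxy with Kerberos authentication. It was enough to indicate that this type of authentication is used on the proxy server, and curl found all the necessary credentials stored in the system by itself.
Exactly. I also have a script written in PowerShell that I am now replacing with Python. In PowerShell it was trivial: it also figured out which auth method was in use and used it. ...Simple and easy. It is not good that this is such an unsupported mess in requests.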
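
For plain-HTTP requests, where the proxy's 407 response does reach the client, the choice between Basic and Digest could be made from the Proxy-Authenticate values the proxy sends. The sketch below is an assumption about how such a selection might look (the function name and the preference order are hypothetical); as the comments above point out, it does not help for HTTPS tunnels, where the challenge is not exposed.

```python
# Order of preference: the stronger scheme first (an assumption of this sketch).
PREFERENCE = ("digest", "basic")


def choose_proxy_scheme(challenges):
    """Pick a scheme from a list of Proxy-Authenticate header values."""
    offered = {
        value.strip().split(" ", 1)[0].lower()
        for value in challenges
        if value.strip()
    }
    for scheme in PREFERENCE:
        if scheme in offered:
            return scheme
    return None  # nothing this sketch knows how to handle


print(choose_proxy_scheme(['Basic realm="proxy"',
                           'Digest realm="proxy", nonce="abc"']))
```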

【Solution 4】:

This is an answer for cases that do not involve HTTP basic authentication, for example a transparent proxy within an organization.

import requests

url      = 'https://someaddress-behindproxy.com'
params   = {'apikey': '123456789'}                     #if you need params
proxies  = {'https': 'https://proxyaddress.com:3128'}  #or some other port
response = requests.get(url, proxies=proxies, params=params)

I hope this helps someone.

【Discussion】:

【Solution 5】:

No need to implement your own! In most cases

Requests has built-in support for proxies, with basic authentication:

proxies = { 'https' : 'https://user:password@proxyip:port' } 
r = requests.get('https://url', proxies=proxies)

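See more in the docs.

Or, if you need digest authentication, HTTPDigestAuth may help.
Or you might need to try extending it, as yutaka2487 did.

Note: you must use the proxy server's IP address rather than its name!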
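
When credentials are embedded in the proxy URL like this, characters such as @, : or / in the username or password need to be percent-encoded, otherwise the URL is split in the wrong place. A minimal sketch with the standard library (the helper name build_proxy_url, the address 192.0.2.10 and the sample password are assumptions for illustration):

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(host, port, username, password, scheme="http"):
    """Return a proxy URL with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


proxy_url = build_proxy_url("192.0.2.10", 8080, "user", "p@ss:word/1")
proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)

# Splitting the URL again recovers the original password after unquoting.
parts = urlsplit(proxy_url)
print(parts.hostname, parts.port, unquote(parts.password))
```

The resulting proxies dictionary can be passed to requests.get in the same way as in the answer above; this still results in Basic authentication only.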

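【Discussion】:

It works only with Basic authentication, not with the Digest authentication the OP asked about.
It won't work, because HTTPDigestAuth only supports authentication with the final web server (WWW-Authenticate/Authorization headers, 401 status), not with a proxy server (Proxy-Authenticate/Proxy-Authorization headers, 407 status). You need a solution similar to the one given by @yutaka2487, but that only works for contacting HTTP servers through the proxy, not HTTPS servers, because the requests/urllib3 backend does not report the proxy error when tunneling an HTTPS connection, so Digest auth cannot work.
@Tey' Just consider the part that says "or you might need try to extend it like yutaka2487 did below".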
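
The distinction drawn in the comment above can be written down as a small lookup table: which header carries the challenge and which header must carry the reply, depending on the status code. This is only an illustrative sketch (the names CHALLENGE_AND_REPLY and reply_header_name are hypothetical), not code from requests:

```python
# status code -> (challenge header sent by the server, header the client replies with)
CHALLENGE_AND_REPLY = {
    401: ("WWW-Authenticate", "Authorization"),          # origin server
    407: ("Proxy-Authenticate", "Proxy-Authorization"),  # proxy server
}


def reply_header_name(status_code):
    """Return the request header to use when answering a 401 or 407."""
    pair = CHALLENGE_AND_REPLY.get(status_code)
    return pair[1] if pair else None


print(reply_header_name(401))
print(reply_header_name(407))
```

HTTPDigestAuth only reacts to the 401 row, which is why the subclasses in the other answers hook the 407 case themselves.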

【Solution 6】:

import requests
import os


# in my case I had to add my local domain
proxies = {
  'http': 'proxy.myagency.com:8080',
  'https': 'user@localdomain:password@proxy.myagency.com:8080',
}


r=requests.get('https://api.github.com/events', proxies=proxies)
print(r.text)

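【Discussion】:

This only works for Basic Auth, not for Digest, so it does not answer the OP's question.

【Solution 7】:

For those who still end up here, there appears to be a project called requests-toolbelt that has this feature, plus other common functionality that is not built into requests.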

https://toolbelt.readthedocs.org/en/latest/authentication.html#httpproxydigestauth

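【Discussion】:

It looks like this does not work when connecting to HTTPS websites, only HTTP.

【Solution 8】:

I wrote a class that can be used for proxy authentication (based on digest auth).
I borrowed almost all of the code from requests.auth.HTTPDigestAuth.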

import requests
import requests.auth

class HTTPProxyDigestAuth(requests.auth.HTTPDigestAuth):
    def handle_407(self, r):
        """Takes the given response and tries digest-auth, if needed."""

        num_407_calls = r.request.hooks['response'].count(self.handle_407)

        s_auth = r.headers.get('Proxy-authenticate', '')

        if 'digest' in s_auth.lower() and num_407_calls < 2:

            self.chal = requests.auth.parse_dict_header(s_auth.replace('Digest ', ''))

            # Consume content and release the original connection
            # to allow our new request to reuse the same one.
            r.content
            r.raw.release_conn()

            # A 407 challenge is answered in Proxy-Authorization, not Authorization
            r.request.headers['Proxy-Authorization'] = self.build_digest_header(r.request.method, r.request.url)
            r.request.send(anyway=True)
            _r = r.request.response
            _r.history.append(r)

            return _r

        return r

    def __call__(self, r):
        if self.last_nonce:
            r.headers['Proxy-Authorization'] = self.build_digest_header(r.method, r.url)
        r.register_hook('response', self.handle_407)
        return r

Usage:

proxies = {
    "http" :"192.168.20.130:8080",
    "https":"192.168.20.130:8080",
}
auth = HTTPProxyDigestAuth("username", "password")

# HTTP
r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
r.status_code # 200 OK

# HTTPS
r = requests.get("https://www.google.co.jp/", proxies=proxies, auth=auth)
r.status_code # 200 OK

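【Discussion】:

I get the error: 'HTTPProxyDigestAuth' object has no attribute 'last_nonce' when I try to use your class. I will look into it.
There is no need to implement this yourself now; requests now has built-in support for proxies, e.g. proxies = { 'https' : 'https://user:password@ip:port' } ; r = requests.get('https://url', proxies=proxies) See docs.python-requests.org/en/latest/user/advanced
@BurnsBA @MattClimbs @yutaka I can confirm that requests with https and user:password@ip:port works well in Python 3.
This snippet no longer works with the latest versions of requests, because it now keeps the request parameters in thread-local storage. Also, even after fixing the code, this will not work for connecting to an HTTPS website through a proxy.
So @Tey', is there any way to solve this (using Digest with a proxy)?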

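【Solution 9】:

You can use digest authentication by using requests.auth.HTTPDigestAuth instead of requests.auth.HTTPProxyAuth.

【Discussion】:

I want to pass proxy authentication (based on digest authentication). That is different from regular digest authentication. So I need to extend HTTPDigestAuth (see below).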