Python - Request being blocked by Cloudflare (answers)

【Question title】: Python - Request being blocked by Cloudflare
【Posted】: 2018-03-03 18:56:55
【Question description】:

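I am trying to log in to a website. When I look at print(g.text), I do not get back the page I expect. Instead I get a Cloudflare page that says "Checking your browser before accessing".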

import requests
import time

s = requests.Session()
s.get('https://www.off---white.com/en/GB/')

headers = {'Referer': 'https://www.off---white.com/en/GB/login'}

payload = {
    'utf8':'✓',
    'authenticity_token':'',
    'spree_user[email]': 'EMAIL@gmail.com',
    'spree_user[password]': 'PASSWORD',
    'spree_user[remember_me]': '0',
    'commit': 'Login'
}

r = s.post('https://www.off---white.com/en/GB/login', data=payload, headers=headers)

print(r.status_code)

g = s.get('https://www.off---white.com/en/GB/account')

print(g.status_code)
print(g.text)

Why does this happen even though I have set up a session?

【Discussion】:

Tags: python python-3.x

【Solution 1】:

You may want to try this:

import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper()  # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."

It does not require Node.js as a dependency. All credit goes to this PyPI page.
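To connect this answer to the question, here is a minimal sketch that runs the question's login flow through a cloudscraper session instead of a plain requests.Session. It relies on the note in the snippet above that CloudScraper inherits from requests.Session, so get and post are used the same way. The helper names build_login_payload and fetch_account_page are hypothetical, the authenticity_token is left empty exactly as in the question, and whether the challenge is actually passed depends on the site and on the cloudscraper version.

```python
def build_login_payload(email, password):
    """Build the form fields used in the question's login request."""
    return {
        "utf8": "✓",
        "authenticity_token": "",  # left empty, as in the question
        "spree_user[email]": email,
        "spree_user[password]": password,
        "spree_user[remember_me]": "0",
        "commit": "Login",
    }


def fetch_account_page(email, password, base_url="https://www.off---white.com/en/GB"):
    """Log in through a cloudscraper session and return the account page.

    Returns (login_status, account_status, account_html).
    """
    import cloudscraper  # third-party: pip install cloudscraper

    scraper = cloudscraper.create_scraper()  # behaves like requests.Session
    scraper.get(base_url + "/")  # first visit, as in the question

    headers = {"Referer": base_url + "/login"}
    login = scraper.post(
        base_url + "/login",
        data=build_login_payload(email, password),
        headers=headers,
    )
    account = scraper.get(base_url + "/account")
    return login.status_code, account.status_code, account.text
```

Calling fetch_account_page("EMAIL@gmail.com", "PASSWORD") mirrors the three requests from the question; the returned HTML still needs to be checked, since a status code alone does not show whether the login succeeded.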

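【Discussion】:

This works. Thanks! I think this should be the accepted answer.
cloudscraper does not seem to be completely free. In my case, cloudscraper showed an error message along the lines of "Cloudflare v2 detected. Not available in the free version". So I think you need to pay to get access to all cloudscraper features.

【Solution 2】:

This happens because the page uses Cloudflare's anti-bot page (also known as IUAM, "I'm Under Attack Mode").
Bypassing this check on your own is hard, because Cloudflare changes its techniques regularly. Currently, they check whether the client supports JavaScript, which can be spoofed.
I recommend using the cfscrape module to bypass it.
To install it, run pip install cfscrape. You also need to install Node.js.
You can pass a requests session to create_scraper(), like this: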

import cfscrape
import requests

session = requests.Session()
session.headers = ...  # elided in the original answer; set your own headers here
scraper = cfscrape.create_scraper(sess=session)
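Before reaching for a bypass, it can help to confirm that a response really is the Cloudflare challenge and not the expected page. The sketch below is a heuristic only: the status codes and the body markers are assumptions (the "checking your browser" text comes from the message quoted in the question), and Cloudflare changes its pages regularly. The function name looks_like_cloudflare_challenge is hypothetical.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristically decide whether a response is a Cloudflare challenge page.

    status_code: int, headers: mapping of header names to values, body: str.
    """
    # Header names are case-insensitive, so compare them in lower case.
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()

    # Assumption: challenge pages come back with one of these statuses.
    blocked_status = status_code in (403, 429, 503)

    # Assumption: these markers appear in the challenge HTML.
    lowered = body.lower()
    marker_in_body = any(m in lowered for m in ("checking your browser", "jschl"))

    return blocked_status and server.startswith("cloudflare") and marker_in_body
```

With the question's code, the check would be looks_like_cloudflare_challenge(g.status_code, g.headers, g.text).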

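【Discussion】:

Thanks for the help, I will get to work on implementing this now!
@Pthomas Did you implement it? Would you like to share your experiment?
Wow. This really helped me a lot. Thanks.
@jeremiah I am trying it right now. It raises some exceptions here

【Solution 3】:

I ran into the same problem because they put Cloudflare in front of the API. This is how I solved it: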

import cloudscraper
import json

scraper = cloudscraper.create_scraper()
r = scraper.get("MY API").text  # "MY API" stands in for the API URL
y = json.loads(r)  # parse the JSON body into Python objects
print(y)

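【Discussion】:

【Solution 4】:

curl and hx avoid this problem. But how? I found that they use HTTP/2 by default, whereas the requests library only uses HTTP/1.1.

So, for testing, I installed the httpx and h2 Python libraries (to support HTTP/2 requests), and it works if I run: httpx --http2 'https://some.url'.

The solution, then, is to use a library that supports HTTP/2, for example httpx together with h2.

This is not a complete solution, because it does not help with Cloudflare's anti-bot ("I'm Under Attack Mode", or IUAM) challenge.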
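The command-line test above can also be written in Python. This is a minimal sketch, assuming httpx is installed with HTTP/2 support (pip install "httpx[http2]", which pulls in h2). The function name fetch_with_http2 is hypothetical, and like the command-line version it does nothing about the IUAM challenge.

```python
def fetch_with_http2(url, timeout=10.0):
    """Fetch url with an HTTP/2-capable client.

    Returns (http_version, status_code, text). The client falls back to
    HTTP/1.1 when the server does not negotiate HTTP/2.
    """
    import httpx  # third-party: pip install "httpx[http2]"

    # http2=True enables HTTP/2; it requires the h2 package to be installed.
    with httpx.Client(http2=True, timeout=timeout) as client:
        response = client.get(url)
        return response.http_version, response.status_code, response.text
```

Calling fetch_with_http2('https://some.url') with the placeholder URL from the answer returns a tuple whose first element shows which protocol version was actually negotiated, which makes it easy to verify that the request did not silently fall back to HTTP/1.1.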

【Discussion】:

【Solution 5】:

You can use this tool to scrape any Cloudflare-protected page. Node.js is required for the code to work.

Download Node.js from this link: https://nodejs.org/en/

import cfscrape  # pip install cfscrape

scraper = cfscrape.create_scraper()
res = scraper.get("https://www.example.com").text
print(res)

【Discussion】: