Selenium: How to bypass Cloudflare (answers)

【Question title】: Selenium: How can I bypass Cloudflare?
【Posted】: 2022-09-27 21:57:25
【Question】:

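I want to connect to a site with WebDriver, but the Cloudflare challenge (not hCaptcha) detects Selenium as a bot, and the challenge is never passed.

I have used these flags, and many similar ones, in my code, but I still haven't been able to get past it.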

    import java.util.Collections;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.chrome.ChromeOptions;

    ChromeOptions options = new ChromeOptions();
    options.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
    options.setExperimentalOption("useAutomationExtension", false);
    options.addArguments("--disable-blink-features");
    options.addArguments("--disable-blink-features=AutomationControlled");
    System.setProperty("webdriver.chrome.driver", "drivers/chromedriver.exe");
    WebDriver driver = new ChromeDriver(options);

My Chrome version is 104.0.5112.81 and my ChromeDriver version is 104.0.5112.79.

How can I bypass Cloudflare?

Tags: selenium selenium-webdriver selenium-chromedriver cloudflare bypass

【Solution 1】:

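To get past Cloudflare you need a high (green) score on https://antcpt.com/score_detector/. That checker is for reCAPTCHA, but I think it is related to Cloudflare as well. Here are some other things you may want to try: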

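1. Don't use a VPN or Tor. A paid VPN may be fine, but with Tor the exit node is always public (I'm not certain, but I don't think you can get past Cloudflare over Tor).

2. Change the user agent, if you aren't already; I don't see it in your code. In Python I use selenium-stealth to change the user agent, the WebGL renderer, and so on: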
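```
from selenium import webdriver
from selenium_stealth import stealth

driver = webdriver.Chrome()

# Call stealth() before driver.get(): its patches apply to pages loaded afterwards.
stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )
```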
Here is another page for testing your driver: https://intoli.com/blog/making-chrome-headless-undetectable/chrome-headless-test.html (there is one with more checks, but I don't remember the link...)

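3. You may want to use an existing profile so you look less like a bot. Your everyday profile, with plenty of cookies and other data, works well (I'm not sure this really works, but in practice it seemed to help me). Here is how to load one: How to load default profile in Chrome using Python Selenium Webdriver?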

4. Remove the $cdc_ marker from chromedriver.exe.

5. You may also want to check: Can a website detect when you are using Selenium with chromedriver?
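The profile tip and the detection checks above can be tried together in a short Python sketch. It builds the Chrome switches that point the browser at an existing profile (`--user-data-dir`, `--profile-directory`) and runs a few JavaScript probes (`navigator.webdriver`, the user agent, the languages, and any `cdc_` properties left by chromedriver) so you can see what a page sees. The helper names and the profile path are hypothetical; it assumes Selenium 4 and a local Chrome, and it is a diagnostic aid, not a guaranteed way past the challenge.

```python
"""Sketch: start Chrome with an existing profile and inspect what a page can see.

Assumptions: selenium 4 and Chrome are installed; helper names are hypothetical.
"""
from typing import Any, Dict, List

# JavaScript probes for the signals discussed above. Each snippet returns a
# JSON-serialisable value through driver.execute_script.
PROBES: Dict[str, str] = {
    "webdriver": "return navigator.webdriver;",
    "user_agent": "return navigator.userAgent;",
    "languages": "return navigator.languages;",
    "cdc_keys": (
        "return Object.keys(window).concat(Object.keys(document))"
        ".filter(k => k.includes('cdc_'));"
    ),
}


def profile_arguments(user_data_dir: str, profile_directory: str = "Default") -> List[str]:
    """Chrome command-line switches that select an existing profile."""
    return [
        f"--user-data-dir={user_data_dir}",
        f"--profile-directory={profile_directory}",
    ]


def automation_signals(driver: Any) -> Dict[str, Any]:
    """Run every probe through driver.execute_script and collect the results."""
    return {name: driver.execute_script(script) for name, script in PROBES.items()}


def launch_with_profile(user_data_dir: str, profile_directory: str = "Default"):
    """Start Chrome with the given profile (needs selenium and a local Chrome)."""
    # Imported here so the pure helpers above work without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in profile_arguments(user_data_dir, profile_directory):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


# Example usage. The path is a hypothetical *copy* of your Chrome "User Data"
# folder, because Chrome will not share a profile directory with a running browser:
#     driver = launch_with_profile(r"C:\selenium-profile")
#     driver.get("https://intoli.com/blog/making-chrome-headless-undetectable/chrome-headless-test.html")
#     print(automation_signals(driver))
#     driver.quit()
```

If `webdriver` comes back `True` or `cdc_keys` is non-empty, the page can still see the automation markers the list above talks about.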

Also note that if the site detects bot-like behavior, getting past Cloudflare too often will lower your score.

【Discussion】:

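Yes, adding the profile solves the problem, but it isn't a permanent solution because it doesn't always get through. For example, when I shut down and restart my computer, it detects me as a bot again because the data has been cleared. After a while, though, it stops treating Selenium as a bot.
That's because the site detects the automated requests... Maybe try adding waits between them (I choose the wait based on how fast the site loads), or clicking random elements such as text, so it looks less like a bot... I don't understand why the data would be deleted when you restart your PC. @Selman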
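The waiting suggestion in the comment above can be sketched as follows: open each URL, then sleep for a random interval rather than a fixed one. `random_pause` and `browse` are hypothetical helpers; `driver` is assumed to be a Selenium WebDriver (anything with a `.get(url)` method works), and the 3 to 8 second default range is an illustrative guess, not a value Cloudflare documents.

```python
"""Sketch: visit pages with a randomised pause after each load.

Assumption: `driver` has a .get(url) method, e.g. a Selenium WebDriver.
The default delay range is illustrative only.
"""
import random
import time
from typing import Any, Callable, Iterable, List, Optional


def random_pause(
    min_s: float,
    max_s: float,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> float:
    """Sleep for a uniformly random number of seconds in [min_s, max_s]; return it."""
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    rng = rng or random.Random()
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay


def browse(
    driver: Any,
    urls: Iterable[str],
    min_s: float = 3.0,
    max_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Open each URL in turn, pausing a random amount of time after each one."""
    delays = []
    for url in urls:
        # With Selenium's default page-load strategy, get() returns after the load event.
        driver.get(url)
        delays.append(random_pause(min_s, max_s, sleep=sleep, rng=rng))
    return delays
```

Passing `sleep` and `rng` in as parameters keeps the helpers testable without a browser; a seeded `random.Random(seed)` makes a run reproducible.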

【Solution 2】:

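A browser-based automation tool such as Selenium is not enough on its own to get past a WAF. You need to imitate a real user so that the WAF ignores you.

In your case, using undetected-chromedriver together with a random user agent from the fake-useragent Python library is almost enough to get past the WAF. undetected_chromedriver handles many internals that make Selenium look like a "legitimate" browser; for example, it hides the automation variables that Selenium exposes in the browser.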

```
import time

import undetected_chromedriver as uc
from fake_useragent import UserAgent

ua = UserAgent()

# undetected_chromedriver downloads and patches its own chromedriver binary,
# so webdriver_manager and a separate Service object are not needed here.
options = uc.ChromeOptions()
options.add_argument(f"--user-agent={ua.chrome}")  # random Chrome user agent

# Headless mode is easier to detect; try headless=False if the challenge still fails.
driver = uc.Chrome(headless=True, options=options, use_subprocess=True)

url = "https://httpbin.org/headers"

try:
    driver.get(url)
    time.sleep(5)  # give the challenge page time to resolve

    # do something
    driver.save_screenshot("bot-check.png")
finally:
    driver.quit()
```
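Instead of checking the screenshot by eye, you can read the headers that httpbin echoes back and confirm that the spoofed user agent was actually sent. This sketch assumes the response is the JSON object `{"headers": {...}}` that https://httpbin.org/headers returns and that Chrome renders raw JSON inside a `<pre>` element; `echoed_headers` and `looks_headless` are hypothetical helper names.

```python
"""Sketch: inspect the headers echoed by https://httpbin.org/headers.

Assumption: the page body is JSON of the form {"headers": {...}}.
"""
import json
from typing import Dict


def echoed_headers(page_text: str) -> Dict[str, str]:
    """Parse the JSON body returned by https://httpbin.org/headers."""
    return json.loads(page_text)["headers"]


def looks_headless(headers: Dict[str, str]) -> bool:
    """True if the User-Agent header still advertises headless Chrome."""
    return "HeadlessChrome" in headers.get("User-Agent", "")


# Example usage after driver.get(url) in the script above:
#     from selenium.webdriver.common.by import By
#     headers = echoed_headers(driver.find_element(By.TAG_NAME, "pre").text)
#     print(headers.get("User-Agent"), looks_headless(headers))
```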

Read more about getting past Cloudflare here: bypassing Cloudflare.

【Discussion】: