Selenium can't download the correct file in headless mode

【Question title】: Selenium can't download correct file in headless mode
【Posted】: 2022-09-23 22:55:48
【Question】:

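Even after calling enable_download_headless(driver, path) as suggested in the thread below, the file is still not downloaded correctly. The non-headless version always downloads the site's file correctly, but the headless version downloads a file named "chargeinfo.xhtml", which is the last segment of the download page's URL "https://www.xxxxx.de/xxx/chargeinfo.xhtml". Interestingly, when I call the suggested enable_download_headless(driver, path) in non-headless mode, it also downloads "chargeinfo.xhtml".

In addition, a screenshot taken just before clicking the download button shows the same page layout as in non-headless mode.

Any help is greatly appreciated.

Here is my driver setup: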

import time

from selenium import webdriver


def cd_excerpt_from_uc():
    # Declare the driver and its options
    options = webdriver.ChromeOptions()
    # Run Chrome in headless mode
    options.add_argument("--headless")
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
    options.add_argument(f'user-agent={user_agent}')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--allow-running-insecure-content')
    options.add_argument("--window-size=1920,1080")
    driver_path = "path/to/chromedriver"
    # Note: Selenium 4.10+ no longer accepts the driver path positionally
    driver = webdriver.Chrome(driver_path, options=options)

    # This call causes the non-headless version to also download "chargeinfo.xhtml"
    enable_download_headless(driver, "/Download/Path/")

    driver.get("https://www.xxxxx.de/xxx/chargeinfo.xhtml")
    time.sleep(10)
    driver.find_element('xpath', "//span[@class='ui-button-text ui-c' and contains(text(), 'Download')]").click()

def enable_download_headless(browser, download_dir):
    # Register the Chromium-specific endpoint, then send the DevTools command through it
    browser.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    browser.execute("send_command", params)
Tags: python selenium download web-crawler

【Solution 1】:

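In case anyone runs into a similar problem: the only way I got this to work was to switch to reading the body of the request's response. I clicked the download button with Selenium and then fetched the response like this: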

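    # driver.requests is provided by the selenium-wire package, not by plain Selenium
    for request in driver.requests:
        if request.response:
            if request.url == "https://www.xxxxx.de/xxx/chargeinfo.xhtml":
                print(
                    request.url,
                    request.response.status_code,
                    request.response.body
                )

                # Each matching response overwrites the file, so the last one wins
                with open('out.pdf', 'wb') as f:
                    f.write(request.response.body)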
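
Because the page load and the download share the same URL, the loop above writes out.pdf once per matching response and keeps whichever came last. A slightly more explicit variant is sketched below; save_last_response and its content_type parameter are hypothetical names, and the only assumption about the captured objects is that they look like the ones used above (a url, a response that may be None, and response.headers and response.body).

```python
from pathlib import Path


def save_last_response(requests, url, out_path, content_type=None):
    """Save the body of the most recent captured response for `url`.

    Returns True if a matching response was written, False otherwise.
    """
    # Walk the captured requests from newest to oldest.
    for request in reversed(list(requests)):
        response = request.response
        if request.url != url or response is None:
            continue
        if content_type is not None:
            # Optionally skip responses of the wrong type, e.g. the HTML page itself.
            header = response.headers.get("Content-Type") or ""
            if content_type not in header:
                continue
        Path(out_path).write_bytes(response.body)
        return True
    return False
```

After clicking the download button, this could be called as save_last_response(driver.requests, "https://www.xxxxx.de/xxx/chargeinfo.xhtml", "out.pdf", content_type="pdf"). If the server compresses its responses, the body may need decoding before it is written; the selenium-wire documentation describes a decode helper in seleniumwire.utils for that case.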

【Discussion】: