【Question title】: Selenium can't download correct file in headless mode
【Posted】: 2022-09-23 22:55:48
【Question】:

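Even after calling enable_download_headless(driver, path), as suggested in the thread below, the wrong file is downloaded. The non-headless version always downloads the site's file correctly, but the headless version downloads a file named "chargeinfo.xhtml", which is the last path segment of the download page's URL, "https://www.xxxxx.de/xxx/chargeinfo.xhtml". Interestingly, when I call the suggested enable_download_headless(driver, path) in non-headless mode, it downloads "chargeinfo.xhtml" as well.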

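In addition, a screenshot taken just before clicking the download button shows the same page layout as in non-headless mode.

Any help is greatly appreciated.

Here is my driver setup: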

import time

from selenium import webdriver


def cd_excerpt_from_uc():
    # Configure Chrome and enable headless mode
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
    options.add_argument(f'user-agent={user_agent}')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--allow-running-insecure-content')
    options.add_argument("--window-size=1920,1080")
    driver_path = "path/to/chromedriver"
    driver = webdriver.Chrome(driver_path, options=options)

    # This call causes the non-headless version to download "chargeinfo.xhtml" as well
    enable_download_headless(driver, "/Download/Path/")

    driver.get("https://www.xxxxx.de/xxx/chargeinfo.xhtml")
    time.sleep(10)
    driver.find_element('xpath', "//span[@class='ui-button-text ui-c' and contains(text(), 'Download')]").click()


def enable_download_headless(browser, download_dir):
    # Register the Chromium send_command endpoint, then allow downloads into download_dir
    browser.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    browser.execute("send_command", params)
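
For comparison, the same Page.setDownloadBehavior command can also be sent through execute_cdp_cmd, which Selenium's Chromium drivers expose publicly, instead of patching the private _commands table. The sketch below only assumes a driver object with that method. allow_downloads is a hypothetical helper name, and nothing in the question indicates that sending the command this way changes the behaviour described.

```python
import os


def allow_downloads(driver, download_dir):
    """Send Page.setDownloadBehavior over the DevTools protocol.

    Assumes `driver` is a Chromium-based Selenium driver that
    provides execute_cdp_cmd(); the helper name is hypothetical.
    """
    params = {
        "behavior": "allow",
        # Use an absolute path to avoid ambiguity about the working directory
        "downloadPath": os.path.abspath(download_dir),
    }
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

It would be called in place of enable_download_headless(driver, "/Download/Path/"), before driver.get(...).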

    Tags: python selenium download web-crawler


    【Solution 1】:

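    In case anyone runs into a similar problem: the only way I got this working was to switch to reading the response body of the request. I clicked the download button with Selenium and then retrieved the response as follows: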

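        # driver.requests is provided by selenium-wire (seleniumwire.webdriver), not by plain Selenium
        for request in driver.requests:
            if request.response:
                if request.url == "https://www.xxxxx.de/xxx/chargeinfo.xhtml":
                    print(
                        request.url,
                        request.response.status_code,
                        request.response.body
                    )

                    # Save the raw response body as the downloaded file
                    with open('out.pdf', 'wb') as f:
                        f.write(request.response.body)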
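
    The loop above writes out.pdf for every captured request whose URL matches. Since the page itself is loaded from that same URL, the file is presumably written more than once, with the last match winning. A minimal sketch of a stricter selection follows, assuming request objects shaped like those in the loop (url, response.status_code, response.body). find_pdf_body and save_pdf are hypothetical helpers, and the check relies on PDF files beginning with the bytes %PDF-.

```python
from types import SimpleNamespace

PDF_MAGIC = b"%PDF-"


def find_pdf_body(requests, url):
    """Return the body of the most recent 200 response for `url` that looks like a PDF, else None."""
    for request in reversed(list(requests)):
        response = request.response
        if response is None or request.url != url:
            continue
        if response.status_code == 200 and response.body.startswith(PDF_MAGIC):
            return response.body
    return None


def save_pdf(requests, url, out_path):
    """Write the matching PDF body to out_path; return True if a file was written."""
    body = find_pdf_body(requests, url)
    if body is None:
        return False
    with open(out_path, "wb") as f:
        f.write(body)
    return True


# Stand-in objects for illustration only; with a real driver, pass driver.requests instead.
PAGE_URL = "https://www.xxxxx.de/xxx/chargeinfo.xhtml"
captured = [
    SimpleNamespace(url=PAGE_URL,
                    response=SimpleNamespace(status_code=200, body=b"<html>page</html>")),
    SimpleNamespace(url=PAGE_URL,
                    response=SimpleNamespace(status_code=200, body=b"%PDF-1.4 sample")),
]
print(find_pdf_body(captured, PAGE_URL))  # prints b'%PDF-1.4 sample'
```

    A compressed response body would not start with %PDF- until it is decoded, so this check assumes the captured body is already uncompressed.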
    

