Python - Downloading PDF and saving to disk using Selenium

【Question title】: Python - Downloading PDF and saving to disk using Selenium
【Posted】: 2020-12-17 04:09:48
【Question description】:

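I'm creating an application that downloads PDFs from a website and saves them to disk. I understand the requests module can do this, but on its own it doesn't handle the logic around the download (file size, progress, time remaining, etc.).

So far I've built the program with selenium, and I'd eventually like to incorporate it into a Tkinter GUI app.

What's the best way to handle the download, track it, and eventually drive a progress bar?

This is my code so far: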

from selenium import webdriver
from time import sleep
import requests

import secrets

class manual_grabber():
    """ A class creating a manual downloader for the Roger Technology website """
    def __init__(self):
        """ Initialize attributes of manual grabber """
        self.driver = webdriver.Chrome('\\Users\\Joel\\Desktop\\Python\\manual_grabber\\chromedriver.exe')

    def login(self):
        """ Function controlling the login logic """
        self.driver.get('https://rogertechnology.it/en/b2b')

        sleep(1)

        # Locate elements and enter login details
        user_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[6]')
        user_in.send_keys(secrets.username)

        pass_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[7]')
        pass_in.send_keys(secrets.password)

        enter_button = self.driver.find_element_by_xpath('/html/body/div[2]/form/div/input')
        enter_button.click()

        # Click Self Service Area button
        self_service_button = self.driver.find_element_by_xpath('//*[@id="bs-example-navbar-collapse-1"]/ul/li[1]/a')
        self_service_button.click()

    def download_file(self):
        """Access file tree and navigate to PDF's and download"""
        # Wait for all elements to load
        sleep(3)

        # Find and switch to iFrame
        frame = self.driver.find_element_by_xpath('//*[@id="siteOutFrame"]/iframe')
        self.driver.switch_to.frame(frame)

        # Find and click tech manuals button
        tech_manuals_button = self.driver.find_element_by_xpath('//*[@id="fileTree_1"]/ul/li/ul/li[6]/a')
        tech_manuals_button.click()


bot = manual_grabber()
bot.login()
bot.download_file()

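In summary, I'd like this code to download the PDFs on the website, store them in a specific directory (named after the parent folder in the jQuery File Tree), and keep track of the progress (file size, time remaining, etc.).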
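The directory layout described here could be sketched roughly as follows. This is only an illustration: `destination_for`, `base_dir`, `parent_folder` and `file_name` are hypothetical names, and it assumes the folder label has already been read from the file tree as plain text.

```python
from pathlib import Path


def destination_for(base_dir, parent_folder, file_name):
    """Return base_dir/parent_folder/file_name, creating the folder if needed."""
    # Use only the last component of each name, in case the text
    # scraped from the tree contains path separators.
    folder = Path(base_dir) / Path(parent_folder).name
    folder.mkdir(parents=True, exist_ok=True)
    return folder / Path(file_name).name
```

The returned path can then be passed to `open(path, 'wb')` when the PDF is written.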

Here is the DOM:

I hope this is enough information. Let me know if you need anything more.

【Comments】:

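Did this solve the problem?
@AzyCrw4282 I haven't had a chance to try it yet, as this is a project I'm doing at work and I haven't been in the office. I'll give it a go and see how we get on.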

Tags: python selenium web-scraping download

【Solution 1】:

I would recommend using tqdm and the requests module for this. Below is sample code that handles both the download and the progress-bar updates.

from tqdm import tqdm
import requests

url = "http://www.ovh.net/files/10Mb.dat" #big file test
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes= int(response.headers.get('content-length', 0))
block_size = 1024 #1 Kibibyte
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data)) #change this to your widget in tkinter
        file.write(data)
progress_bar.close()
if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")
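
The `progress_bar.update(...)` call above marks the spot where a GUI widget would be updated instead. One way to keep that swap simple is to pass the update step in as a callback. The sketch below is an assumption about how this could be arranged, not part of the original answer; `save_chunks` and `on_progress` are hypothetical names.

```python
def save_chunks(chunks, out_file, on_progress):
    """Write each chunk to out_file and report the running byte count."""
    bytes_done = 0
    for chunk in chunks:
        out_file.write(chunk)
        bytes_done += len(chunk)
        # Hand the running total to whatever displays progress
        # (tqdm, a Tkinter widget, a log line, ...).
        on_progress(bytes_done)
    return bytes_done
```

With requests this could be called as `save_chunks(response.iter_content(block_size), file, on_progress)`. In Tkinter, `on_progress` might set the `value` option of a `ttk.Progressbar` whose `maximum` is the total size. A long download run on the main thread blocks the event loop, so the loop would normally run in a worker thread.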

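Here total_size_in_bytes is the file size and block_size is the size of each chunk read from the response. The time remaining can be estimated from the download rate so far and the number of bytes still to download. Here's an alternative: How to measure download speed and progress using requests?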
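
The time-remaining estimate mentioned above can be sketched as a small helper. This is a minimal version that assumes the average rate so far is a fair predictor of the rest of the download; `estimate_seconds_remaining` and its parameters are hypothetical names, not part of requests or tqdm.

```python
def estimate_seconds_remaining(bytes_done, total_bytes, elapsed_seconds):
    """Estimate remaining time from the average rate so far; None if unknown."""
    # Without a known total, or before any data has arrived,
    # there is nothing to base an estimate on.
    if bytes_done <= 0 or elapsed_seconds <= 0 or total_bytes <= 0:
        return None
    rate = bytes_done / elapsed_seconds          # average bytes per second
    remaining = max(total_bytes - bytes_done, 0)  # bytes still to download
    return remaining / rate


# 2 MB of a 10 MB file received in 4 s: 8 MB left at 0.5 MB/s
print(estimate_seconds_remaining(2_000_000, 10_000_000, 4.0))  # 16.0
```

The elapsed time can be measured with `time.monotonic()` taken before the download loop starts. When the server sends no `content-length` header the total is 0 and the helper returns `None`.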

【Comments】: