Python: downloading an href returns the source code instead of a PDF file (answer)

【Question title】: Python download href, got the source code instead of a pdf file
【Posted】: 2019-05-01 09:32:54
【Question description】:

I am trying to download a PDF file from the following href (I changed some values because the PDF contains personal information):

https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/?tx_defacturation%5BdoId%5D=857AD9348B0007984D4B128F1E8BE&cHash=7b3a9f6d109dde87bd1d95b80ca1d

When I open this href in a browser, the PDF file downloads directly, but when I fetch it with requests in my Python code, I only get the source code of this page:

https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/

Here is my code; I use Selenium to find the href on the website:

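from requests import get
from selenium.webdriver.common.by import By

fact = driver.find_element(By.XPATH, url)  # url holds the XPath of the invoice link
href = fact.get_attribute('href')
print(href)           # href is correct here
reply = get(href, stream=True)
print(reply.text)     # I got the HTML source code, not the PDF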
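One way to confirm what the server actually sent back is to look at the Content-Type header and the first bytes of the body: every PDF file starts with the signature %PDF-, while an HTML page starts with markup. A minimal sketch, assuming you pass in the header value and the first few bytes of the response; the function name looks_like_pdf is hypothetical:

```python
def looks_like_pdf(content_type: str, first_bytes: bytes) -> bool:
    """Return True if a response appears to be a PDF rather than an HTML page."""
    # Drop parameters such as "; charset=utf-8" from the header value
    mime = content_type.split(";")[0].strip().lower()
    # A PDF body begins with the "%PDF-" signature
    has_signature = first_bytes.startswith(b"%PDF-")
    # Accept either signal: some servers label PDFs as application/octet-stream
    return has_signature or mime == "application/pdf"
```

With Requests you could call it as looks_like_pdf(reply.headers.get('Content-Type', ''), reply.content[:5]). For the request in the question it would presumably return False, because the server answered with an HTML page.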

This is the HTML that Selenium found:

<a href="grandcompte/factures/consulter-votre-factue/?tx_defacturation%5BdoId%5D=857AD9348B0007984D4B128F1E8BE&cHash=7b3a9f6d109dde87bd1d95b80ca1d"></a>
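The href attribute in this HTML is relative, yet print(href) shows a full URL: Selenium's get_attribute('href') returns the link as the browser resolved it against the page address. If you ever read the raw attribute another way (for example by parsing the page source), you can resolve it yourself with urllib.parse.urljoin. A minimal sketch; the page URL and the relative path below are hypothetical placeholders, and the result depends on the URL of the page that contains the link:

```python
from urllib.parse import urljoin

# Hypothetical URL of the page containing the link (with Selenium: driver.current_url)
page_url = "https://clients.direct-energie.com/"
# Hypothetical relative href, shaped like the one in the question
relative_href = "grandcompte/factures/consulter-votre-facture/?doc=123"

# Resolve the relative link the same way a browser would
absolute_url = urljoin(page_url, relative_href)
print(absolute_url)
```

If the page lived one directory deeper (for example under /grandcompte/), the same relative href would resolve to a different URL, which is why the page's own URL should be the base.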

I hope this is enough information for you to help. Thanks!

【Question comments】:

Could you rephrase what exactly you are looking for?

Tags: python selenium get

【Solution 1】:

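I couldn't use your link because it requires authentication, so I found another example of a link that redirects to a PDF download. Chrome is configured to download the PDF instead of displaying it, based on this StackOverflow answer.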

import selenium.webdriver

url = "https://readthedocs.org/projects/selenium-python/downloads/pdf/latest/"

download_dir = 'C:/Dev'  # folder where Chrome should save the PDF
profile = {
    # Download PDFs instead of opening them in Chrome's built-in PDF viewer
    "plugins.always_open_pdf_externally": True,
    "download.default_directory": download_dir,
    # Save straight to download_dir without showing a "Save as" dialog
    "download.prompt_for_download": False,
}

options = selenium.webdriver.ChromeOptions()
options.add_experimental_option("prefs", profile)
driver = selenium.webdriver.Chrome(options=options)

# Navigating to the link starts the download; driver.get itself returns None
driver.get(url)
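driver.get(url) does not wait for the file to finish downloading, so a script that moves on or calls driver.quit() right away can end up with an incomplete file. While a download is in progress, Chrome keeps it under a temporary name ending in .crdownload. A minimal polling sketch, assuming downloads go to download_dir and the folder holds no older PDFs; the name wait_for_pdf and the default timeout are hypothetical:

```python
import os
import time


def wait_for_pdf(download_dir: str, timeout: float = 30.0, poll: float = 0.5) -> str:
    """Poll download_dir until a finished .pdf file appears; return its path."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        # Chrome names unfinished downloads "*.crdownload"; wait until none remain
        in_progress = any(n.endswith(".crdownload") for n in names)
        pdfs = [n for n in names if n.lower().endswith(".pdf")]
        if pdfs and not in_progress:
            # Pick the most recently modified PDF in the folder
            newest = max(pdfs, key=lambda n: os.path.getmtime(os.path.join(download_dir, n)))
            return os.path.join(download_dir, newest)
        time.sleep(poll)
    raise TimeoutError(f"no finished PDF in {download_dir} after {timeout} s")
```

After driver.get(url), call pdf_path = wait_for_pdf(download_dir) before driver.quit().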

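Judging from the documentation, the driver.get method doesn't return anything; it just tells the webdriver to navigate to a page. If you want to process the PDF in Python before saving it to a file, consider using Requests or RoboBrowser instead.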
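If you fetch the file over HTTP yourself, keep in mind that the browser Selenium drives is logged in, while a plain get(href) is not; since the original link requires authentication, that is a likely reason the server answers with an HTML page. A common workaround is to send the browser's cookies with the request. A minimal sketch, assuming the site authenticates with cookies alone and all cookies belong to the same host (a single Cookie header ignores per-cookie domain and path). It swaps in the standard library's urllib.request instead of Requests so it runs without extra packages, and the helper names are hypothetical; driver.get_cookies() returns a list of dicts with name and value keys:

```python
import shutil
import urllib.request


def cookie_header(selenium_cookies: list[dict]) -> str:
    """Build a Cookie header value from the dicts returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def download_with_cookies(url: str, selenium_cookies: list[dict], dest: str) -> str:
    """Fetch url with the browser's cookies, stream the body into dest, return Content-Type."""
    request = urllib.request.Request(url, headers={"Cookie": cookie_header(selenium_cookies)})
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        content_type = response.headers.get("Content-Type", "")
        # Copy the body to disk in chunks instead of loading it all into memory
        shutil.copyfileobj(response, out)
    return content_type
```

Usage: download_with_cookies(href, driver.get_cookies(), 'facture.pdf'), then check that the returned type is application/pdf. With Requests, the equivalent call is get(href, headers={'Cookie': cookie_header(driver.get_cookies())}, stream=True).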

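The Stream=True option doesn't exist for webdriver.Chrome, so I'm not sure that's the method you used, but the approach above should do what you need. (If your get is requests.get, note that its keyword argument is lowercase: stream=True.)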

【Discussion】: