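[Posted]: 2020-10-02 21:54:34
[Question]:
I am trying to download some PDF files from a website with the requests module, but I keep getting the error listed below. I have seen several posts that say to use response.content rather than response.text for PDF files, but that still produces an error. I am not sure how to fix this.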
def scrape_website(link):
    try:
        print("getting content")
        cert = requests.certs.where()
        page = requests.get(link, verify=cert, headers={"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"})
        print(page)
        if ".pdf" in link:
            print("the content is a pdf file. downloading..")
            return page.content
        return page.text
    except Exception as x:
        print(x)
        return ''

statement_page = scrape_website(link)
with open(filepath, 'w+', encoding="utf-8") as f:
    print("writing page")
    f.write(statement_page)
    f.close()
<ipython-input-42-1e4771d32073> in save_html_page(page, path, filename)
     13     with open(filepath, 'w+', encoding="utf-8") as f:
     14         print("writing page")
---> 15         f.write(page)
     16         f.close()
     17

TypeError: write() argument must be str, not bytes
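[Comments]:
- Change 'w+' to 'wb'.
- You read the result into statement_page, but then try to write page to the file.
- @RandomDavis, I tried that too, but I still keep getting an error. The other error I get is: a bytes-like object is required, not 'str'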
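The two errors in this thread look like mirror images of each other. scrape_website returns bytes for a PDF link (page.content) but str for anything else (page.text, or '' when the request fails), so a single fixed file mode fails for one of the two cases. A minimal sketch of one way around this, assuming the file mode is picked from the type of the returned value; the helper name save_payload is hypothetical and is not part of the original post:

```python
from typing import Union


def save_payload(payload: Union[str, bytes], filepath: str) -> str:
    """Write payload to filepath, choosing the file mode from its type.

    Hypothetical helper for illustration. Returns the mode that was used.
    """
    if isinstance(payload, bytes):
        # Binary data such as response.content for a PDF:
        # open in binary mode and do not pass an encoding.
        with open(filepath, "wb") as f:
            f.write(payload)
        return "wb"

    # Text data such as response.text for an HTML page:
    # open in text mode with an explicit encoding.
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(payload)
    return "w"
```

With this in place the call from the question would become save_payload(statement_page, filepath). The with block closes the file on exit, so the extra f.close() is not needed. Returning '' on failure still produces an empty file, so checking the response before saving may be worth adding.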
Tags: python download python-requests