【Title】: Downloading all the pdf files from a website with Python 3, part 2
【Posted】: 2026-02-18 23:20:03
【Question】:

I rewrote the program so that it works after the URL is redirected, but I cannot save the file, i.e. see it in my download folder. This is the website: https://fraser.stlouisfed.org/title/1339#518552

    from bs4 import BeautifulSoup
    import urllib3
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    import re
    from urllib import request
    import requests
    import time

    # access the website
    http = urllib3.PoolManager()
    url = 'https://fraser.stlouisfed.org/title/1339#518552/title/1339/item/558539'
    response = http.request('GET', url)
    soup = BeautifulSoup(response.data)

    download_links = []
    # I found the path fragment the file links share and collect matching links
    for link in soup.find_all('a', attrs={'href': re.compile("/title/1339/item/5")}):
        download_links.append('https://fraser.stlouisfed.org/' + link.get('href'))

    # this part deals with the redirected page;
    # I am trying to make it work for only one link first
    response_two = http.request('GET', download_links[1])
    soup = BeautifulSoup(response_two.data)

    for link in soup.find_all('a', attrs={'href': re.compile("/files/docs/publications/cfc/")}):
        urlfin = "https://fraser.stlouisfed.org/" + link['href']
        request.urlretrieve(urlfin)

The program runs, but nothing is downloaded. Can anyone help me find the problem?

【Discussion】:

Tags: python-3.x pdf web-scraping download


【Solution 1】:

`urlretrieve` needs a destination filename; without one it saves to a temporary file that you never see. Just pass a filename as the second argument:

    import urllib.request
    urllib.request.urlretrieve(urlfin, "test.pdf")
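
A fixed filename like `"test.pdf"` overwrites itself on every iteration of the loop, so to save all the PDFs you would want a distinct name per URL. A minimal sketch (the helper name `filename_from_url` is my own, and it assumes the PDF URLs end in a usable filename, as the `/files/docs/publications/cfc/` links on this site appear to):

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def filename_from_url(url):
    """Take the last path segment of the URL as the local filename."""
    name = PurePosixPath(urlparse(url).path).name
    return name or "download.pdf"  # fallback if the URL ends in "/"

# In the question's final loop, this would become (urlfin holding one
# resolved PDF URL):
#     urllib.request.urlretrieve(urlfin, filename_from_url(urlfin))
```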
    

【Discussion】:
