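[Posted]: 2026-02-18 23:20:03
[Question]:
I rewrote my program so that it now works after the URL is redirected, but I cannot get the file to save, that is, it never shows up in my Downloads folder. This is the website: https://fraser.stlouisfed.org/title/1339#518552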
from bs4 import BeautifulSoup
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
import re
from urllib import request
import requests
import time

# Access the website
http = urllib3.PoolManager()
url = 'https://fraser.stlouisfed.org/title/1339#518552/title/1339/item/558539'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')
download_links = []
# I found the part of the path that the item links share and matched on it
for link in soup.find_all('a', attrs={'href': re.compile("/title/1339/item/5")}):
    download_links.append('https://fraser.stlouisfed.org/' + link.get('href'))

# This part deals with the redirected page.
# I am trying to make it work for only one link first.
response_two = http.request('GET', download_links[1])
soup = BeautifulSoup(response_two.data, 'html.parser')
for link in soup.find_all('a', attrs={'href': re.compile("/files/docs/publications/cfc/")}):
    urlfin = "https://fraser.stlouisfed.org/" + link['href']
    request.urlretrieve(urlfin)
The program runs, but nothing is downloaded. Can anyone help me find the problem?
[Discussion]:
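One likely cause, offered as a sketch rather than a confirmed diagnosis: when `urllib.request.urlretrieve` is called with only a URL, it writes the response to a temporary file and returns that path, so nothing appears in the Downloads folder. Passing an explicit destination path should fix that. The helper names below (`absolute_url`, `filename_from_url`, `download_file`), the `~/Downloads` location, and the `download.pdf` fallback name are assumptions for illustration and do not come from the original post. `urljoin` is used so that an href starting with `/` does not produce a doubled slash after the host.

```python
import os
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve

BASE_URL = "https://fraser.stlouisfed.org/"


def absolute_url(href, base=BASE_URL):
    """Join a possibly relative href onto the site root without doubling slashes."""
    return urljoin(base, href)


def filename_from_url(url, default="download.pdf"):
    """Use the last path segment of the URL as the local file name.

    Falls back to a hypothetical default name when the path ends in a slash.
    """
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_file(url, dest_dir=None):
    """Save url into dest_dir and return the path that was written."""
    # Assumption: the user's Downloads folder is ~/Downloads.
    dest_dir = Path(dest_dir) if dest_dir else Path.home() / "Downloads"
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / filename_from_url(url)
    # The explicit second argument is the key change: without it,
    # urlretrieve stores the data in a temporary file instead.
    urlretrieve(url, str(target))
    return target
```

In the loop from the question, the last two lines could then become `download_file(absolute_url(link['href']))`. Printing the returned path is an easy way to confirm where the file actually landed.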
Tags: python-3.x pdf web-scraping download