You can use os.path.join to join path components, os.path.exists to check whether a directory exists, and os.makedirs to create it.
This example combines the three:
import os

import requests
from bs4 import BeautifulSoup

form = "Form W-2"
URL = (
    "https://apps.irs.gov/app/picklist/list/priorFormPublication."
    "html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&criteria=formNumber&value="
    "" + form + "&isDescending=false"
)

page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Each table row that contains <td> cells describes one document
for table_element in soup.select(".picklist-dataTable tr:has(td)"):
    form_number = table_element.find("td", class_="LeftCellSpacer")
    u = form_number.a["href"]
    # Save into a directory named after the form, keeping the URL's file name
    path = os.path.join(form, u.split("/")[-1])
    if not os.path.exists(form):
        os.makedirs(form)
    print(f"Saving {u=} to {path=}")
    with open(path, "wb") as f_out:
        f_out.write(requests.get(u).content)
Prints:
Saving u='https://www.irs.gov/pub/irs-prior/fw2p--1990.pdf' to path='Form W-2/fw2p--1990.pdf'
Saving u='https://www.irs.gov/pub/irs-prior/fw2p--1989.pdf' to path='Form W-2/fw2p--1989.pdf'
Saving u='https://www.irs.gov/pub/irs-prior/fw2p--1988.pdf' to path='Form W-2/fw2p--1988.pdf'
...and so on.
It also saves the documents to the Form W-2 directory.
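
The directory handling can be tried on its own, without touching the network. A minimal sketch, assuming a hypothetical helper named target_path (not part of the code above) and using os.makedirs with exist_ok=True, which folds the existence check and the creation into one call:

```python
import os
import tempfile


def target_path(directory, url):
    """Create `directory` if needed and return the save path for `url`.

    Hypothetical helper for illustration only. The file name is the last
    component of the URL, as in the download loop.
    """
    # exist_ok=True makes this a no-op when the directory already exists
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, url.split("/")[-1])


if __name__ == "__main__":
    # Use a temporary base directory so the demo leaves the working dir alone
    base = tempfile.mkdtemp()
    print(
        target_path(
            os.path.join(base, "Form W-2"),
            "https://www.irs.gov/pub/irs-prior/fw2p--1990.pdf",
        )
    )
```

Calling the helper twice with the same directory is safe, so it could replace the os.path.exists / os.makedirs pair inside the loop if preferred.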
EDIT: To save with a different file name:
import os

import requests
from bs4 import BeautifulSoup

form = "Form W-2"
URL = (
    "https://apps.irs.gov/app/picklist/list/priorFormPublication."
    "html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&criteria=formNumber&value="
    "" + form + "&isDescending=false"
)

page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Each table row that contains <td> cells describes one document
for table_element in soup.select(".picklist-dataTable tr:has(td)"):
    form_number = table_element.find("td", class_="LeftCellSpacer")
    form_year = table_element.find("td", class_="EndCellSpacer")
    u = form_number.a["href"]
    # Build the file name from the form number and year shown in the table
    p = "{}-{}.pdf".format(
        form_number.get_text(strip=True), form_year.get_text(strip=True)
    )
    path = os.path.join(form, p)
    if not os.path.exists(form):
        os.makedirs(form)
    print(f"Saving {u=} to {path=}")
    with open(path, "wb") as f_out:
        f_out.write(requests.get(u).content)
This saves the files as:
Saving u='https://www.irs.gov/pub/irs-prior/fw2p--1990.pdf' to path='Form W-2/Form W-2 P-1990.pdf'
Saving u='https://www.irs.gov/pub/irs-prior/fw2p--1989.pdf' to path='Form W-2/Form W-2 P-1989.pdf'
Saving u='https://www.irs.gov/pub/irs-prior/fw2p--1988.pdf' to path='Form W-2/Form W-2 P-1988.pdf'
Saving u='https://www.irs.gov/pub/irs-prior/fw2p--1987.pdf' to path='Form W-2/Form W-2 P-1987.pdf'
Saving u='https://www.irs.gov/pub/irs-prior/fw2p--1986.pdf' to path='Form W-2/Form W-2 P-1986.pdf'
...
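
Text scraped from a table cell can contain characters that are not valid in file names on some systems (a slash, for example). A hedged sketch of the naming step, assuming a hypothetical helper named build_filename that is not part of the code above; the set of replaced characters is an assumption chosen to cover common Windows and POSIX restrictions:

```python
import re


def build_filename(form_number, form_year):
    """Return '<form number>-<year>.pdf' with unsafe characters replaced.

    Hypothetical helper for illustration only.
    """
    name = "{}-{}".format(form_number.strip(), form_year.strip())
    # Replace characters that are commonly disallowed in file names
    name = re.sub(r'[\\/:*?"<>|]', "_", name)
    return name + ".pdf"


if __name__ == "__main__":
    print(build_filename("Form W-2 P", "1990"))
```

In the loop, the result could be passed to os.path.join(form, ...) in place of the "{}-{}.pdf".format(...) expression.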