【Posted】: 2018-12-31 07:34:21
【Question】:
I am trying to write a Python script that downloads the image that is updated daily on this site:
https://apod.nasa.gov/apod/astropix.html
I tried to follow the top answer to this post: How to extract and download all images from a website using beautifulSoup?
So this is what my code looks like at the moment:
import re
import requests
from bs4 import BeautifulSoup

site = 'https://apod.nasa.gov/apod/astropix.html'

response = requests.get(site)

soup = BeautifulSoup(response.text, 'html.parser')
img_tags = soup.find_all('img')

urls = [img['src'] for img in img_tags]

for url in urls:
    filename = re.search(r'/([\w_-]+[.](jpg|gif|png))$', url)
    with open(filename.group(1), 'wb') as f:
        if 'http' not in url:
            # sometimes an image source can be relative
            # if it is provide the base url which also happens
            # to be the site variable atm.
            url = '{}{}'.format(site, url)
        response = requests.get(url)
        f.write(response.content)
However, when I run my program, I get this error:
Traceback on line 17
    with open(filename.group(1), 'wb') as f:
AttributeError: 'NoneType' object has no attribute 'group'
It looks like there might be something wrong with my regular expression?
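To make the failure mode concrete: `re.search` returns `None` whenever a `src` value does not fit the pattern, and calling `.group(1)` on `None` raises exactly this `AttributeError`. The sketch below is only an illustration, not a confirmed diagnosis of what the page served that day. The sample `src` strings are hypothetical, and the helpers `filename_from_src` and `absolute_url` are names invented here. It compares the pattern from the script with a filename taken from the parsed URL path, and resolves relative sources with `urljoin` rather than string concatenation.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Pattern copied unchanged from the script in the question.
ORIGINAL_PATTERN = re.compile(r'/([\w_-]+[.](jpg|gif|png))$')


def filename_from_src(src):
    """Return the last path component of an image URL, or None if it is empty.

    Hypothetical helper: it does not check the file extension at all.
    """
    path = urlparse(src).path            # drops any ?query or #fragment
    name = posixpath.basename(path)      # text after the final '/'
    return name or None


def absolute_url(page_url, src):
    """Resolve a possibly relative src against the page it was found on."""
    return urljoin(page_url, src)


# Hypothetical src values, chosen only to show when the pattern gives None.
samples = [
    'image/1812/example_1024.jpg',   # fits the original pattern
    'image/1812/example.v2.jpg',     # extra dot in the name
    'image/1812/EXAMPLE.JPG',        # upper-case extension
    'logo.png',                      # no '/' before the name
]

for src in samples:
    match = ORIGINAL_PATTERN.search(src)
    regex_name = match.group(1) if match else None
    print(src, '| regex:', regex_name, '| basename:', filename_from_src(src))
```

In the download loop, one option would be to skip any `src` for which the filename is `None` and to fetch `absolute_url(site, src)`. Note that `'{}{}'.format(site, url)` appends the relative path directly after `astropix.html`, which does not produce a valid image URL, whereas `urljoin` replaces the last path segment of the page URL.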
【Discussion】:
Tags: python