Getting image src and saving images in a directory with a Python image crawler: answers

【Question title】: Getting image src and saving images in a directory with a Python image crawler
【Posted】: 2016-04-26 09:06:22
【Question】:

I want to create a Python image crawler.

This is what I have so far:

from bs4 import BeautifulSoup
from urllib.request import urlopen
url = 'http://blog.pouyacode.net/'
data = urlopen(url)
soup = BeautifulSoup(data, 'html.parser')
img = soup.findAll('img')
print (img)
print ('\n')
print ('****************************')
print ('\n')
for each in img:
    print(img.get('src'))
    print ('\n')

This part works:

print (img)
print ('\n')
print ('****************************')
print ('\n')

But in the output, right after the line of asterisks, this error appears:

Traceback (most recent call last):
  File "pull.py", line 15, in <module>
    print(img.get('src'))
AttributeError: 'ResultSet' object has no attribute 'get'

So how can I get the src of every image? And how do I save those images in a directory?

【Comments on the question】:

You probably meant to use each.get('src') instead of img.get('src')
Yes, sorry, that was a small mistake! Thank you. But what about the second part, saving the images in a folder?
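
To make the comment above concrete: soup.findAll('img') returns a ResultSet, which behaves like a list of Tag objects, and .get() is a method of each Tag rather than of the list. Below is a minimal sketch of the corrected loop. It parses an inline HTML string instead of the real page so it runs offline; SAMPLE_HTML and collect_srcs are hypothetical names made up for this illustration, not part of the original post.

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for the downloaded page (hypothetical sample markup).
SAMPLE_HTML = """
<html><body>
  <img src="/static/logo.png" alt="logo">
  <img src="http://example.com/photo.jpg">
  <img alt="no source here">
</body></html>
"""


def collect_srcs(html):
    """Return the src value of every <img> tag that has one."""
    soup = BeautifulSoup(html, 'html.parser')
    images = soup.find_all('img')  # ResultSet: a list-like of Tag objects
    srcs = []
    for each in images:
        src = each.get('src')  # .get() lives on each Tag, not on the ResultSet
        if src:  # skip <img> tags that have no src attribute
            srcs.append(src)
    return srcs


if __name__ == '__main__':
    for src in collect_srcs(SAMPLE_HTML):
        print(src)
```

Checking for a missing src is an addition here: Tag.get() returns None when the attribute is absent, which would otherwise end up in the results.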

Tags: python image web-crawler

【Solution 1】:

Something like this? Written off the top of my head, untested.

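from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.parse import urljoin
import os

url = 'http://blog.pouyacode.net/'
download_folder = "downloads"

# Create the target folder if it does not exist yet
os.makedirs(download_folder, exist_ok=True)

data = urlopen(url)
soup = BeautifulSoup(data, 'html.parser')
img = soup.find_all('img')

for each in img:
    src = each.get('src')
    if not src:
        continue  # skip <img> tags without a src attribute
    # src may be relative, so resolve it against the page URL
    img_url = urljoin(url, src)
    with urlopen(img_url) as response:
        with open(os.path.join(download_folder, os.path.basename(img_url)), "wb") as f:
            f.write(response.read())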
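
Two details are worth watching when saving files this way: a src value is often relative to the page, and the last path segment may carry a query string or percent-escapes that make a poor file name. The following is a rough, standard-library-only sketch of how both could be handled before calling urlopen. The helper names resolve_image_url and local_filename, and the 'image' fallback name, are assumptions made for this illustration.

```python
import posixpath
from urllib.parse import urljoin, urlparse, unquote


def resolve_image_url(page_url, src):
    """Turn a possibly relative src into an absolute URL."""
    # urljoin leaves src unchanged when it is already absolute
    return urljoin(page_url, src)


def local_filename(image_url, fallback='image'):
    """Derive a file name from the URL path, ignoring any query string."""
    path = urlparse(image_url).path           # drops '?size=2' and '#fragment'
    name = posixpath.basename(unquote(path))  # decode %20 etc., keep last segment
    return name or fallback                   # a path ending in '/' has no name


if __name__ == '__main__':
    page = 'http://blog.pouyacode.net/'
    for src in ['/img/a.png', 'pic.jpg', 'http://example.com/b.jpg?size=2']:
        full = resolve_image_url(page, src)
        print(full, '->', local_filename(full))
```

Two images with the same base name would still overwrite each other in the download folder; adding a counter or a hash of the URL to the name is one way around that.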

【Comments】: