Python asynchronous image download (multiple URLs): answers

【Question title】: Python asynchronous image download (multiple URLs)
【Posted】: 2019-02-25 01:20:02
【Question description】:

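I have been learning Python for 4 or 5 months, and this is the third project I have built from scratch, but I cannot solve this problem on my own.

This script downloads one image for each given URL. I could not find a way to implement ThreadPoolExecutor or async in this script. I cannot figure out how to link each URL, with its image number, to the part that saves the image. I built a dictionary of all the URLs I need to download, but how do I actually save each image under the correct name? Any other advice?

PS. The URLs shown here are just placeholder URLs.

Synchronous version: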

    import argparse
    import logging
    import os
    import re

    import requests
    from bs4 import BeautifulSoup

    parser = argparse.ArgumentParser()
    parser.add_argument("-n", "--num", help="Book number", type=int, required=True)
    parser.add_argument("-p", dest="path_name", default=r"F:\Users\123", help="Save to dir")
    args = parser.parse_args()

    logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                        level=logging.ERROR)
    logger = logging.getLogger(__name__)


    def get_parser(url_c):
        url = f'https://test.net/g/{url_c}/1'
        logger.info(f'Main url: {url_c}')
        response = requests.get(url, timeout=5)  # a timeout raises an exception
        response.raise_for_status()              # an HTTP error status raises an exception
        return BeautifulSoup(response.content, 'html.parser')


    def get_locators(soup):  # takes the result of get_parser
        # Extract first/last page number
        first = int(soup.select_one('span.current').string)
        logger.info(f'First page: {first}')
        last = int(soup.select_one('span.num-pages').string) + 1

        # Extract img_code and extension
        link = soup.find('img', {'class': 'fit-horizontal'}).attrs["src"]
        logger.info(f'Locator code: {link}')
        code = re.search(r'galleries.([0-9]+)/.\.(\w{3})', link)
        book_code = code.group(1)  # internal code
        extension = code.group(2)  # png or jpg

        # Extract the directory (book) name
        pattern = re.compile(r'pretty":"(.*)"')
        found = soup.find('script', string=pattern)
        string = pattern.search(found.string).group(1)
        dir_name = string.split('"')[0]
        logger.info(f'Dir name: {dir_name}')

        logger.info(f'Hidden code: {book_code}')
        print(f'Extension: {extension}')
        print(f'Total pages: {last}')
        print()

        return {'first_p': first,
                'last_p': last,
                'book_code': book_code,
                'ext': extension,
                'dir': dir_name}


    def setup_download_dir(path, dir_name):  # (args.path_name, locator['dir'])
        # Create the folder if it does not exist
        filepath = os.path.join(path, dir_name)
        if not os.path.exists(filepath):
            try:
                os.makedirs(filepath)
                print(f'Directory created at: {filepath}')
            except OSError as err:
                print(f"Can't create {filepath}: {err}")
        return filepath


    def main(locator, filepath):
        for image_n in range(locator['first_p'], locator['last_p']):
            url = f"https://i.test.net/galleries/{locator['book_code']}/{image_n}.{locator['ext']}"
            logger.info(f'Url Img: {url}')
            response = requests.get(url, timeout=3)
            response.raise_for_status()  # raise an exception on an HTTP error status
            img_data = response.content

            with open(os.path.join(filepath, f"{image_n}.{locator['ext']}"), 'wb') as handler:
                handler.write(img_data)  # write the image
            print(f'Img {image_n} - DONE')


    if __name__ == '__main__':
        try:
            locator = get_locators(get_parser(args.num))  # args.num e.g. 241461
            main(locator, setup_download_dir(args.path_name, locator['dir']))
        except KeyboardInterrupt:
            print('Program aborted...' + '\n')

URL list:

    def img_links(locator):
        image_url = []
        for num in range(locator['first_p'], locator['last_p']):
            url = f"https://i.test.net/galleries/{locator['book_code']}/{num}.{locator['ext']}"
            image_url.append(url)
        logger.info(f'Url List: {image_url}')
        return image_url
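One way to keep each image number tied to its URL is to build file name and URL pairs instead of a bare list, so the saving step never has to recover the number from the URL. The following is a minimal sketch, assuming the same `locator` dictionary as in the script; `img_targets` is a hypothetical helper name and the example values are made up.

```python
def img_targets(locator):
    """Map each output file name to the URL it should be downloaded from."""
    targets = {}
    # last_p is already one past the final page, so range() covers every page
    for num in range(locator['first_p'], locator['last_p']):
        filename = f"{num}.{locator['ext']}"
        targets[filename] = (
            f"https://i.test.net/galleries/{locator['book_code']}/{filename}"
        )
    return targets


# Example with made-up values standing in for a real locator
example = img_targets({'first_p': 1, 'last_p': 4, 'book_code': '12345', 'ext': 'jpg'})
for name, url in example.items():
    print(name, '->', url)
```

Because the file name is the dictionary key, whatever code performs the download can save the data under that key directly.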

【Question comments】:

Tags: python python-3.x asynchronous python-multithreading imagedownload

【Solution 1】:

I found the solution in the book Fluent Python. Here is the snippet:

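    import collections
    from concurrent import futures

    import requests
    import tqdm

    # download_one and HTTPStatus are not shown in this snippet; HTTPStatus.error
    # implies a custom status enum rather than the standard library's http.HTTPStatus.


    def download_many(cc_list, base_url, verbose, concur_req):
        counter = collections.Counter()
        with futures.ThreadPoolExecutor(max_workers=concur_req) as executor:
            # Map each future to the item it was submitted for
            to_do_map = {}
            for cc in sorted(cc_list):
                future = executor.submit(download_one, cc, base_url, verbose)
                to_do_map[future] = cc
            # Yield futures as they finish, in completion order
            done_iter = futures.as_completed(to_do_map)
            if not verbose:
                done_iter = tqdm.tqdm(done_iter, total=len(cc_list))
            for future in done_iter:
                try:
                    res = future.result()
                except requests.exceptions.HTTPError as exc:
                    error_msg = 'HTTP {res.status_code} - {res.reason}'
                    error_msg = error_msg.format(res=exc.response)
                except requests.exceptions.ConnectionError as exc:
                    error_msg = 'Connection error'
                else:
                    error_msg = ''
                    status = res.status

                if error_msg:
                    status = HTTPStatus.error
                counter[status] += 1
                if verbose and error_msg:
                    # Look up which item this failed future belongs to
                    cc = to_do_map[future]
                    print('*** Error for {}: {}'.format(cc, error_msg))

        return counter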
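The same pattern could be adapted to the image downloader from the question. What follows is a rough sketch under stated assumptions, not a tested drop-in: `download_all`, `save_image`, `fetch_with_requests`, and the injectable `fetch` parameter are hypothetical names introduced here, and the default fetcher is a plain `requests.get` call. The dictionary keyed by future plays the same role as `to_do_map`, so each finished download is reported under the file name it was submitted with.

```python
import os
from concurrent import futures


def fetch_with_requests(url):
    """Default fetcher (assumption): return the body, raise on HTTP errors."""
    import requests  # imported lazily so a stub fetcher can be used without it
    response = requests.get(url, timeout=3)
    response.raise_for_status()
    return response.content


def save_image(filepath, filename, url, fetch):
    """Download one URL and write it under the given file name."""
    data = fetch(url)
    target = os.path.join(filepath, filename)
    with open(target, 'wb') as handler:
        handler.write(data)
    return target


def download_all(targets, filepath, fetch=fetch_with_requests, max_workers=8):
    """targets maps file name -> URL; returns (saved paths, errors by file name)."""
    saved, errors = [], {}
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which file name each future belongs to
        to_do_map = {
            executor.submit(save_image, filepath, name, url, fetch): name
            for name, url in targets.items()
        }
        for future in futures.as_completed(to_do_map):
            name = to_do_map[future]
            try:
                saved.append(future.result())
            except Exception as exc:  # keep going; failures are reported at the end
                errors[name] = exc
    return saved, errors
```

With the question's script, `targets` could be built from `locator` (file name as key, URL as value) and `filepath` would come from `setup_download_dir`. Passing the file name along with the URL at submit time is what removes the need to match results back to image numbers afterwards.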

【Comments】: