【Posted】: 2017-03-18 18:23:55
【Problem description】:
I have a list of URLs in a text file. I want to download the images they point to into a specific folder. How can I do that? Is there a plugin available for Chrome, or any other program, that downloads images from a list of URLs?
【Question comments】:
Tags: image google-chrome
Create a folder on your machine.
Put the text file of image URLs in that folder.
cd into that folder and run wget -i images.txt
You will find all the downloaded files in the folder.
【Comments】:
I needed brew install wget first, but after that it was easy. Thanks a lot!
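If installing wget is not an option, the same steps can be sketched with only the Python standard library (the function name `download_from_list` and the file/folder names are just placeholders for this illustration):

```python
import os
import urllib.request

def download_from_list(list_path, dest_dir):
    """Download every URL listed (one per line) in list_path into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with open(list_path) as f:
        urls = [line.strip() for line in f if line.strip()]
    for url in urls:
        # name the local file after the last path component of the URL
        filename = os.path.join(dest_dir, url.rstrip("/").split("/")[-1])
        urllib.request.urlretrieve(url, filename)
    return len(urls)
```

This behaves roughly like `wget -i images.txt`, minus wget's retries and resume support.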
This ought to be turned into a function with error handling, but it repeatedly downloads images for an image-classification project:
import pandas as pd
import requests

urls = pd.read_csv('cat_urls.csv')  # load the URL list as a DataFrame
rows = []
for index, i in urls.iterrows():
    rows.append(i.iloc[-1])  # the last column holds the URL

counter = 0
for i in rows:
    file_name = 'cat' + str(counter) + '.jpg'
    print(file_name)
    response = requests.get(i)
    with open(file_name, "wb") as file:
        file.write(response.content)
    counter += 1
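One weakness of the snippet above is that every file gets a hard-coded .jpg extension even when the URL points to a PNG or GIF. A small helper (the name `url_to_filename` is my own, not from the answer) can keep each URL's real extension while still numbering the files:

```python
import os
from urllib.parse import urlparse

def url_to_filename(url, counter, default_ext=".jpg"):
    """Build a local filename like 'cat0.png', preserving the URL's extension."""
    path = urlparse(url).path               # drop query strings like ?w=640
    ext = os.path.splitext(path)[1] or default_ext
    return "cat" + str(counter) + ext
```

Swap this in for the `file_name = 'cat' + str(counter) + '.jpg'` line to avoid mislabeled files.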
【讨论】:
import os
import time
import sys
import ssl
import urllib
from progressbar import ProgressBar

def get_raw_html(url):
    version = (3, 0)
    curr_version = sys.version_info
    if curr_version >= version:  # if the current version of Python is 3.0 or above
        import urllib.request  # urllib library for fetching web pages
        try:
            headers = {}
            headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
            request = urllib.request.Request(url, headers=headers)
            resp = urllib.request.urlopen(request)
            respData = str(resp.read())
            return respData
        except Exception as e:
            print(str(e))
    else:  # if the current version of Python is 2.x
        import urllib2
        try:
            headers = {}
            headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
            request = urllib2.Request(url, headers=headers)
            try:
                response = urllib2.urlopen(request)
            except urllib2.URLError:  # handle SSL certificate failure
                context = ssl._create_unverified_context()
                response = urllib2.urlopen(request, context=context)
            raw_html = response.read()
            return raw_html
        except:
            return "Page not found"

def next_link(s):
    start_line = s.find('rg_di')
    if start_line == -1:  # if no links are found, signal it
        end_quote = 0
        link = "no_links"
        return link, end_quote
    else:
        start_line = s.find('class="rg_meta"')
        start_content = s.find('"ou"', start_line + 1)
        end_content = s.find(',"ow"', start_content + 1)
        content_raw = str(s[start_content + 6:end_content - 1])
        return content_raw, end_content

def all_links(page):
    links = []
    while True:
        link, end_content = next_link(page)
        if link == "no_links":
            break
        else:
            links.append(link)  # append every link to the list named 'links'
            # time.sleep(0.1)  # a timer could be used to throttle the image requests
            page = page[end_content:]
    return links

def download_images(links, search_keyword):
    choice = input("Do you want to save the links? [y]/[n]: ")
    if choice == 'y' or choice == 'Y':
        # write all the links into a text file
        f = open('links.txt', 'a')  # open the text file called links.txt
        for link in links:
            f.write(str(link))
            f.write("\n")
        f.close()  # close the file
    num = input("Enter number of images to download (max 100): ")
    counter = 1
    errors = 0
    search_keyword = search_keyword.replace("%20", "_")
    directory = search_keyword + '/'
    if not os.path.isdir(directory):
        os.makedirs(directory)
    pbar = ProgressBar()
    for link in pbar(links):
        if counter <= int(num):
            file_extension = link.split(".")[-1]
            filename = directory + str(counter) + "." + file_extension
            # print("Downloading image: " + str(counter) + '/' + str(num))
            try:
                urllib.request.urlretrieve(link, filename)
            except urllib.error.HTTPError:
                errors += 1  # HTTPError is a subclass of URLError, so catch it first
            except urllib.error.URLError:
                errors += 1
            except IOError:
                errors += 1
            counter += 1
    return errors

def search():
    version = (3, 0)
    curr_version = sys.version_info
    if curr_version >= version:  # if the current version of Python is 3.0 or above
        import urllib.request  # urllib library for fetching web pages
    else:
        import urllib2  # if the current version of Python is 2.x
    search_keyword = input("Enter the search query: ")
    # download the image links
    links = []
    search_keyword = search_keyword.replace(" ", "%20")
    url = 'https://www.google.com/search?q=' + search_keyword + '&espv=2&biw=1366&bih=667&site=webhp&source=lnms&tbm=isch&sa=X&ei=XosDVaCXD8TasATItgE&ved=0CAcQ_AUoAg'
    raw_html = get_raw_html(url)
    links = links + all_links(raw_html)
    print("Total Image Links = " + str(len(links)))
    print("\n")
    errors = download_images(links, search_keyword)
    print("Download Complete.\n" + str(errors) + " errors while downloading.")

search()
【Comments】:
In this python project I search on unsplash.com, which gives me a list of URLs, and then save a number of them (predefined by the user) to a predefined folder. Check it out.
【Comments】:
On Windows 10/11 this is fairly simple to use:
for /F "eol=;" %f in (filelist.txt) do curl -O %f
Note the inclusion of eol=; — it lets us mask individual exclusions by adding ; at the start of any lines in filelist.txt that we don't want this time around. If you use the above in a batch file such as GetFileList.cmd, double those % values to %%.
Windows 7 has an FTP command, but it usually raises a firewall dialog that requires an authorizing response from the user.
If you are currently running Windows 7 and want to download a list of URLs without installing wget.exe or another dependency such as curl.exe (which would make for the simplest one-liner), the shortest compatible way is a PowerShell command (not my favorite for speed, but if needs must.)
The file with the URLs is filelist.txt, and IWR (Invoke-WebRequest) is the PowerShell near-equivalent of wget.
The SecurityProtocol command comes first to ensure we are using the modern TLS 1.2 protocol.
-OutF ... Split-Path ... means the filenames will be the same as the remote filenames, but saved in the CWD (current working directory); for scripting you can cd /d folder first if needed.
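For example, a filelist.txt using that convention might look like this (the URLs here are placeholders); the line starting with ; is skipped on this run:

```text
https://example.com/images/cat1.jpg
;https://example.com/images/cat2.jpg
https://example.com/images/cat3.jpg
```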
PS> [Net.ServicePointManager]::SecurityProtocol = "Tls12" ; GC filelist.txt | % {IWR $_ -OutF $(Split-Path $_ -Leaf)}
To run it from CMD, use a slightly different set of quotes around 'Tls12':
PowerShell -C "& {[Net.ServicePointManager]::SecurityProtocol = 'Tls12' ; GC filelist.txt | % {IWR $_ -OutF $(Split-Path $_ -Leaf)}}"
【Comments】:
On Windows, install wget - https://sourceforge.net/projects/gnuwin32/files/wget/1.11.4-1/
and add C:\Program Files (x86)\GnuWin32\bin to your environment PATH.
Create a folder containing a txt file listing all the images you want to download.
Type cmd in the location bar at the top of File Explorer.
When the command prompt opens, enter the following:
wget -i images.txt --no-check-certificate
【Comments】: