【Question title】: Should I switch from "urllib.request.urlretrieve(..)" to "urllib.request.urlopen(..)"?
【Posted】: 2019-07-06 14:32:51
【Question description】:

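1. Deprecation concern

In Python 3.7, I use the urllib.request.urlretrieve(..) function to download a large file from a URL. In the documentation (https://docs.python.org/3/library/urllib.request.html), just above the entry for urllib.request.urlretrieve(..), I read the following:

Legacy interface
The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.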

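2. Looking for an alternative

To future-proof my code, I am looking for an alternative. The official Python documentation does not name a specific replacement, but urllib.request.urlopen(..) looks like the most direct candidate. It sits at the top of the documentation page.

Unfortunately, the alternatives, such as urlopen(..), do not provide a reporthook argument. This argument is a callable that you pass to urlretrieve(..). In turn, urlretrieve(..) calls it at regular intervals with the following arguments:

block number
block size
total file size

I use it to update a progress bar. That is why I miss the reporthook argument in the alternatives.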
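
To make the three reporthook arguments concrete, here is a minimal sketch of a callback that turns them into a progress fraction. The names make_progress_hook and on_progress are hypothetical and chosen only for illustration; the sketch assumes the hook is called as reporthook(blocknum, block_size, total_size), as described above.

```python
def make_progress_hook(on_progress):
    """Build a reporthook that forwards progress as a fraction in [0.0, 1.0].

    on_progress is a hypothetical callback; it receives None when the
    total size is unknown (reported as -1) or zero.
    """
    def hook(blocknum, block_size, total_size):
        if total_size <= 0:
            on_progress(None)
            return
        # blocknum * block_size can overshoot on the last block, so clamp to 1.0.
        on_progress(min(blocknum * block_size / total_size, 1.0))
    return hook
```

Such a hook could be passed as urllib.request.urlretrieve(url, filename, reporthook=make_progress_hook(update_bar)), where update_bar stands for whatever function redraws the progress bar.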

3. urlretrieve(..) vs. urlopen(..)

I found that urlretrieve(..) simply uses urlopen(..) internally. See the request.py source file in the Python 3.7 installation (Python37/Lib/urllib/request.py):

_url_tempfiles = []
def urlretrieve(url, filename=None, reporthook=None, data=None):
    """
    Retrieve a URL into a temporary location on disk.

    Requires a URL argument. If a filename is passed, it is used as
    the temporary file location. The reporthook argument should be
    a callable that accepts a block number, a read size, and the
    total file size of the URL target. The data argument should be
    valid URL encoded data.

    If a filename is passed and the URL points to a local resource,
    the result is a copy from local file to new file.

    Returns a tuple containing the path to the newly created
    data file as well as the resulting HTTPMessage object.
    """
    url_type, path = splittype(url)

    with contextlib.closing(urlopen(url, data)) as fp:
        headers = fp.info()

        # Just return the local path and the "headers" for file://
        # URLs. No sense in performing a copy unless requested.
        if url_type == "file" and not filename:
            return os.path.normpath(path), headers

        # Handle temporary file setup.
        if filename:
            tfp = open(filename, 'wb')
        else:
            tfp = tempfile.NamedTemporaryFile(delete=False)
            filename = tfp.name
            _url_tempfiles.append(filename)

        with tfp:
            result = filename, headers
            bs = 1024*8
            size = -1
            read = 0
            blocknum = 0
            if "content-length" in headers:
                size = int(headers["Content-Length"])

            if reporthook:
                reporthook(blocknum, bs, size)

            while True:
                block = fp.read(bs)
                if not block:
                    break
                read += len(block)
                tfp.write(block)
                blocknum += 1
                if reporthook:
                    reporthook(blocknum, bs, size)

    if size >= 0 and read < size:
        raise ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes"
            % (read, size), result)

    return result

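4. Conclusion

From all this, I see three possible decisions:

I leave my code unchanged and hope that the urlretrieve(..) function will not be deprecated any time soon.
I write myself a replacement function that behaves like urlretrieve(..) on the outside and uses urlopen(..) on the inside. In practice, such a function would be a copy-paste of the code above. Doing so feels unclean compared to using the official urlretrieve(..).
I write myself a replacement function that behaves like urlretrieve(..) on the outside and uses something completely different on the inside. But why would I do that? urlopen(..) is not deprecated, so why not use it?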
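
For reference, the second option could look roughly like the sketch below. It is a trimmed-down adaptation of the standard-library code quoted above, not an official replacement. The name download_with_progress and the block_size parameter are hypothetical. Unlike urlretrieve(..), it assumes a destination filename is always given and skips the temporary-file handling.

```python
import urllib.error
import urllib.request


def download_with_progress(url, filename, reporthook=None, block_size=8192):
    """Copy url to filename via urlopen, reporting progress like urlretrieve.

    url may be a string or a urllib.request.Request object.
    reporthook, if given, is called as reporthook(blocknum, block_size, total_size).
    """
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        headers = response.headers
        # Content-Length may be missing; -1 mirrors urlretrieve's "unknown size".
        length = headers.get("Content-Length")
        total_size = int(length) if length is not None else -1
        blocknum = 0
        bytes_read = 0
        if reporthook:
            reporthook(blocknum, block_size, total_size)
        while True:
            block = response.read(block_size)
            if not block:
                break
            out.write(block)
            bytes_read += len(block)
            blocknum += 1
            if reporthook:
                reporthook(blocknum, block_size, total_size)
    if total_size >= 0 and bytes_read < total_size:
        raise urllib.error.ContentTooShortError(
            f"retrieval incomplete: got only {bytes_read} out of {total_size} bytes",
            (filename, headers),
        )
    return filename, headers
```

Because the signature of the hook is unchanged, an existing progress-bar callback written for urlretrieve(..) should work with this function as is.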

What decision would you make?

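【Question comments】:

If you are worried, copy urlretrieve() into your code and either use it right away or fall back to it only when the original urllib.request.urlretrieve() is no longer available.
See the answers to similar questions: alternative-of-urllib-urlretrieve-in-python-3-5 and how-to-download-a-file-over-http

Tags: python python-3.x http urlopen urlretrieve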

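【Solution 1】:

The following example uses urllib.request.urlopen to download a zip file containing crop production data for Oceania from the FAO statistical database (FAOSTAT). In this example a minimal header must be defined, otherwise FAOSTAT responds with Error 403: Forbidden.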

import os
import shutil
import tempfile
import urllib.request

# Create a request object with URL and headers
url = "http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_Livestock_E_Oceania.zip"
header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '}
req = urllib.request.Request(url=url, headers=header)

# Define the destination file in the system's temporary directory
dest_file = os.path.join(tempfile.gettempdir(), 'crop.zip')
print(f"File located at: {dest_file}")

# Create an http response object
with urllib.request.urlopen(req) as response:
    # Create a file object
    with open(dest_file, "wb") as f:
        # Copy the binary content of the response to the file
        shutil.copyfileobj(response, f)

Based on https://stackoverflow.com/a/48691447/2641825 for the request part and https://stackoverflow.com/a/66591873/2641825 for the header part. See also the urllib documentation at https://docs.python.org/3/howto/urllib2.html

【Comments】:

Thanks! This is great :-)