Resume FTP download after timeout: answers

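【Question title】: Resume FTP download after timeout
【Posted】: 2011-10-19 10:10:36
【Question description】:

I'm downloading files from a flaky FTP server that frequently times out in the middle of a transfer, and I'd like to know whether there is a way to reconnect and resume the download. I'm using Python's ftplib. Here is my code: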

#! /usr/bin/python

import ftplib
import os
import socket
import sys

#--------------------------------#
# Define parameters for ftp site #
#--------------------------------#
site           = 'a.really.unstable.server'
user           = 'anonymous'
password       = 'someperson@somewhere.edu'
root_ftp_dir   = '/directory1/'
root_local_dir = '/directory2/'

#---------------------------------------------------------------
# Tuple of order numbers to download. Each web request generates
# an order number
#---------------------------------------------------------------
order_num = ('1','2','3','4')

#----------------------------------------------------------------#
# Loop through each order. Connect to server on each loop. There #
# might be a timeout on the connection, so reconnect for every   #
# new order number                                               #
#----------------------------------------------------------------#
# First change local directory
os.chdir(root_local_dir)

# Loop through each order
for order in order_num:
    
    print 'Begin processing order number %s' %order
    
    # Connect to FTP site
    try:
        ftp = ftplib.FTP( host=site, timeout=1200 )
    except (socket.error, socket.gaierror), e:
        print 'ERROR: Unable to reach "%s"' %site
        sys.exit()
    
    # Login
    try:
        ftp.login(user,password)
    except ftplib.error_perm:
        print 'ERROR: Unable to login'
        ftp.quit()
        sys.exit()
     
    # Change remote directory to location of order
    try:
        ftp.cwd(root_ftp_dir+order)
    except ftplib.error_perm:
        print 'Unable to CD to "%s"' %(root_ftp_dir+order)
        sys.exit()

    # Get a list of files
    try:
        filelist = ftp.nlst()
    except ftplib.error_perm:
        print 'Unable to get file list from "%s"' %order
        sys.exit()
    
    #---------------------------------#
    # Loop through files and download #
    #---------------------------------#
    for each_file in filelist:
        
        file_local = open(each_file,'wb')
        
        try:
            ftp.retrbinary('RETR %s' %each_file, file_local.write)
            file_local.close()
        except ftplib.error_perm:
            print 'ERROR: cannot read file "%s"' %each_file
            # Close the handle before deleting the partial file
            file_local.close()
            os.unlink(each_file)
        
    ftp.quit()
    
    print 'Finished processing order number %s' %order
    
sys.exit()

The error I get:

socket.error: [Errno 110] Connection timed out

Any help is much appreciated.

【Question discussion】:

Definitely check out ftputil.sschwarzer.net/trac; it makes any FTP-related task much easier.

Tags: python ftp timeout ftplib resume

【Solution 1】:

A simple example of a resumable FTP download using Python's ftplib. The host, user and passwd values are placeholders:

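import time
from ftplib import FTP

# Placeholder connection parameters: replace with your own
host = 'ftp.example.com'
user = 'anonymous'
passwd = 'someone@example.com'

ftp = None
finished = False

with open('4gb', 'wb') as f:
    while not finished:
        try:
            if ftp is None:
                print("Connecting...")
                ftp = FTP(host, user, passwd)  # connects and logs in

            # Resume from however many bytes are already in the local file
            rest = f.tell()
            if rest == 0:
                rest = None
                print("Starting new transfer...")
            else:
                print(f"Resuming transfer from {rest}...")
            # rest=... makes ftplib send "REST <offset>" before RETR
            ftp.retrbinary('RETR 4gb', f.write, rest=rest)
            print("Done")
            finished = True
        except Exception as e:
            # Drop the broken connection; the next iteration reconnects
            if ftp is not None:
                ftp.close()
            ftp = None
            sec = 5
            print(f"Transfer failed: {e}, will retry in {sec} seconds...")
            time.sleep(sec)

ftp.quit()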

Finer-grained exception handling is advisable.
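One possible way to make the `except Exception` above finer-grained is to retry only on errors that usually mean the connection was lost, and to re-raise everything else. The helper name `is_retryable` is hypothetical, and which exceptions count as transient is a judgment call; this sketch assumes Python 3, where `socket.error` and `socket.timeout` are subclasses of `OSError`.

```python
import ftplib


def is_retryable(exc: BaseException) -> bool:
    """Return True if an exception from ftplib looks like a transient failure."""
    if isinstance(exc, ftplib.error_perm):
        # 5xx replies (e.g. "550 No such file") will not go away on retry
        return False
    # OSError covers socket errors and timeouts such as [Errno 110];
    # EOFError is raised when the server closes the control connection;
    # error_temp is a 4xx reply such as "421 Timeout"
    return isinstance(exc, (OSError, EOFError, ftplib.error_temp))
```

In the loop above, the handler could then start with `if not is_retryable(e): raise` so that permanent errors stop the script instead of retrying forever.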

The same approach works for uploads:
Handling disconnects in Python ftplib FTP transfers file upload
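A minimal sketch of the upload side, assuming a hypothetical `connect` callable that returns a logged-in `ftplib.FTP` object. It asks the server how much of the file already arrived with `FTP.size` (which sends `SIZE`, an extension that not every server supports), seeks the local file to that offset, and passes the same offset as `rest` to `FTP.storbinary`, so ftplib sends `REST <offset>` before `STOR`. Whether a server honours `REST` before `STOR` is server-dependent; if `SIZE` is refused, this sketch simply starts over from byte 0.

```python
import ftplib
import time


def upload_with_resume(connect, local_path, remote_name, retries=10, delay=5):
    """Upload local_path as remote_name, resuming after dropped connections.

    `connect` is a hypothetical zero-argument callable returning a
    logged-in ftplib.FTP instance.
    """
    with open(local_path, 'rb') as f:
        for _ in range(retries):
            ftp = None
            try:
                ftp = connect()
                ftp.voidcmd('TYPE I')  # SIZE reports byte counts in binary mode
                try:
                    # How much of the file already reached the server?
                    offset = ftp.size(remote_name) or 0
                except ftplib.error_perm:
                    offset = 0  # e.g. "550": the remote file does not exist yet
                f.seek(offset)  # skip the bytes the server already has
                # rest=offset makes ftplib send "REST <offset>" before STOR
                ftp.storbinary(f'STOR {remote_name}', f, rest=offset or None)
                ftp.quit()
                return
            except (OSError, EOFError, ftplib.error_temp) as e:
                # Transient failure: drop the connection and try again
                if ftp is not None:
                    ftp.close()
                print(f"Upload failed: {e}, retrying in {delay} seconds...")
                time.sleep(delay)
    raise RuntimeError(f"Upload of {local_path} failed after {retries} attempts")
```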

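【Discussion】:

【Solution 2】:

Resuming a download over FTP using only the facilities of the base standard (see RFC 959) requires block transfer mode (section 3.4.2), which is selected with the MODE B command. Block mode is not part of the minimum implementation that RFC 959 requires of every server (section 5.1 mandates only stream mode), so not all FTP server software implements it.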

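In block mode, unlike stream mode, the server sends the file as a series of blocks, each preceded by a header, and it can insert restart markers into the stream as special blocks. Such a marker can later be sent back to the server to restart a failed transfer (section 3.5).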
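A minimal sketch of the receiving side, assuming you already have the raw bytes of a MODE B data connection. ftplib does not implement block mode, so you would have to send `MODE B` yourself (for example with `FTP.voidcmd`) and read the socket returned by `FTP.transfercmd`; after a failure, the last marker would be sent as the `REST` argument before `RETR` (for example via the `rest` argument of `FTP.transfercmd`). The code splits the stream using the 3-byte header from section 3.4.2 (an 8-bit descriptor followed by a 16-bit byte count) and keeps only the data that precedes the last complete restart marker. The names `iter_blocks` and `resume_point` are hypothetical; this is a sketch, not a tested block-mode client.

```python
from typing import Iterator, Optional, Tuple

# Descriptor bit flags defined in RFC 959, section 3.4.2
DESC_EOR = 128      # end of data block is end of record
DESC_EOF = 64       # end of data block is end of file
DESC_SUSPECT = 32   # suspected errors in data block
DESC_RESTART = 16   # data block is a restart marker


def iter_blocks(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Split a MODE B byte stream into (descriptor, payload) pairs.

    Stops after an EOF block, or at a truncated block (the point where
    the connection dropped).
    """
    pos = 0
    while pos + 3 <= len(stream):
        descriptor = stream[pos]
        count = int.from_bytes(stream[pos + 1:pos + 3], 'big')
        payload = stream[pos + 3:pos + 3 + count]
        if len(payload) < count:
            return  # truncated block: the transfer was interrupted here
        yield descriptor, payload
        pos += 3 + count
        if descriptor & DESC_EOF:
            return


def resume_point(stream: bytes) -> Tuple[bytes, Optional[str], bool]:
    """Return (data up to the last restart marker, that marker, complete?)."""
    kept = bytearray()     # data preceding the most recent restart marker
    pending = bytearray()  # data received after that marker
    marker = None
    complete = False
    for descriptor, payload in iter_blocks(stream):
        if descriptor & DESC_RESTART:
            # Markers are printable characters in the control-connection language
            marker = payload.decode('ascii')
            kept += pending
            pending.clear()
        else:
            pending += payload
        if descriptor & DESC_EOF:
            complete = True
    if complete:
        kept += pending  # whole file received, nothing to re-request
    return bytes(kept), marker, complete
```

Data received after the last marker is discarded because restarting from that marker makes the server send it again.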

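The spec says:

[...] a restart procedure is provided to protect users from gross system failures (including failures of a host, an FTP-process, or the underlying network).

However, AFAIK, the spec does not define how long a marker must remain valid. It only says the following:

The marker information has meaning only to the sender, but must consist of printable characters in the default or negotiated language of the control connection (ASCII or EBCDIC). The marker could represent a bit-count, a record-count, or any other information by which a system may identify a data checkpoint. The receiver of data, if it implements the restart procedure, would then mark the corresponding position of this marker in the receiving system, and return this information to the user.

It is probably safe to assume that servers implementing this feature provide markers that stay valid across FTP sessions, but your mileage may vary.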

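【Discussion】:

【Solution 3】:

To do this, you have to keep the interrupted download, work out which parts of the file you are missing, download those parts, and then join everything together. I'm not sure exactly how to do that, but there is a download manager for Firefox and Chrome called DownThemAll that does it. Its code isn't written in Python (I think it's JavaScript), but you could read it to see how it works.

DownThemAll - http://www.downthemall.net/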
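The "work out which parts are missing" step can be sketched as an interval computation over byte offsets. `missing_ranges` is a hypothetical helper, not something taken from DownThemAll or ftplib: it takes the total file size (which you could get from the server, e.g. with `FTP.size`) and the half-open byte ranges `[start, end)` already on disk, and returns the gaps still to fetch. Note that FTP's `REST` only sets a starting offset, so a single sequential download only ever leaves one gap, at the end, which is the case Solution 1 handles.

```python
from typing import List, Tuple


def missing_ranges(size: int, have: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the half-open byte ranges [start, end) not covered by `have`."""
    gaps = []
    pos = 0  # first byte not yet known to be covered
    for start, end in sorted(have):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)  # overlapping ranges are merged implicitly
    if pos < size:
        gaps.append((pos, size))
    return gaps


print(missing_ranges(100, [(0, 10), (20, 30), (25, 40)]))  # [(10, 20), (40, 100)]
```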

【Discussion】:

DownThemAll is written in JavaScript and XUL (XML User Interface Language). Sources: en.wikipedia.org/wiki/DownThemAll! and github.com/nmaier/DownThemAll