Reading big files from an SFTP server with Python 3: answers

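【Question title】: Read big files from an SFTP server with Python 3
【Posted】: 2021-05-31 16:36:58
【Question description】:

I want to read several big files that live on a CentOS server, using Python. I wrote some simple code for this and it works, but the whole file arrives as a paramiko object (paramiko.sftp_file.SFTPFile), and only after that can I process the lines. Performance is poor. I would like to process the file and write it to CSV piece by piece, because handling the entire file at once hurts performance. Is there a way to solve this?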

import paramiko

# host, port, username and password are placeholders for your own settings
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host, port, username, password)

sftp_client = ssh.open_sftp()
remote_file = sftp_client.open(r'/root/bigfile.csv')

try:
    for line in remote_file:
        pass  # process the line
finally:
    remote_file.close()

【Question comments】:

Check this: *.com/questions/17444679/reading-a-huge-csv-file

Tags: python

【Solution 1】:

This should solve your problem.

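def lazy_loading_ftp_file(sftp_host_conn, filename):
    """
        Lazily download a file over SFTP when a simple sftp.get call raises an exception
        :param sftp_host_conn: callable returning an SFTP connection usable as a context manager
        :param filename: remote path of the file to download
        :return: dict with "status" and "msg"; the file is written to the current directory
    """
    import posixpath
    import shutil
    try:
        with sftp_host_conn() as host:
            # Save under the remote file's base name so it lands in the current directory
            local_name = posixpath.basename(filename)
            with host.open(filename, 'rb') as sftp_file_instance:
                with open(local_name, 'wb') as out_file:
                    # Pass the file object itself (paramiko's SFTPFile does not expose .raw);
                    # copyfileobj copies it in fixed-size chunks
                    shutil.copyfileobj(sftp_file_instance, out_file)
            return {"status": "success", "msg": "successfully downloaded file: {}".format(filename)}
    except Exception as ex:
        return {"status": "failed", "msg": "Exception in lazy reading too: {}".format(ex)}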

This avoids reading the whole content into memory at once.
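
To make the memory behaviour concrete, here is a minimal sketch of the same chunked copy written out by hand. It assumes `sftp_client` is a `paramiko.SFTPClient` (for example from `ssh.open_sftp()`); the helper names `copy_in_chunks` and `download_in_chunks` and the 64 KiB chunk size are illustrative choices, not part of the answer above.

```python
import io


def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy a readable binary file-like object to a writable one, chunk by chunk."""
    total = 0
    while True:
        chunk = src.read(chunk_size)  # at most chunk_size bytes held at a time
        if not chunk:  # empty bytes means end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def download_in_chunks(sftp_client, remote_path, local_path):
    """Hypothetical helper: stream one remote file to a local path.

    Assumes sftp_client is a paramiko.SFTPClient.
    """
    with sftp_client.open(remote_path, "rb") as remote_file:
        with open(local_path, "wb") as local_file:
            return copy_in_chunks(remote_file, local_file)


# Local check with in-memory streams standing in for the remote file.
source = io.BytesIO(b"a,b\n1,2\n" * 1000)
target = io.BytesIO()
copied = copy_in_chunks(source, target, chunk_size=4096)
```

Sequential reads over SFTP can be slow; calling `remote_file.prefetch()` before reading may speed them up, at the cost of paramiko buffering data ahead of the reader.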

【Comments】:

【Solution 2】:

Reading in chunks will help you:

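import pandas as pd

# `filename` is a local path and `process` is your own function (both placeholders)
chunksize = 1000000  # rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)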

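Update:

Yes, I am aware that my answer was written with a local file in mind. It is only an example of reading a file in chunks.

To answer the actual question, take a look at the following: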

paramiko.sftp_client.SFTPClient.putfo
Functions for working with remote files using pandas and paramiko (SFTP/SSH). - pass the chunksize I mentioned above.
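
The chunking idea does not need a local path. `pandas.read_csv` accepts an open file-like object as well as a path, so the object returned by `sftp_client.open(...)` can be passed in together with `chunksize`. As a minimal sketch of the same idea using only the standard library, the code below reads rows lazily from any line-yielding object and writes them out one batch at a time. The names `decode_lines`, `process_in_batches`, and `transform` are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
import io
from itertools import islice


def decode_lines(lines, encoding="utf-8"):
    """Yield text lines whether the source produces bytes or str."""
    for line in lines:
        yield line.decode(encoding) if isinstance(line, bytes) else line


def process_in_batches(lines, out_file, transform, batch_size=10_000):
    """Read CSV rows lazily and write transformed rows one batch at a time."""
    reader = csv.reader(decode_lines(lines))
    writer = csv.writer(out_file, lineterminator="\n")
    written = 0
    while True:
        # Only batch_size rows are materialised at once.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        writer.writerows(transform(row) for row in batch)
        written += len(batch)
    return written


# Local check: a BytesIO stands in for the remote file object.
remote_like = io.BytesIO(b"name,qty\nfoo,1\nbar,2\nbaz,3\n")
output = io.StringIO()
count = process_in_batches(
    remote_like,
    output,
    transform=lambda row: [cell.upper() for cell in row],
    batch_size=2,
)
```

With paramiko, `lines` would be the file returned by `sftp_client.open('/root/bigfile.csv')` and `out_file` a local file opened with `open('out.csv', 'w', newline='')`.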

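【Comments】:

The files are not on the local server. They are on an SFTP server, and the whole file arrives as an SFTP object.
Do you realize that the file does not exist on the local file system, and that sftp is not a valid URL scheme (protocol) for read_csv? In other words, this does not answer the question at hand...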