Writing contents of S3 to CSV

【Question title】: Writing contents of s3 to CSV
【Posted】: 2017-07-01 18:00:16
【Question description】:

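I'm creating a script to pull my S3 data down to my local machine. The data I receive is usually Hive-partitioned. I get a No such file or directory error even though the file does exist. Can someone explain what I'm doing wrong and how I should approach this differently? Here is the piece of code the error refers to: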

bucket = conn.get_bucket(bucket_name)
for sub in bucket.list(prefix = 'some_prefix'):
        matched = re.search(re.compile(read_key_pattern), sub.name)
        if matched:
            with open(sub.name, 'rb') as fin:
                reader = csv.reader(fin, delimiter = '\x01')
                contents = [line for line in reader]
            with open('output.csv', 'wb') as fout:
                writer = csv.writer(fout, quotechar = '', quoting = csv.QUOTE_NONE, escapechar = '\\')
                writer.writerows(contents)

IOError: [Errno 2] No such file or directory: 'my_prefix/54c91e35-4dd0-4da6-a7b7-283dff0f4483-000000'

The file exists, and this is the correct folder and file that I want to retrieve.

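【Comments】:

There doesn't appear to be an extension on that filename, e.g. .txt?
Are you sure about your current directory?
Yes, the error points to the correct file. As for the extension, it has none, at least none that I can see. I downloaded the files locally and the code works that way.
There are limited circumstances under which this error occurs. "I downloaded the files locally and the code works that way": so if you put the file in the same directory as the script, it works?
That's right. I'm trying to understand why this error tells me the file doesn't exist when the error message points to the correct file path @roganjosh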

Tags: python amazon-s3 boto

【Solution 1】:

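As @roganjosh said, it looks like you haven't downloaded the file after testing for a name match. The key name in the error is a path inside S3, not a path on your local disk, so open() cannot find it. I've added comments below to show you how to process the file in memory in Python 2: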

    import contextlib
    import csv
    import re
    # if StringIO raises a TypeError (boto writes bytes), use BytesIO instead
    from io import StringIO

    bucket = conn.get_bucket(bucket_name)
    # use re.compile outside of the for loop
    # it has slightly better performance characteristics
    matcher = re.compile(read_key_pattern)

    for sub in bucket.list(prefix = 'some_prefix'):
        # bucket.list returns an iterator over s3.Key objects
        # so we can use `sub` directly as the Key object
        matched = matcher.search(sub.name)
        if matched:
            # download the file to an in-memory buffer
            with contextlib.closing(StringIO()) as fp:
                sub.get_contents_to_file(fp)
                fp.seek(0)
                # read straight from the memory buffer
                reader = csv.reader(fp, delimiter = '\x01')
                contents = [line for line in reader]
            with open('output.csv', 'wb') as fout:
                writer = csv.writer(fout, quotechar = '', quoting = csv.QUOTE_NONE, escapechar = '\\')
                # writerows is a method: call it with the rows read above
                writer.writerows(contents)

For Python 3, you will need to change the with statement as discussed in the comments on the answer for this question.
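
For reference, a minimal Python 3 sketch of the same idea might look like the following. It assumes only the boto 2 `Key` interface used above (a `.name` attribute and `get_contents_to_file`). The helper names `collect_rows` and `write_rows`, the UTF-8 encoding, and `quotechar=None` (Python 3 rejects an empty `quotechar`) are illustrative assumptions, not part of the original answer.

```python
import csv
import io
import re


def collect_rows(keys, pattern, delimiter="\x01", encoding="utf-8"):
    """Download every key whose name matches `pattern` and parse it as delimited text.

    `keys` is any iterable of objects with a `.name` attribute and a
    `.get_contents_to_file(fp)` method, e.g. `bucket.list(prefix=...)` in boto 2.
    """
    matcher = re.compile(pattern)
    rows = []
    for key in keys:
        if not matcher.search(key.name):
            continue
        # the download produces raw bytes, so buffer them in a BytesIO first
        buf = io.BytesIO()
        key.get_contents_to_file(buf)
        buf.seek(0)
        # Python 3's csv module expects text, so decode while reading
        text = io.TextIOWrapper(buf, encoding=encoding, newline="")
        rows.extend(csv.reader(text, delimiter=delimiter))
    return rows


def write_rows(rows, path):
    """Write rows to `path` as unquoted CSV, escaping delimiters with a backslash."""
    with open(path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout, quotechar=None, quoting=csv.QUOTE_NONE, escapechar="\\")
        writer.writerows(rows)
```

With a real bucket this could be called as `write_rows(collect_rows(bucket.list(prefix='some_prefix'), read_key_pattern), 'output.csv')`. Unlike the loop above, which reopens output.csv for every matching key, this sketch collects all rows first and writes the file once.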

【Discussion】:

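I think the OP is providing an incorrect relative path. We don't know what my_prefix is, but I'd guess it isn't valid as either a relative or an absolute path.
It looks (from the code) like my_prefix is a directory in S3 (i.e. the S3 key name is my_prefix/SOME_GUID). So he simply forgot to download the file from S3 (an important step, no doubt).
That's correct. I assumed I didn't need to download first and could just open the files one by one with the csv module. Thanks!
Just want to add a comment. Because of encoding, I had to change mine to BytesIO. So if anyone runs into a TypeError in the future, just change StringIO to BytesIO.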
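
The TypeError mentioned in the last comment can be reproduced without S3 at all. This is a small sketch in which the `payload` bytes are a made-up stand-in for what a download writes into the buffer: `io.StringIO` only accepts text, while `io.BytesIO` accepts bytes that can be decoded afterwards.

```python
import io

# hypothetical stand-in for the bytes an S3 download writes into the buffer
payload = b"a\x01b\n"

# io.StringIO only accepts text, so writing bytes raises TypeError
try:
    io.StringIO().write(payload)
    string_io_error = None
except TypeError as exc:
    string_io_error = exc

# io.BytesIO accepts the bytes; decode them once the download is complete
buf = io.BytesIO()
buf.write(payload)
decoded = buf.getvalue().decode("utf-8")
```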