【Title】: csv writer - How to write rows into multiple files, based on a threshold?
【Posted】: 2021-12-28 21:26:35
【Question】:

I want to write rows to a CSV file, but no file should contain more than X rows. If the threshold is exceeded, a new file needs to be started. So given the following data:

csv_max_rows=3
columns = ["A", "B", "C"]
rows = [
    ["a1", "b1", "c1"],
    ["a2", "b2", "c2"],
    ["a3", "b3", "c3"],
    ["a4", "b4", "c4"],
    ["a5", "b5", "c5"],
    ["a6", "b6", "c6"],
    ["a7", "b7", "c7"],
    ["a8", "b8", "c8"],
    ["a9", "b9", "c9"],
    ["a10", "b10", "c10"]
]

I want to end up with 4 files, where files 1, 2, and 3 have 3 rows each and file 4 has just one row. Is there a built-in option in Python's csv writer to do this?

【Comments】:

    Tags: python python-3.x csv csvwriter


    【Solution 1】:

    I think your requirement is too specific to expect a built-in option in the standard library. The solution below is a bit hacky, but I believe it does exactly what you want.

    import csv
    
    csv_max_rows = 3
    columns = ["A", "B", "C"]
    rows = [
        ["a1", "b1", "c1"],
        ["a2", "b2", "c2"],
        ["a3", "b3", "c3"],
        ["a4", "b4", "c4"],
        ["a5", "b5", "c5"],
        ["a6", "b6", "c6"],
        ["a7", "b7", "c7"],
        ["a8", "b8", "c8"],
        ["a9", "b9", "c9"],
        ["a10", "b10", "c10"],
    ]
    
    fp = None
    for i, row in enumerate(rows):
        if (i % csv_max_rows) == 0:
            if fp:
                fp.close()  # close the previous chunk's file
            # newline="" is recommended by the csv module docs when opening files
            fp = open(f"out_{i // csv_max_rows + 1}.csv", "w", newline="")
            writer = csv.writer(fp)
            writer.writerow(columns)
        writer.writerow(row)
    if fp:
        fp.close()
    
    

    【Discussion】:

    • Maybe if i % csv_max_rows == 0: is clearer? Otherwise, great answer :)
    • Thanks @Zach Young, I agree your suggestion is clearer. It seems not (i % csv_max_rows): is a quirk I picked up. I'll edit the answer.
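    A version of the same approach that groups rows by index and uses with blocks, so each file is closed as soon as its chunk is written. This is a sketch; the name write_chunked and the out-file naming scheme are illustrative:

```python
import csv
import os
import tempfile

def write_chunked(base_path, columns, rows, max_rows):
    """Write rows into numbered CSV files of at most max_rows data rows each.

    Returns the list of file paths written.
    """
    paths = []
    for start in range(0, len(rows), max_rows):
        path = f"{base_path}_{start // max_rows + 1}.csv"
        # newline="" is what the csv module docs recommend for open()
        with open(path, "w", newline="") as fp:
            writer = csv.writer(fp)
            writer.writerow(columns)
            writer.writerows(rows[start:start + max_rows])
        paths.append(path)
    return paths

# demo in a temporary directory
base = os.path.join(tempfile.mkdtemp(), "out")
demo_rows = [[f"a{i}", f"b{i}", f"c{i}"] for i in range(1, 11)]
paths = write_chunked(base, ["A", "B", "C"], demo_rows, 3)
print(len(paths))  # 4 files: three with 3 data rows, one with 1
```

    Because each with block closes its file before the next one opens, there is never more than one handle open at a time.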
    【Solution 2】:

    I'm not sure whether there is a built-in option, but it is clearly not complicated to implement yourself:

    from typing import List
    import csv
    import concurrent.futures
    
    
    def chunks(lst: List, n: int):
        while lst:
            chunk = lst[0:n]
            lst = lst[n:]
            yield chunk
    
    
    def write_csv(csv_file_path: str, columns: List[str], rows: List[List]):
        with open(csv_file_path, 'w', newline='') as csv_file:
            csv_writer = csv.writer(csv_file)
            csv_writer.writerow(columns)
            for row in rows:
                csv_writer.writerow(row)
    
    def write_csv_parallel(base_csv_file_path: str, columns: List[str], rows: List[List], csv_max_rows: int) -> List[str]:
        chunked_rows = list(chunks(rows, csv_max_rows))
        csv_file_paths = [f"{base_csv_file_path}.{idx + 1}" for idx in range(len(chunked_rows))]
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            # map(func, *iterables) passes one item from each iterable per call
            executor.map(write_csv, csv_file_paths, [columns] * len(chunked_rows), chunked_rows)
        return csv_file_paths
    
    
    if __name__ == "__main__":
        columns = ["A", "B", "C"]
        rows = [
            ["a1", "b1", "c1"],
            ["a2", "b2", "c2"],
            ["a3", "b3", "c3"],
            ["a4", "b4", "c4"],
            ["a5", "b5", "c5"],
            ["a6", "b6", "c6"],
            ["a7", "b7", "c7"],
            ["a8", "b8", "c8"],
            ["a9", "b9", "c9"],
            ["a10", "b10", "c10"]
        ]
        base_csv_file_path = "/tmp/test_file.csv"
        csv_file_paths = write_csv_parallel(base_csv_file_path, columns, rows, csv_max_rows=3)
        print("data was written into the following files: \n" + "\n".join(csv_file_paths))
    

    【Discussion】:
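    One refinement worth noting: the chunks generator above re-slices the list on every iteration, which copies the remaining rows each time. A generator built on itertools.islice chunks any iterable (not just lists) without those copies. A sketch; chunks_iter is an illustrative name:

```python
import itertools

def chunks_iter(iterable, n):
    """Yield successive lists of at most n items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

print(list(chunks_iter(range(10), 3)))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

    This also works with row sources that are streamed rather than held in memory, such as a database cursor or another csv.reader.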
