Removing the first line of a big CSV file? Answers

【Question title】: Removing first line of Big CSV file?
【Posted】: 2015-03-11 03:36:24
【Question】:

How should I remove the first line of a big CSV file in Python? I looked at earlier solutions here, and one of them was:

with open("test.csv",'r') as f:
    with open("updated_test.csv",'w') as f1:
        f.next() # skip header line
        for line in f:
            f1.write(line)

This gives me the following error:

f.next() # skip header line
AttributeError: '_io.TextIOWrapper' object has no attribute 'next'

Another solution was:

with open('file.txt', 'r') as fin:
    data = fin.read().splitlines(True)
with open('file.txt', 'w') as fout:
    fout.writelines(data[1:])

This causes memory problems, because it reads the whole file into memory!

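【Comments】:

Why do you want to do this?
Because I want to get rid of the header line. This data will be merged later, and editing it then would be more of a hassle.
If you can simply skip the header line later, when you merge the files, why spend time on it now?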
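
To illustrate the last comment, here is a minimal sketch of skipping each header at merge time instead of rewriting every file first. The helper name `merge_csv_files` and its `keep_header` parameter are hypothetical, and the sketch assumes every input starts with exactly one header line, that the header contains no embedded newline, and that every file ends with a newline.

```python
import shutil

def merge_csv_files(input_paths, output_path, keep_header=True):
    """Concatenate CSV files, dropping each file's first line while merging.

    Hypothetical helper for illustration only. Assumes one header line per
    file and that each file ends with a newline.
    """
    # newline="" disables newline translation, so line endings are preserved.
    with open(output_path, "w", newline="") as out:
        for index, path in enumerate(input_paths):
            with open(path, "r", newline="") as src:
                header = src.readline()  # consume this file's header line
                if keep_header and index == 0:
                    out.write(header)  # keep a single header in the output
                # Stream the remaining lines in chunks; memory use stays bounded.
                shutil.copyfileobj(src, out)
```

With this approach the individual files never need to be edited, which is the point the commenter was making.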

Tags: python csv python-3.x bigdata

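【Solution 1】:

Using sed is probably the fastest approach, and it does not require you to manage a temporary file yourself, so a Python wrapper would be: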

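import subprocess

def delete_first_lines(filename, line_nums):
    """Delete the first `line_nums` lines of `filename` in place (assumes GNU sed)."""
    # The sed script '1,Nd' deletes lines 1 through N; -i edits the file in place.
    script = '1,{}d'.format(line_nums)
    # run() waits for sed to finish; check=True raises CalledProcessError on failure.
    subprocess.run(['sed', '-i', script, filename], check=True)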

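【Comments】:

This answer is completely unclear. What is "sed"? It seems to be some kind of process you are spawning, but what does it do? The given code does not work.
It is a standard tool available on any Linux-like system.

【Solution 2】:

Use f.__next__() instead of f.next()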

Documentation here: https://docs.python.org/3/library/stdtypes.html#iterator.next

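【Comments】:

No, don't do that. This method is meant to be used like this: for x in myIter: ....
Downvoted, because calling double-underscore methods directly is unpythonic here (they are ugly for a reason). Prefer next(f) instead.
Thanks for pointing that out. I am here to learn too. This method worked for me, and I had not considered whether it was unpythonic.
@LutzHorn: next(f) is preferable. Direct iteration fails at the stated goal of skipping the first line.
That is not what I meant. I stated that __next__() is used behind the scenes by the code in my comment. I did not post my comment as an answer, but as a comment on a wrong answer.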
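
The point debated above can be shown in a few lines. This is only a sketch: `io.StringIO` stands in for an open text file so that the example is self-contained, and the sample rows are made up.

```python
import io

# io.StringIO stands in for an open text file (assumption for illustration).
f = io.StringIO("id,name\n1,alice\n2,bob\n")

header = next(f)   # the built-in next() calls f.__next__() for you
rows = list(f)     # iteration resumes after the skipped line

# On an empty file, next(f) raises StopIteration; passing a default avoids that.
empty = io.StringIO("")
missing_header = next(empty, None)
```

So `next(f)` and `f.__next__()` advance the same iterator, and the built-in additionally accepts a default for the empty-file case.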

【Solution 3】:

Replace f.next() with next(f)

with open("test.csv",'r') as f, open("updated_test.csv",'w') as f1:
    next(f) # skip header line
    for line in f:
        f1.write(line)

【Comments】:

Is this the best option for very large CSV files (with hundreds of thousands of lines)?
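
On the question of very large files: the loop in Solution 3 already reads one line at a time, so its memory use does not grow with file size. If the per-line loop turns out to be a bottleneck, one possible variation is to skip the header and then copy the rest in fixed-size chunks. The sketch below is an assumption-laden illustration, not a benchmarked result: `drop_first_line` and `chunk_size` are hypothetical names, and it assumes the header ends with `\n` and contains no quoted embedded newline.

```python
import shutil

def drop_first_line(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path without its first line, using bounded memory.

    Hypothetical helper for illustration only.
    """
    # Binary mode avoids decoding and leaves line endings untouched.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.readline()  # consume the header; only this one line is read whole
        shutil.copyfileobj(src, dst, chunk_size)  # copy the rest chunk by chunk
```

For example, `drop_first_line("test.csv", "updated_test.csv")` mirrors the file names used in the question. To replace the original file afterwards, `os.replace("updated_test.csv", "test.csv")` could be called once the copy has finished.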