How to skip certain rows of numerous CSV files with Python pandas and csv? Answers

【Question title】: How to skip certain rows of numerous CSV files by python pandas&csv?
【Posted】: 2020-08-13 07:40:12
【Question description】:

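I have put several CSV files in one folder. I want to first skip a certain row (for example, row 10) and then keep one row out of every five.
I can do the first step, but I don't know how to do the second.

Thanks.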

import pandas as pd
import csv, os


# Loop through every file in the current working directory.
for csvFilename in os.listdir('path'):
    if not csvFilename.endswith('.csv'):
        continue
    # Now let's read the dataframe
    # total row number
    total_line = len(open('path' + csvFilename).readlines())
    # put the first and last to a list
    line_list = [total_line] + [1]
    df = pd.read_csv('path' + csvFilename, skiprows=line_list)
    new_file_name = csvFilename

    # And output
    df.to_csv('path' + new_file_name, index=False)

The correct code is shown below.

import os

import numpy as np
import pandas as pd

# Loop through every CSV file in the folder 'path'.
for csvFilename in os.listdir('path'):
    if not csvFilename.endswith('.csv'):
        continue
    file_path = os.path.join('path', csvFilename)
    # Count the lines in the file (header included).
    with open(file_path) as f:
        total_line = len(f.readlines())
    # Start by marking every line index (0-based) as skipped.
    skip = np.arange(total_line)
    # Keep every fifth line (0, 5, 10, ...) by removing it from the skip list.
    skip = np.delete(skip, np.arange(0, total_line, 5))
    # Also skip the specific line you want dropped, e.g. line index 10.
    skip = np.append(skip, 10)
    df = pd.read_csv(file_path, skiprows=skip)

    # Write the result to a new file prefixed with '2'.
    new_file_name = '2' + csvFilename
    df.to_csv(os.path.join('path', new_file_name), index=False)
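To check which lines survive this skip list, here is a minimal sketch on a small in-memory CSV. The data (a header plus 20 lines holding the values 1 to 20) and the names csv_text and kept_values are made up for illustration; they are not from the original files.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory CSV: a header line plus 20 data lines (values 1..20).
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 21))
total_line = len(csv_text.splitlines())  # 21 lines, header included

skip = np.arange(total_line)                         # start with every line index
skip = np.delete(skip, np.arange(0, total_line, 5))  # un-skip indices 0, 5, 10, 15, 20
skip = np.append(skip, 10)                           # skip line index 10 again

df = pd.read_csv(io.StringIO(csv_text), skiprows=skip)
kept_values = df["value"].tolist()
print(kept_values)  # [5, 15, 20]
```

Line index 0 stays out of the skip list, so the header is still read as the column name.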

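【Comments on the question】:

Does this answer your question? Select every nth row as a Pandas DataFrame without reading the entire file
You can edit the question if you want to add something, or, if you have an answer, you can post it (answering your own question is fine). If the question I linked answers yours, you can accept it as a duplicate. :)
Thanks for your help. I have updated my code, but there are still some problems.
No problem. skip contains the rows you want to skip, so you need to remove the lines np.delete(skip, total_line-1, 0) and np.delete(skip, 1, 0). For the last one, you should probably start at 1: np.delete(skip, np.arange(1, total_line, 5)). For the last row, you need to make sure it is in the skip list, or you can use the skipfooter parameter of read_csv.
Thanks for your help. I have solved the problem.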
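The skipfooter suggestion from the comments could look roughly like the sketch below. It drops the first data line with skiprows and the last line with skipfooter, so there is no need to count lines first. The sample data and the name trimmed_values are assumptions for illustration. Note that skipfooter is only supported by the Python parser engine, hence engine="python".

```python
import io

import pandas as pd

# Hypothetical in-memory CSV: header, five data lines, and a trailing line to drop.
csv_text = "value\n1\n2\n3\n4\n5\n999"

# skiprows=[1] drops line index 1 (the first data line);
# skipfooter=1 drops the last line of the file.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1], skipfooter=1, engine="python")
trimmed_values = df["value"].tolist()
print(trimmed_values)  # [2, 3, 4, 5]
```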

Tags: python pandas csv

【Solution 1】:

You can pass a function to skiprows.

I edited your code below:

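    import os

    import pandas as pd

    # Loop through every CSV file in the folder 'path'.
    for csvFilename in os.listdir('path'):
        if not csvFilename.endswith('.csv'):
            continue
        file_path = os.path.join('path', csvFilename)
        # Count the lines in the file (header included).
        with open(file_path) as f:
            total_line = len(f.readlines())

        # Line indices to skip (0-based): 1, 6, 11, ... The last line is never skipped.
        # Building the set once avoids rebuilding a list for every line.
        rows_to_skip = set(range(total_line)[1:-1:5])
        df = pd.read_csv(file_path, skiprows=lambda x: x in rows_to_skip)

        new_file_name = csvFilename
        # Write the result (this overwrites the input file).
        df.to_csv(os.path.join('path', new_file_name), index=False)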

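【Comments】:

Something went wrong. If I do it this way, it skips the rows I actually want.
You can change the "[1:-1:5]" part of the code to "[1:-1:6]" or "[1:-1:4]" and you will get what you want.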
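If the goal is to keep one line out of every five rather than skip it, one possible sketch is to invert the test inside the skiprows callable. The helper keep_every_nth and its also_skip parameter are hypothetical names introduced here, not part of pandas, and the sample data is made up.

```python
import io

import pandas as pd


def keep_every_nth(n, also_skip=()):
    """Return a skiprows callable that keeps line indices 0, n, 2n, ... (0-based)."""
    also_skip = set(also_skip)

    def should_skip(line_index):
        # Line index 0 is the header; it is kept because 0 % n == 0.
        return line_index % n != 0 or line_index in also_skip

    return should_skip


# Hypothetical in-memory CSV: a header line plus 20 data lines (values 1..20).
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 21))
df = pd.read_csv(io.StringIO(csv_text), skiprows=keep_every_nth(5, also_skip=[10]))
selected_values = df["value"].tolist()
print(selected_values)  # [5, 15, 20]
```

Because the callable only looks at the line index, this version does not need to count the lines of each file before reading it.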