Python: read a csv file and filter data (answers)

【Question title】: Python read csv file and filter data
【Posted】: 2020-09-04 10:37:41
【Question】:

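I apologize if this has been answered before, but I have checked a bunch of posts and just cannot figure out what is wrong with my code. I am trying to read a csv file in Python (see below) and filter out rows of data by the value in the second column (angle). I then want to create a new output file with the filtered time and angle values. The output file I get contains only the header.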

The csv file:

time,angle
0,56
1,89
2,112
3,189
4,122
5,123

Code:

import csv

#define the min and max value of angle
alpha_min = 110
alpha_max = 125

#read csv file and loop through with a filter
with open('test_csv.csv', 'r') as input_file:
    csv_reader = csv.reader(input_file)#, delimiter=',')
    #header = next(input_file).strip("\n").split(",")
    results = filter(lambda row: alpha_min<row[1]<alpha_max, csv_reader)

#create output file
with open('test_output_csv.csv', "w") as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    csv_writer.writerow(header)
    for result in results:
        csv_writer.writerow(result)

【Comments】:

Python will not evaluate alpha_min<row[1]<alpha_max the way you think. You should split it up: alpha_min<row[1] and row[1]<alpha_max
@AskoldIlvento Actually, Python does work that way, but row[1] is a string, so int(row[1]) is needed.
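
To make the two comments above concrete, here is a minimal sketch, assuming Python 3. The helper name in_range is hypothetical and used only for illustration. It suggests that the chained comparison itself is fine and that the missing piece is the int() conversion.

```python
alpha_min = 110
alpha_max = 125


def in_range(field):
    """Return True if the csv field, converted to int, lies strictly between the limits."""
    # A chained comparison means (alpha_min < x) and (x < alpha_max).
    return alpha_min < int(field) < alpha_max


field = "112"  # csv.reader yields every field as a string

# Without int(), Python 3 refuses to order an int against a str.
try:
    alpha_min < field < alpha_max
    raw_comparison_ok = True
except TypeError:
    raw_comparison_ok = False

print(in_range(field))      # True
print(raw_comparison_ok)    # False on Python 3
```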

Tags: python csv filter

【Solution 1】:

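The fields of a csv row are strings, so you need int(row[1]) for the comparison to work. I would also suggest using a list comprehension for the filtering, or pandas for speed. Note also that next(csv_reader) reads one row, which captures the header.

Note: open the files with newline='' when using the csv module, as documented, to avoid blank lines between each row.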

import csv

alpha_min = 110
alpha_max = 125

with open('test.csv','r',newline='') as input_file:
    csv_reader = csv.reader(input_file)
    header = next(csv_reader)
    results = [row for row in csv_reader if alpha_min < int(row[1]) < alpha_max]

with open('output.csv','w',newline='') as output_file:
    csv_writer = csv.writer(output_file)
    csv_writer.writerow(header)
    csv_writer.writerows(results)
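
As a rough illustration of the newline='' note in this answer: csv.writer ends each row with '\r\n' by default, so a text-mode file that also translates '\n' (as happens on Windows) can end up with '\r\r\n', which shows as a blank line between rows. The sketch below uses an in-memory io.StringIO buffer instead of a real file; that substitution is an assumption made only to keep the example self-contained.

```python
import csv
import io

# newline='' turns off newline translation, as it does for open().
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(['time', 'angle'])
writer.writerow(['2', '112'])

raw = buffer.getvalue()
# The '\r\n' terminators come from csv.writer itself, not from the file object.
print(repr(raw))  # 'time,angle\r\n2,112\r\n'
```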

【Solution 2】:

You can do it like this:

import csv

#define the min and max value of angle
alpha_min = 110
alpha_max = 125

#read csv file and loop through with a filter
with open('test_csv.csv', 'r') as input_file:
    csv_reader = csv.reader(input_file)#, delimiter=',')
    lines = [i for i in csv_reader]
    header = lines[0]
    results = filter(lambda row: alpha_min<int(row[1])<alpha_max, lines[1:])

#create output file
with open('test_output_csv.csv', "w", newline='') as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    csv_writer.writerow(header) 
    csv_writer.writerows(results)

This saves the following to the file:

time,angle
2,112
4,122
5,123
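
Solution 2 copies all rows into a list before the with block ends, and that ordering matters: filter() is lazy, so if it wraps the csv reader directly, no row is read until the result is iterated, by which time the file may already be closed. The sketch below is one way to see this, assuming Python 3 and using io.StringIO as a stand-in for a real file; the function names are hypothetical.

```python
import csv
import io

CSV_TEXT = "time,angle\n0,56\n2,112\n3,189\n"


def lazy_rows():
    """Build a filter over the reader, then consume it after the file is closed."""
    with io.StringIO(CSV_TEXT) as input_file:
        reader = csv.reader(input_file)
        next(reader)  # skip the header
        results = filter(lambda row: 110 < int(row[1]) < 125, reader)
    # The buffer is closed here, and the filter has not read anything yet.
    return list(results)


def eager_rows():
    """Materialize the filtered rows while the file is still open."""
    with io.StringIO(CSV_TEXT) as input_file:
        reader = csv.reader(input_file)
        next(reader)  # skip the header
        return [row for row in reader if 110 < int(row[1]) < 125]


try:
    lazy_rows()
    lazy_error = None
except ValueError as exc:  # reading from a closed file
    lazy_error = exc

print(type(lazy_error).__name__)
print(eager_rows())
```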

【Solution 3】:

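I would suggest using the pandas library for this workflow; it will be faster and more efficient than looping over each row of the csv file. Something like the following: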

import pandas as pd

#define the min and max value of angle
alpha_min = 110
alpha_max = 125

# read input and filter angle data
df = pd.read_csv('test_csv.csv')
df = df[(df['angle'] < alpha_max) & (df['angle'] > alpha_min)]

# write output; index=False keeps the DataFrame's row index out of the file
df.to_csv('output.csv', index=False)

【Discussion】:

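Thanks for your input. I suspect the input .csv file could have up to 100k rows; do you think this could be a problem?
No, size will not be a problem here, at least not until you go past 10M rows.
One more question: would it be possible to keep the same code as in the answer but change something in the first (time) column, for example change the time format from 16:55:45 to 165545 in the output file? I am looking into pandas, but it is so powerful that I get lost.
Yes, of course. It depends on the data type of the existing time values. If it is a string, you can remove the colons with something like df['time'] = df['time'].str.replace(':', ''). If it is a timestamp, you can first convert it to a string, using the answer here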
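
For readers who would rather stay with the csv module, the two follow-up questions above (file size and reformatting the time column) can also be sketched without pandas. This is an alternative to the pandas one-liner, not the answerer's method. It assumes exactly two columns, time values formatted as HH:MM:SS, and the hypothetical helper names compact_time, filter_rows and filter_file. Rows are streamed from input to output, so the whole file never has to sit in memory.

```python
import csv
from datetime import datetime

ALPHA_MIN = 110
ALPHA_MAX = 125


def compact_time(value):
    """Turn '16:55:45' into '165545'; parsing also rejects malformed values."""
    return datetime.strptime(value, '%H:%M:%S').strftime('%H%M%S')


def filter_rows(rows):
    """Yield [time, angle] rows whose angle lies strictly between the limits."""
    for time_value, angle in rows:
        if ALPHA_MIN < int(angle) < ALPHA_MAX:
            yield [compact_time(time_value), angle]


def filter_file(input_path, output_path):
    """Stream rows from input_path to output_path, one row at a time."""
    with open(input_path, 'r', newline='') as src, \
            open(output_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))           # copy the header unchanged
        writer.writerows(filter_rows(reader))   # both files are still open here


sample = [['16:55:45', '112'], ['16:55:46', '189']]
print(list(filter_rows(sample)))  # [['165545', '112']]
```

A call such as filter_file('test_csv.csv', 'output.csv') would apply the same logic to the files named in the question, provided the time column really uses the HH:MM:SS format.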