Python - Read and assign data from a .csv to pre-defined variables

【Question title】: Python - Read and assign data from a .csv to pre-defined variables
【Posted】: 2016-05-26 12:19:48
【Question】:

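I am trying to read data from a CSV file (A), extract the values, and write them to another CSV file (B). The new file B should have two rows. The first row should contain all the predefined variables, and the second row should hold the value that belongs to each variable in the first row.

I hope someone can tell me the best way to achieve this. (I have added the .csv file I use at the end of this post.)

(A) Python code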

import re
import csv

#Call for the export file
data = open('C:/Exports/Export 3.csv')

#Make a list with the predefined variables
definition = ["record_id", "abbreviation", "study_id", "step_count",
"distance", "ambulation_time", "velocity", "cadence", "norm_velocity",
"step_time_differential", "step_length_differential",
"cycle_time_differential", "step_time", "step_length", "step_extremity",
"cycle_time", "stride_length", "hh_base_support", "swing_time",
"stance_time", "single_support_time", "double_support_time", "toe_in_out"]

my_data = {}

#Show data for each row without whitespace
for line in data:
    line = line.rstrip()
    #print(line)
    values = re.findall("-?[0-9].+", line)
    print(values)

This is part of the output the code above produces:

[]
['3;']
['292,34;']
['1,67;']
['175,1;']
['107,8;']
[]
['0,004;']
['1,051;']
['0,008;']
[]
[]
['0,558;0,554']
['96,746;97,797']
[]
['1,116;1,108']
['192,159;197,122']
['2,988;6,32']
['0,466;0,466']
['0,65;0,642']
['0,466;0,466']
['0,184;0,176']
['41,8;42,1']
['58,2;57,9']
['41,8;42,1']
['16,5;15,9']
['-1,1;4']

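As you can see in the output, some lines contain two values, for example ['2,988;6,32']. These need to be reduced to a single value by computing the mean of the two values before they are written to a CSV file.

(B) Desired output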
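One way to collapse such a pair into a single number is sketched below. It assumes the values use a decimal comma and are separated by semicolons, as in the output above; the helper name `mean_of_values` is made up for illustration.

```python
def mean_of_values(cell):
    """Average the decimal-comma numbers in a string such as '2,988;6,32'.

    Empty fields are skipped; returns None when no number is present.
    """
    numbers = []
    for part in cell.split(';'):
        part = part.strip()
        if part:
            # '2,988' -> 2.988 (decimal comma to decimal point)
            numbers.append(float(part.replace(',', '.')))
    if not numbers:
        return None
    return sum(numbers) / len(numbers)


print(mean_of_values('2,988;6,32'))
print(mean_of_values('3;'))
```

A field that holds only a comma would still raise `ValueError` here, so real input may need an extra check or a `try`/`except` around the conversion.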

record_id  abbreviation  study_id  step_count  distance 
1                                  3           292,34

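If you like, you can experiment with the export file; you can download it here: CSV export file

【Comments】:

Provide more information about the sample input and sample output, so that you do not get speculative answers.
Thanks! I changed some of the text to make it easier to understand, and added the input .csv file I use at the end. I also added an example of the desired output.

Tags: python csv

【Solution 1】:

You should use the csv library and open the file with a semicolon delimiter, then compare the first column against the items in definition. This almost does it: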

import csv
from collections import defaultdict

data = defaultdict(str)

#Make a list with the predefined variables
definition = ["record_id", "abbreviation", "study_id", "step_count",
"distance", "ambulation_time", "velocity", "cadence", "norm_velocity",
"step_time_differential", "step_length_differential",
"cycle_time_differential", "step_time", "step_length", "step_extremity",
"cycle_time", "stride_length", "hh_base_support", "swing_time",
"stance_time", "single_support_time", "double_support_time", "toe_in_out"]

with open('C:/Exports/Export 3.csv', 'r') as f, \
     open('C:/Exports/result.csv', 'w') as outfile:
    reader = csv.reader(f, delimiter=';')
    next(reader, None)  # skip the headers

    writer = csv.DictWriter(outfile, fieldnames=definition, lineterminator='\n')
    writer.writeheader()

    for row in reader:
        for item in definition:
            h = item.replace('_','')
            r0 = row[0].lower().replace(' ','')
            if h in r0:
                print(h, r0)
                data[item] = row[1] 

    data['record_id'] = 1 # record id does not exist in input file: Export 3.csv

    writer.writerow(data)

To get the average of the two values, you can use:

try: 
   avg = (float(row[1].replace(',', '.')) + float(row[2].replace(',', '.')))/2 
except ValueError:
   avg = 0 # for cases with empty strings or commas

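【Comments】:

Thank you very much! This helped me a lot! I still have a few small problems, which I have posted as an answer below.
Hi @Yak, you can resolve the mismatches by making definition match the names in the input file more closely. As for the average, see my update.
Yes, that was my thinking too, Moses, but for example velocity matches velocity exactly, yet in result.csv the velocity value is empty. That seems to happen because more variables have velocity in their name, for example stridevelocitystddev. As for the average, where in the code should I put it so that it is also passed on to result.csv?
Hello @Moses Koledoye, do you know how to solve the last piece of the puzzle? I would really appreciate it!
Could you take a look at this post of mine? I tried to explain in detail the problem I am facing. Read from & Write values to a .csv

【Solution 2】:

Almost perfect! There still seem to be a few small problems. In result.csv, the values for the following variables are missing: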

step_time
step_length
cycle_time  
stride_length   
hh_base_support 
swing_time  
stance_time 
single_supp_time    
double_supp_time    
toe_in_out

I used this line of code to check the results:

print(h, r0, row[1], row[2])

This gave me the following output:

stepcount stepcount 3  
distance distance 292,34  
ambulationtime ambulationtime 1,67  
velocity velocity 175,1  
cadence cadence 107,8  
velocity normalizedvelocity ,  
normalizedvelocity normalizedvelocity ,  
steptimedifferential steptimedifferential 0,004  
steptime steptimedifferential 0,004  
steplengthdifferential steplengthdifferential 1,051  
steplength steplengthdifferential 1,051  
cycletimedifferential cycletimedifferential 0,008  
cycletime cycletimedifferential 0,008  
steptime steptime(sec) 0,558 0,554
steplength steplength(cm) 96,746 97,797
stepextremity stepextremity(ratio) , ,
cycletime cycletime(sec) 1,116 1,108
stridelength stridelength(cm) 192,159 197,122
hhbasesupport hhbasesupport(cm) 2,988 6,32
swingtime swingtime(sec) 0,466 0,466
stancetime stancetime(sec) 0,65 0,642
velocity stridevelocity 172,185 177,908
steptime steptimestddev , 0,006
stridelength stridelengthstddev , ,
swingtime swingtimestddev , ,
stancetime stancetimestddev , ,
velocity stridevelocitystddev , ,
singlesupptime singlesupptimestddev , ,
doublesupptime doublesupptimestddev , ,

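From the output above you can see two problems: some names match more than one label (such as velocity), and some do not match at all (such as toe_in_out). I do not know how to solve this.

I also tried to calculate the average when there are two values, but that gives me the error: ValueError: could not convert string to float. I think the commas are the cause. I tried applying the following code inside the for loop to calculate the average: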
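The substring test `h in r0` is what lets `velocity` match `stridevelocitystddev`. A possible fix, sketched here under the assumption that the labels in the first column resemble the ones printed above, is to normalise each label (lower case, no spaces, no trailing unit such as `(sec)`) and require an exact match. The helper names and the sample labels are illustrative only, and the shortened `definition` list is just for the demo.

```python
import re

definition = ["velocity", "step_time", "stride_length", "swing_time"]

# Map the normalised form of each predefined name back to the name itself,
# e.g. 'steptime' -> 'step_time'.
lookup = {name.replace('_', ''): name for name in definition}


def normalise(label):
    """Lower-case a label, drop a trailing '(unit)' and remove whitespace."""
    label = re.sub(r'\(.*?\)\s*$', '', label.lower())
    return re.sub(r'\s+', '', label)


def match_variable(label):
    """Return the predefined name equal to the label, or None."""
    return lookup.get(normalise(label))


for label in ["Velocity", "Stride Velocity Std Dev", "Step Time(sec)",
              "Step Time Differential"]:
    print(label, '->', match_variable(label))
```

With an exact comparison, only `Velocity` and `Step Time(sec)` resolve to a predefined name. A name that never matches, such as toe_in_out in the post, would presumably need an explicit entry in `lookup` once its label in the export is known.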

float(row[1]+float(row[2])) / 2
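In the expression above, `float(row[2])` is evaluated first and fails on a value such as '6,32' because of the decimal comma; even with that fixed, adding the string `row[1]` to a float would raise a `TypeError`. The sketch below shows one place the conversion could go inside the loop so that the averaged value reaches the output row. The in-memory sample, its header row, and the `names` mapping are assumptions standing in for Export 3.csv and the matching logic.

```python
import csv
import io

# Stand-in for 'Export 3.csv'; the header row and labels are assumed.
source = io.StringIO(
    "Parameter;Left;Right\n"
    "Step Count;3;\n"
    "HH Base Support(cm);2,988;6,32\n"
)
# Hypothetical mapping from input labels to predefined variable names.
names = {"Step Count": "step_count", "HH Base Support(cm)": "hh_base_support"}


def to_float(text):
    """Convert '6,32' to 6.32; return None for empty or non-numeric text."""
    try:
        return float(text.replace(',', '.'))
    except ValueError:
        return None


data = {}
reader = csv.reader(source, delimiter=';')
next(reader, None)  # skip the header row
for row in reader:
    if row and row[0] in names:
        # Convert each cell separately, then average those that parsed.
        parsed = [to_float(cell) for cell in row[1:3]]
        numbers = [n for n in parsed if n is not None]
        if numbers:
            data[names[row[0]]] = sum(numbers) / len(numbers)

result = io.StringIO()
writer = csv.DictWriter(result, fieldnames=list(names.values()),
                        lineterminator='\n')
writer.writeheader()
writer.writerow(data)
print(result.getvalue())
```

Note that this writes the averages with a decimal point and the default comma delimiter; if result.csv has to keep decimal commas, the numbers would need to be formatted back and `delimiter=';'` passed to the writer.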

【Comments】: