【Question title】: Can't save dataframe to csv in Python
【Posted】: 2019-03-06 18:52:52
【Question】:

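I am trying to save a dataframe that I have manipulated to compute the mean and median, as well as the totals, of duplicated rows. The script appears to run without problems, but it never actually writes the file I asked for. Can anyone give me any advice on what is happening?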

This is the code I am using:

"""Separate and combine frequencies of like relations, 
then produce extra columns with mean and median of these to
get a better overall picture of each relation"""

import numpy as np
import pandas as pd

def sort_table(fname):
    #read in file
    parent_child_rel = pd.read_csv(fname)
    print(parent_child_rel)

    #drop first column
    parent_child_rel = parent_child_rel.iloc[:,1:]
    print(parent_child_rel)


    #put all upper case
    parent_child_rel = parent_child_rel.apply(lambda x:x.astype(str).str.upper())

    print(parent_child_rel.dtypes) 

    #change datatype to float for numbers
    parent_child_rel['Hits'] = parent_child_rel['Hits'].astype('float') 
    parent_child_rel['Score'] = parent_child_rel['Score'].astype('float')

    #group and provide totals and means for hits and score
    aggregated = parent_child_rel.groupby(['parent', 'child'], as_index=False).aggregate({'Hits': np.sum, 'Score': [np.mean, np.median]})


    print(aggregated.dtypes)

    print(aggregated)

    with open('./Sketch_grammar/aggregated_relations_SkG_1.csv', 'a') as outfile:
        aggregated.to_csv(outfile)


def main():
    sort_table('./Sketch_grammar/parent_child_SkG_relations.csv')


if __name__ == '__main__':
    main()

【Comments】:

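  • You don't need the open file handler when using to_csv in pandas; it handles the file for you. Try aggregated.to_csv('./Sketch_grammar/aggregated_relations_skG_1.csv', mode='a') instead. It isn't clear to me why this would behave differently from what you're doing, but that's the part of the code that looks odd to me, so it's worth a try.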
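
For what it's worth, here is a minimal sketch comparing the two approaches mentioned in the comment above. The data frame, the temporary directory, and the file names are made up for illustration and are not from the question. Under these assumptions both calls should write the same text, which suggests the open(...) wrapper is unnecessary rather than harmful.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the aggregated frame from the question.
df = pd.DataFrame({"parent": ["A", "B"], "child": ["X", "Y"], "Hits": [3.0, 5.0]})

with tempfile.TemporaryDirectory() as tmp:
    path_direct = os.path.join(tmp, "direct.csv")
    path_handle = os.path.join(tmp, "handle.csv")

    # Option 1: hand the path to pandas and let it open and close the file.
    df.to_csv(path_direct, mode="a")

    # Option 2: open the file yourself. newline="" avoids doubled line
    # endings on Windows when pandas writes through a text handle.
    with open(path_handle, "a", newline="") as outfile:
        df.to_csv(outfile)

    # Read both files back without newline translation so they compare exactly.
    with open(path_direct, newline="") as f:
        direct_text = f.read()
    with open(path_handle, newline="") as f:
        handle_text = f.read()

same_output = direct_text == handle_text
print(same_output)
```

If the output file still does not show up where you expect, printing os.path.abspath(...) of the relative path is a quick way to see which directory the script is actually writing into.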

Tags: python csv dataframe pandas-groupby


【Solution 1】:

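You don't need to open the file yourself to save it as CSV. Just pass the path to the to_csv function.

Also, the fname parameter already holds a file name, so there is no need to type a path out by hand again. Be aware that fname is the input file here, so writing to it overwrites the source data; pass a different path if you want to keep the original.

Your code would be: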

"""Separate and combine frequencies of like relations, 
then produce extra columns with mean and median of these to
get a better overall picture of each relation"""

import numpy as np
import pandas as pd

def sort_table(fname):
    #read in file
    parent_child_rel = pd.read_csv(fname)
    print(parent_child_rel)

    #drop first column
    parent_child_rel = parent_child_rel.iloc[:,1:]
    print(parent_child_rel)


    #put all upper case
    parent_child_rel = parent_child_rel.apply(lambda x:x.astype(str).str.upper())

    print(parent_child_rel.dtypes) 

    #change datatype to float for numbers
    parent_child_rel['Hits'] = parent_child_rel['Hits'].astype('float') 
    parent_child_rel['Score'] = parent_child_rel['Score'].astype('float')

    #group and provide totals and means for hits and score
    aggregated = parent_child_rel.groupby(['parent', 'child'], as_index=False).aggregate({'Hits': np.sum, 'Score': [np.mean, np.median]})


    print(aggregated.dtypes)

    print(aggregated)

    aggregated.to_csv(fname)


def main():
    sort_table('./Sketch_grammar/parent_child_SkG_relations.csv')


if __name__ == '__main__':
    main()

If you don't want an extra column containing the index (and you probably don't), you should say so explicitly:

aggregated.to_csv(fname, index = False)
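
To make the effect of index=False concrete, here is a small sketch with a made-up two-row frame (not the data from the question). Calling to_csv without a path returns the CSV text as a string, which makes the difference easy to inspect.

```python
import pandas as pd

# Hypothetical frame, only for illustration.
df = pd.DataFrame({"parent": ["A", "B"], "Hits": [3.0, 5.0]})

with_index = df.to_csv()                # no path given, so the CSV text is returned
without_index = df.to_csv(index=False)  # same data, without the index column

print(with_index.splitlines()[0])       # ,parent,Hits
print(without_index.splitlines()[0])    # parent,Hits
```

The leading comma in the first header is the unnamed index column; every data row in that version also starts with the row number.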

As @brittenb suggested, you want to append data to the file, so you should use mode = "a":

aggregated.to_csv(fname, mode = "a")
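
One thing to watch with mode="a": each call writes the header row again, so repeated runs leave header lines in the middle of the file. A possible workaround, sketched below, is to write the header only when the file is new or empty. The helper name append_csv, the sample data, and the output file name are all hypothetical and not part of the original answer.

```python
import os
import tempfile

import pandas as pd


def append_csv(df, path):
    """Append df to path, writing the header row only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode="a", index=False, header=write_header)


# Hypothetical batch standing in for the aggregated frame.
batch = pd.DataFrame({"parent": ["A"], "child": ["X"], "Hits": [3.0]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "aggregated.csv")  # hypothetical output file
    append_csv(batch, out_path)  # first call creates the file with a header
    append_csv(batch, out_path)  # second call appends data rows only
    result = pd.read_csv(out_path)

print(len(result))
```

Reading the file back gives one header and two data rows, rather than a stray header line parsed as data.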

【Comments】:

  • Don't forget mode='a', since the OP wants to append data, as shown by open(..., 'a') as outfile in the original code.
  • Thanks, everyone, that worked!