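【Question title】: What is the best way to save numpy arrays of different length to the same csv file?
【Posted】: 2014-11-09 17:09:13
【Question description】:

I am working with one-dimensional numpy arrays: I first do some math on them and then save everything to a single csv file. The data sets usually have different lengths, so I cannot simply stack them together. This is the best approach I could come up with, but there must be a more elegant way.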

import numpy as np
import pandas as pd
import os


array1 = np.linspace(1,20,10)
array2 = np.linspace(12,230,10)
array3 = np.linspace(7,82,20)
array4 = np.linspace(6,55,20)

output1 = np.column_stack((array1.flatten(),array2.flatten())) # save the first pair of arrays to a temp file
np.savetxt("tempfile1.csv", output1, delimiter=',')
output2 = np.column_stack((array3.flatten(),array4.flatten())) # do the same for the second pair
np.savetxt("tempfile2.csv", output2, delimiter=',')
a = pd.read_csv('tempfile1.csv')                               # read both files with pandas
b = pd.read_csv("tempfile2.csv")                               # (note: the first row of each file is taken as the header)
merged = b.join(a, rsuffix='*')                                # join them into a single DataFrame
os.remove('tempfile1.csv')
os.remove("tempfile2.csv")                                     # delete the temp files
merged.to_csv('savefile.csv', index=False)                     # save the merged file

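【Question discussion】:

    Tags: python arrays csv numpy pandas


    【Solution 1】:

    You can simply use concat and pass the argument axis=1 to append the arrays as columns; the shorter columns are padded with NaN: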

    In [49]:
    
    array1 = np.linspace(1,20,10)
    array2 = np.linspace(12,230,10)
    array3 = np.linspace(7,82,20)
    array4 = np.linspace(6,55,20)
    
    pd.concat([pd.DataFrame(array1), pd.DataFrame(array2), pd.DataFrame(array3), pd.DataFrame(array4)], axis=1)
    Out[49]:
                0           0          0          0
    0    1.000000   12.000000   7.000000   6.000000
    1    3.111111   36.222222  10.947368   8.578947
    2    5.222222   60.444444  14.894737  11.157895
    3    7.333333   84.666667  18.842105  13.736842
    4    9.444444  108.888889  22.789474  16.315789
    5   11.555556  133.111111  26.736842  18.894737
    6   13.666667  157.333333  30.684211  21.473684
    7   15.777778  181.555556  34.631579  24.052632
    8   17.888889  205.777778  38.578947  26.631579
    9   20.000000  230.000000  42.526316  29.210526
    10        NaN         NaN  46.473684  31.789474
    11        NaN         NaN  50.421053  34.368421
    12        NaN         NaN  54.368421  36.947368
    13        NaN         NaN  58.315789  39.526316
    14        NaN         NaN  62.263158  42.105263
    15        NaN         NaN  66.210526  44.684211
    16        NaN         NaN  70.157895  47.263158
    17        NaN         NaN  74.105263  49.842105
    18        NaN         NaN  78.052632  52.421053
    19        NaN         NaN  82.000000  55.000000
    

    Then you can write it to csv as usual:

    pd.concat([pd.DataFrame(array1), pd.DataFrame(array2), pd.DataFrame(array3), pd.DataFrame(array4)], axis=1).to_csv('savefile.csv', index=False) 
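
As a possible variation on the concat approach above, the sketch below builds the frame from a dict of Series so that each column gets its own name instead of 0. The column names and the file name savefile_named.csv are made up for illustration; they are not from the original answer.

```python
import numpy as np
import pandas as pd

array1 = np.linspace(1, 20, 10)
array2 = np.linspace(12, 230, 10)
array3 = np.linspace(7, 82, 20)
array4 = np.linspace(6, 55, 20)

# Wrapping each array in a Series lets pandas align them on the row index,
# so the shorter columns are padded with NaN. Column names are illustrative.
df = pd.DataFrame({
    "array1": pd.Series(array1),
    "array2": pd.Series(array2),
    "array3": pd.Series(array3),
    "array4": pd.Series(array4),
})

# By default to_csv writes NaN as an empty field.
df.to_csv("savefile_named.csv", index=False)

# Reading back: drop the NaN padding to recover an original-length array.
restored = pd.read_csv("savefile_named.csv")
array1_back = restored["array1"].dropna().to_numpy()
```

Named columns make it easier to pick a single array back out of the file, and dropna() removes the padding that was only there to make the columns line up.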
    

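    【Discussion】:

      【Solution 2】:

      You may well find a nice solution using numpy.savetxt, and there is probably a pandas solution simpler than yours, but in this case a solution using the standard library modules csv and itertools is quite concise: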

      In [45]: import csv
      
      In [46]: from itertools import izip_longest   # Use zip_longest in Python 3.
      
      In [47]: rows = izip_longest(array3, array4, array1, array2, fillvalue='')
      
      In [48]: with open("out.csv", "w") as f:
         ....:     csv.writer(f).writerows(rows)
         ....:     
      
      In [49]: !cat out.csv
      7.0,6.0,1.0,12.0
      10.947368421052632,8.5789473684210531,3.1111111111111112,36.222222222222221
      14.894736842105264,11.157894736842106,5.2222222222222223,60.444444444444443
      18.842105263157894,13.736842105263158,7.3333333333333339,84.666666666666657
      22.789473684210527,16.315789473684212,9.4444444444444446,108.88888888888889
      26.736842105263158,18.894736842105264,11.555555555555555,133.11111111111111
      30.684210526315788,21.473684210526315,13.666666666666668,157.33333333333331
      34.631578947368425,24.05263157894737,15.777777777777779,181.55555555555554
      38.578947368421055,26.631578947368421,17.888888888888889,205.77777777777777
      42.526315789473685,29.210526315789473,20.0,230.0
      46.473684210526315,31.789473684210527,,
      50.421052631578945,34.368421052631575,,
      54.368421052631575,36.94736842105263,,
      58.315789473684205,39.526315789473685,,
      62.263157894736842,42.10526315789474,,
      66.21052631578948,44.684210526315788,,
      70.15789473684211,47.263157894736842,,
      74.10526315789474,49.842105263157897,,
      78.05263157894737,52.421052631578945,,
      82.0,55.0,,
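
For reference, here is a minimal sketch of the same idea as a standalone Python 3 script, where izip_longest is named zip_longest. The output file name out_py3.csv is an arbitrary choice, and the call to tolist() is an addition (not in the original answer) so that plain Python floats are written.

```python
import csv
from itertools import zip_longest

import numpy as np

array1 = np.linspace(1, 20, 10)
array2 = np.linspace(12, 230, 10)
array3 = np.linspace(7, 82, 20)
array4 = np.linspace(6, 55, 20)

# Pair up the i-th elements of every array; once a shorter array is
# exhausted, its slot is filled with an empty string.
rows = zip_longest(
    array3.tolist(), array4.tolist(), array1.tolist(), array2.tolist(),
    fillvalue="",
)

# The csv module documentation recommends opening the file with newline=""
# when it is passed to csv.writer.
with open("out_py3.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

The file has 20 rows; the last ten end in two empty fields where array1 and array2 have run out of values. The digits printed for each float may differ slightly from the session above, which appears to have been run under Python 2.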
      

      【Discussion】:
