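【Question Title】: Saving and Loading a MultiIndexed Pandas DataFrame and Retaining the Column Structure
【Posted】: 2019-02-06 19:02:08
【Question Description】:

I can't find a way to save and retrieve a multi-indexed pandas DataFrame so that the MultiIndex column structure is preserved. For a reproducible example: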

toy_data.to_json()
'{"["GOOG","Shares"]":{"1521849600000":null,"1521936000000":null,"1522368000000":null,"1522454400000":694548763.0,"1522540800000":null},"["GOOG","ROE"]":{"1521849600000":null,"1521936000000":null,"1522368000000":null,"1522454400000":0.1076,"1522540800000":null},"["FB","Shares"]":{"1521849600000":null,"1521936000000":null,"1522368000000":null,"1522454400000":2398606201.0,"1522540800000":null},"["FB","ROE"]":{"1521849600000":null,"1521936000000":null,"1522368000000":null,"1522454400000":0.2465,"1522540800000":null}}'

toy_data.to_csv('toy_data.csv')

toy_data1 = pd.read_csv('toy_data.csv')

【Question Comments】:

Tags: python pandas csv multi-index


【Solution 1】:

You haven't provided usable sample data, but I'm fairly sure all you need to do is pass header=[0, 1] and index_col=0 as arguments to read_csv.
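As a rough illustration of that suggestion, the sketch below writes a small two-level frame to CSV text and reads it back twice, once with the defaults and once with header=[0, 1] and index_col=0. The frame is made up for the example (it only mimics the shape of the question's data), and an in-memory io.StringIO buffer stands in for a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-level frame, similar in shape to the question's data.
cols = pd.MultiIndex.from_product(
    [["GOOG", "FB"], ["Shares", "ROE"]], names=["Company", "Indicators"]
)
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [np.nan, np.nan, np.nan, np.nan]],
    index=pd.Index(["2018-03-31", "2018-04-01"], name="Quarter_end"),
    columns=cols,
)

csv_text = df.to_csv()

# Default read: only the first header line is used, so the columns come back flat.
flat = pd.read_csv(io.StringIO(csv_text))

# Two header rows plus an index column: the column MultiIndex is rebuilt.
nested = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

print(flat.columns.nlevels)
print(nested.columns.nlevels)
print(list(nested.columns.names))
```

With the defaults, the second header line and the index-name line end up as ordinary data rows, which is presumably the symptom described in the question.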

【Comments】:

    【Solution 2】:

    read_csv

    Using the header and index_col parameters of read_csv gives you what you need.

    toy_data.to_csv('sample.csv')
    pd.read_csv('sample.csv', header=[0, 1], index_col=[0])
    
    Company       GOOG          FB     
    Indicators  Shares  ROE Shares  ROE
    Quarter_end                        
    2018-03-24     NaN  NaN    NaN  NaN
    2018-03-25     NaN  NaN    NaN  NaN
    2018-03-30     NaN  NaN    NaN  NaN
    2018-03-31     1.0  2.0    3.0  4.0
    2018-04-01     NaN  NaN    NaN  NaN
    
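One detail the printed output does not show is the dtype of the restored index: read_csv returns the dates as plain strings unless asked to parse them. The sketch below is one way to check the round trip, assuming parse_dates=True is acceptable for the index column; the two-row frame and the variable names are illustrative, not taken from the answer.

```python
import io

import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product(
    [["GOOG", "FB"], ["Shares", "ROE"]], names=["Company", "Indicators"]
)
idx = pd.to_datetime(["2018-03-31", "2018-04-01"]).rename("Quarter_end")
original = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [np.nan] * 4], index=idx, columns=cols
)

# Round trip through CSV text; parse_dates=True converts the index back to datetimes.
buffer = io.StringIO(original.to_csv())
restored = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=True)

# Compare column labels, index values, and data (treating NaN as equal to NaN).
same_columns = list(restored.columns) == list(original.columns)
same_index = bool((restored.index == original.index).all())
same_values = np.allclose(
    original.to_numpy(), restored.to_numpy(), equal_nan=True
)
print(same_columns, same_index, same_values)
```

The comparison is done piece by piece rather than with a single equality check so that small dtype differences between pandas versions do not obscure whether the structure itself survived.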

    read_hdf

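    Saving to HDF may be a better option: the MultiIndex comes back without any extra arguments (this requires the PyTables package).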

    toy_data.to_hdf('sample.h5', key='toy_key')
    pd.read_hdf('sample.h5', key='toy_key')
    
    Company       GOOG          FB     
    Indicators  Shares  ROE Shares  ROE
    Quarter_end                        
    2018-03-24     NaN  NaN    NaN  NaN
    2018-03-25     NaN  NaN    NaN  NaN
    2018-03-30     NaN  NaN    NaN  NaN
    2018-03-31     1.0  2.0    3.0  4.0
    2018-04-01     NaN  NaN    NaN  NaN
    

    Setup

    import numpy as np
    import pandas as pd

    cols = pd.MultiIndex.from_product(
        [['GOOG', 'FB'], ['Shares', 'ROE']],
        names=['Company', 'Indicators']
    )
    idx = pd.to_datetime(
        ['2018-03-24', '2018-03-25', '2018-03-30',
         '2018-03-31', '2018-04-01']
    ).rename('Quarter_end')
    
    toy_data = pd.DataFrame([
        [np.nan, np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan, np.nan],
        [1, 2, 3, 4],
        [np.nan, np.nan, np.nan, np.nan],
    ], idx, cols)
    

    【Comments】:

    • Good answer. Nice to know that HDF natively supports reading and writing a MultiIndex in pandas.