【Question title】: How to store a list of Pandas data frames for easy access
【Posted】: 2016-06-28 08:50:07
【Question】:

I have a list of data frames:

df1 = 
    Stock  Year   Profit  CountPercent
     AAPL  2012    1       38.77
     AAPL  2013    1       33.33
df2 = 
    Stock  Year   Profit  CountPercent
    GOOG   2012    1       43.47
    GOOG   2013    1       32.35

df3 = 
    Stock  Year   Profit  CountPercent
    ABC   2012    1       40.00
    ABC   2013    1       32.35

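The function's output looks like [df1, df2, df3, ...]. All the data frames have the same columns, but their rows differ.

How can I store these on disk and load them back as a list in the fastest, most efficient way?

【Question comments】:

  • Do all the DFs have the same shape (number of rows and columns)?

Tags: python list pandas dataframe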


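【Solution 1】:

If the values in the Stock column are the same within each df, you can drop that column with iloc and build a dict with a dict comprehension (each key is the first value of that df's Stock column):

dfs = {df['Stock'].iat[0]: df.iloc[:, 1:] for df in [df1, df2, df3]}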

print (dfs['AAPL'])
   Year  Profit  CountPercent
0  2012       1         38.77
1  2013       1         33.33

print (dfs['ABC'])
   Year  Profit  CountPercent
0  2012       1         40.00
1  2013       1         32.35

print (dfs['GOOG'])
   Year  Profit  CountPercent
0  2012       1         43.47
1  2013       1         32.35

For storing on disk, I think the best option is HDF5 via PyTables.
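
A minimal sketch of that idea, assuming the optional PyTables package (`tables`) is installed: each frame in the dict goes under its own key in a single HDF5 file, and loading rebuilds the dict. The helper names `save_frames` and `load_frames` are hypothetical and not part of pandas.

```python
import pandas as pd


def save_frames(dfs, path):
    """Write each DataFrame in the dict `dfs` under its own key in one HDF5 file."""
    # Requires the optional PyTables dependency (package name `tables`).
    with pd.HDFStore(path, mode="w") as store:
        for name, frame in dfs.items():
            store.put(name, frame)


def load_frames(path):
    """Read every stored DataFrame back into a dict keyed by stock name."""
    with pd.HDFStore(path, mode="r") as store:
        # store.keys() yields paths such as '/AAPL', so strip the leading slash.
        return {key.lstrip("/"): store[key] for key in store.keys()}
```

With the `dfs` dict from above, `save_frames(dfs, 'store.h5')` writes the file and `list(load_frames('store.h5').values())` gives the frames back as a list.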

If the values in each Stock column are the same, you can concat all the dfs and then store the result:

df = pd.concat([df1.set_index('Stock'), df2.set_index('Stock'), df3.set_index('Stock')])
print (df)
       Year  Profit  CountPercent
Stock                            
AAPL   2012       1         38.77
AAPL   2013       1         33.33
GOOG   2012       1         43.47
GOOG   2013       1         32.35
ABC    2012       1         40.00
ABC    2013       1         32.35

store = pd.HDFStore('store.h5')
store['df'] = df
print (store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[1,4])
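
To get a list back after loading, one option (a sketch, not necessarily what the answer's author had in mind) is to group the stored frame by its `Stock` index. `split_by_stock` and `load_as_list` are hypothetical helper names; `load_as_list` assumes PyTables is installed and that the file was written under the key `df` as above.

```python
import pandas as pd


def split_by_stock(df):
    """Return one DataFrame per stock from a frame indexed by `Stock`."""
    # sort=False keeps the stocks in their order of first appearance.
    return [group.reset_index() for _, group in df.groupby(level="Stock", sort=False)]


def load_as_list(path, key="df"):
    """Read the combined frame from an HDF5 file and split it (needs PyTables)."""
    return split_by_stock(pd.read_hdf(path, key=key))
```

Each returned frame has `Stock` restored as an ordinary column, so `load_as_list('store.h5')` yields frames laid out like the original df1, df2, df3.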

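【Discussion】:

    【Solution 2】:

    I think that if all your DFs have the same shape, it would be more natural to store your data as a pandas.Panel rather than as a list of DFs. This is how pandas_datareader worked at the time of writing. Note that pandas.Panel was deprecated in pandas 0.20 and removed in 0.25, so the code below needs an older pandas release.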

    import io
    import pandas as pd
    
    df1 = pd.read_csv(io.StringIO("""
    Stock,Year,Profit,CountPercent
    AAPL,2012,1,38.77
    AAPL,2013,1,33.33
    """
    ))
    
    df2 = pd.read_csv(io.StringIO("""
    Stock,Year,Profit,CountPercent
    GOOG,2012,1,43.47
    GOOG,2013,1,32.35
    """
    ))
    
    df3 = pd.read_csv(io.StringIO("""
    Stock,Year,Profit,CountPercent
    ABC,2012,1,40.0
    ABC,2013,1,32.35
    """
    ))
    
    
    # I had to drop the `Stock` column and use it as the Panel item axis because of this error
    # when saving the Panel to HDFStore:
    # TypeError: Cannot serialize the column [%s] because its data contents are [mixed-integer] object dtype
    p = pd.Panel({df.iat[0, 0]: df.drop('Stock', axis=1) for df in [df1, df2, df3]})
    
    store = pd.HDFStore('c:/temp/stocks.h5')
    store.append('stocks', p, data_columns=True, mode='w')
    store.close()
    
    # read panel from HDFStore
    store = pd.HDFStore('c:/temp/stocks.h5')
    p = store.select('stocks')
    

    The store:

    In [18]: store
    Out[18]:
    <class 'pandas.io.pytables.HDFStore'>
    File path: c:/temp/stocks.h5
    /stocks            wide_table   (typ->appendable,nrows->6,ncols->3,indexers->[major_axis,minor_axis],dc->[AAPL,ABC,GOOG])
    

    Panel dimensions:

    In [19]: p['AAPL']
    Out[19]:
         Year  Profit  CountPercent
    0  2012.0     1.0         38.77
    1  2013.0     1.0         33.33
    
    In [20]: p[:, :, 'Profit']
    Out[20]:
       AAPL  ABC  GOOG
    0   1.0  1.0   1.0
    1   1.0  1.0   1.0
    
    In [21]: p[:, 0]
    Out[21]:
                     AAPL     ABC     GOOG
    Year          2012.00  2012.0  2012.00
    Profit           1.00     1.0     1.00
    CountPercent    38.77    40.0    43.47
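
On pandas versions without `Panel`, a DataFrame with a MultiIndex is a common substitute. The sketch below is an assumption about how the same layout could be expressed, not the answer author's code: it builds the stacked frame with `pd.concat` and mirrors the three selections shown above. The names `stacked` and the index level name `row` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Stock": ["AAPL", "AAPL"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [38.77, 33.33]})
df2 = pd.DataFrame({"Stock": ["GOOG", "GOOG"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [43.47, 32.35]})
df3 = pd.DataFrame({"Stock": ["ABC", "ABC"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [40.0, 32.35]})

# Key each frame by its stock name; the keys become the outer index level.
stacked = pd.concat(
    {df["Stock"].iat[0]: df.drop(columns="Stock") for df in [df1, df2, df3]},
    names=["Stock", "row"],
)

aapl = stacked.loc["AAPL"]                   # counterpart of p['AAPL']
profit = stacked["Profit"].unstack("Stock")  # counterpart of p[:, :, 'Profit']
first_rows = stacked.xs(0, level="row").T    # counterpart of p[:, 0]
```

Unlike the Panel, `stacked` keeps `Year` and `Profit` as integers, and it is a single DataFrame, so it can be written with `to_hdf` or `HDFStore` (PyTables required) like any other frame.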
    

    【Discussion】:
