【Question title】: Extracting Pandas multiindex from dataframe with NaT
【Posted】: 2016-06-07 17:30:19
【Question】:

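I am using pandas to parse an Excel spreadsheet. The spreadsheet has several worksheets, and each worksheet looks like the following. Note that each column has values corresponding to different dates, and the columns have different lengths:

For whatever reason, when pandas parses the Excel spreadsheet, the first column of dates in the first worksheet is parsed as the index (even though the index_col parameter was specified as None). This is still manageable.

In the other worksheets, however, the index is parsed as a MultiIndex:

What I ultimately want to do is rebuild the dataframes so that they all share a common date index, with NaN filled in for any date that has no value. However, I cannot seem to extract the dates from the MultiIndex to begin this process.

I tried reset_index() on level 0 and on level 1 of the dataframe, but it complains with IndexError: cannot do a non-empty take from an empty axes. I also tried unstack(), but it complains with ValueError: Index contains duplicate entries, cannot reshape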
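
For reference, a single level of a MultiIndex can be read with `get_level_values`, without calling reset_index() or unstack(). The following is only a minimal sketch: the two-level index, the column name `val`, and all values are hypothetical stand-ins, not the data from the spreadsheet.

```python
import pandas as pd

# Hypothetical frame whose index is a MultiIndex padded with NaT/NaN,
# loosely imitating what read_excel produced for the later worksheets.
idx = pd.MultiIndex.from_arrays([
    pd.to_datetime(["1999-01-01", "1999-01-02", None]),
    [4.0, 1.0, None],
])
df = pd.DataFrame({"val": [5, 7, 8]}, index=idx)

# Pull the dates out of level 0 of the MultiIndex.
dates = df.index.get_level_values(0)

# Re-index the values by date only, then drop the rows whose date is NaT.
s = pd.Series(df["val"].to_numpy(), index=dates, name="val")
s = s[s.index.notna()]
print(s)
```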

【Comments】:

    Tags: python excel pandas dataframe multi-index


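    【Solution 1】:

    I think you can use read_excel with the parameters parse_cols, header and index_col (parse_cols was renamed to usecols in later pandas releases). Then build a Series from each date/value pair with set_index and iloc, and finally concat them: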

    import pandas as pd
    
    # usecols picks the date/value column pairs (older pandas named this argument parse_cols)
    df = pd.read_excel('f_name.xlsx', usecols=[0, 1, 3, 4, 7, 8], index_col=0, header=0)
    # optional: replace NaT in the index; not required for the steps below
    #df.index = df.index.to_series().fillna(0)
    print(df)
                Column_val1 Unnamed: 1  Column_val2 Unnamed: 3  Column_val3
    1999-01-01            4 2000-01-01            5 2000-01-01            5
    1999-01-02            1 2000-01-02            7 2000-01-02            7
    1999-01-03            2 2000-01-03            8 2000-01-03            8
    1999-01-04            3 2000-01-04            3 2000-01-04            3
    1999-01-05            3 2000-01-05            6 2000-01-05            6
    1999-01-06            3 2000-01-06            9 2000-01-06            9
    1999-01-07            4 2000-01-07            1 2000-01-07            1
    1999-01-08            6 2000-01-08            5 2000-01-08            5
    1999-01-09            8 2000-01-09            2 2000-01-09            2
    1999-01-10            2 2000-01-10            3 2000-01-10            3
    1999-01-11            4 2000-01-11           47 2000-01-11           47
    1999-01-12            5 2000-01-12            2 2000-01-12            2
    NaT                 NaN 2000-01-13            8 2000-01-13            8
    NaT                 NaN 2000-01-14            2 2000-01-14            2
    NaT                 NaN 2000-01-15           87 2000-01-15           87
    NaT                 NaN 2000-01-16            6 2000-01-16            6
    NaT                 NaN 2000-01-17           89 2000-01-17           89
    NaT                 NaN        NaT          NaN 2000-01-18            7
    NaT                 NaN        NaT          NaN 2000-01-19            8
    
    print(df['Column_val1'])
    1999-01-01     4
    1999-01-02     1
    1999-01-03     2
    1999-01-04     3
    1999-01-05     3
    1999-01-06     3
    1999-01-07     4
    1999-01-08     6
    1999-01-09     8
    1999-01-10     2
    1999-01-11     4
    1999-01-12     5
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    Name: Column_val1, dtype: float64
    
    print(df.set_index(df.iloc[:, 1])['Column_val2'])
    Unnamed: 1
    2000-01-01     5
    2000-01-02     7
    2000-01-03     8
    2000-01-04     3
    2000-01-05     6
    2000-01-06     9
    2000-01-07     1
    2000-01-08     5
    2000-01-09     2
    2000-01-10     3
    2000-01-11    47
    2000-01-12     2
    2000-01-13     8
    2000-01-14     2
    2000-01-15    87
    2000-01-16     6
    2000-01-17    89
    NaT          NaN
    NaT          NaN
    Name: Column_val2, dtype: float64
    
    print(df.set_index(df.iloc[:, 3])['Column_val3'])
    Unnamed: 3
    2000-01-01     5
    2000-01-02     7
    2000-01-03     8
    2000-01-04     3
    2000-01-05     6
    2000-01-06     9
    2000-01-07     1
    2000-01-08     5
    2000-01-09     2
    2000-01-10     3
    2000-01-11    47
    2000-01-12     2
    2000-01-13     8
    2000-01-14     2
    2000-01-15    87
    2000-01-16     6
    2000-01-17    89
    2000-01-18     7
    2000-01-19     8
    Name: Column_val3, dtype: int64
    
    df = pd.concat([df['Column_val1'],
                    df.set_index(df.iloc[:, 1])['Column_val2'],
                    df.set_index(df.iloc[:, 3])['Column_val3']])
    
    # sort the combined Series by date
    df = df.sort_index()
    print(df)
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    NaT          NaN
    1999-01-01     4
    1999-01-02     1
    1999-01-03     2
    1999-01-04     3
    1999-01-05     3
    1999-01-06     3
    1999-01-07     4
    1999-01-08     6
    1999-01-09     8
    1999-01-10     2
    1999-01-11     4
    1999-01-12     5
    2000-01-01     5
    2000-01-01     5
    2000-01-02     7
    2000-01-02     7
    2000-01-03     8
    2000-01-03     8
    2000-01-04     3
    2000-01-04     3
    2000-01-05     6
    2000-01-05     6
    2000-01-06     9
    2000-01-06     9
    2000-01-07     1
    2000-01-07     1
    2000-01-08     5
    2000-01-08     5
    2000-01-09     2
    2000-01-09     2
    2000-01-10     3
    2000-01-10     3
    2000-01-11    47
    2000-01-11    47
    2000-01-12     2
    2000-01-12     2
    2000-01-13     8
    2000-01-13     8
    2000-01-14     2
    2000-01-14     2
    2000-01-15    87
    2000-01-15    87
    2000-01-16     6
    2000-01-16     6
    2000-01-17    89
    2000-01-17    89
    2000-01-18     7
    2000-01-19     8
    dtype: float64
    
    # optionally remove the rows whose index is NaT
    print(df[pd.notnull(df.index)])
    1999-01-01     4
    1999-01-02     1
    1999-01-03     2
    1999-01-04     3
    1999-01-05     3
    1999-01-06     3
    1999-01-07     4
    1999-01-08     6
    1999-01-09     8
    1999-01-10     2
    1999-01-11     4
    1999-01-12     5
    2000-01-01     5
    2000-01-01     5
    2000-01-02     7
    2000-01-02     7
    2000-01-03     8
    2000-01-03     8
    2000-01-04     3
    2000-01-04     3
    2000-01-05     6
    2000-01-05     6
    2000-01-06     9
    2000-01-06     9
    2000-01-07     1
    2000-01-07     1
    2000-01-08     5
    2000-01-08     5
    2000-01-09     2
    2000-01-09     2
    2000-01-10     3
    2000-01-10     3
    2000-01-11    47
    2000-01-11    47
    2000-01-12     2
    2000-01-12     2
    2000-01-13     8
    2000-01-13     8
    2000-01-14     2
    2000-01-14     2
    2000-01-15    87
    2000-01-15    87
    2000-01-16     6
    2000-01-16     6
    2000-01-17    89
    2000-01-17    89
    2000-01-18     7
    2000-01-19     8
    dtype: float64
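
The concat above stacks the three Series end to end, so dates that occur in more than one column appear several times in the result. If the goal is one row per date with NaN where a column has no value, one possible variation is to drop the NaT rows from each Series first and then concatenate along axis=1. This is only a sketch under assumptions: the frame `raw`, the column names `date1`, `val1`, `date2`, `val2`, and the helper `pair_to_series` are hypothetical, and each date column is assumed to have no repeated dates.

```python
import pandas as pd

# Hypothetical stand-in for one parsed sheet: date/value column pairs of
# unequal length, padded with NaT/NaN at the bottom.
raw = pd.DataFrame({
    "date1": pd.to_datetime(["1999-01-01", "1999-01-02", None]),
    "val1": [4.0, 1.0, None],
    "date2": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]),
    "val2": [5, 7, 8],
})


def pair_to_series(frame, date_col, value_col):
    """Return value_col indexed by date_col, without rows whose date is NaT."""
    s = frame.set_index(date_col)[value_col]
    s = s[s.index.notna()]
    s.index.name = "date"
    return s


series = [
    pair_to_series(raw, "date1", "val1"),
    pair_to_series(raw, "date2", "val2"),
]

# axis=1 aligns the Series on the union of their date indexes;
# dates missing from a column are filled with NaN.
aligned = pd.concat(series, axis=1).sort_index()
print(aligned)
```

Dropping the NaT rows before aligning matters here, because repeated NaT labels in an index can prevent concat from aligning the columns.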
    

    【Comments】:
