【Question Title】: Resampling a pandas MultiIndex dataframe
【Posted】: 2017-09-19 06:39:30
【Question Description】:

I have a pandas MultiIndex dataframe similar to the following:

import pandas as pd

rows = [('One', 'One', 'One', '20120105', 1, 'Text1'),
        ('One', 'One', 'One', '20120107', 2, 'Text2'),
        ('One', 'One', 'One', '20120110', 3, 'Text3'),
        ('One', 'One', 'Two', '20120104', 4, 'Text4'),
        ('One', 'Two', 'One', '20120109', 5, 'Text5'),
        ('Two', 'Three', 'Four', '20120111', 6, 'Text6')]
cols = ['Type', 'Subtype', 'Subsubtype', 'Date', 'Number', 'Text']
df = pd.DataFrame.from_records(rows, columns=cols)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Type', 'Subtype', 'Subsubtype'])
end_date = max(df['Date'])
print(df)

                              Date  Number   Text
Type Subtype Subsubtype                          
One  One     One        2012-01-05       1  Text1
             One        2012-01-07       2  Text2
             One        2012-01-10       3  Text3
             Two        2012-01-04       4  Text4
     Two     One        2012-01-09       5  Text5
Two  Three   Four       2012-01-11       6  Text6

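I want to upsample the data so that every combination of the Type-Subtype-Subsubtype index gets daily rows: from the earliest date for which that combination has data up to end_date = max(df['Date']).

An example of what I want: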

                              Date  Number   Text
Type Subtype Subsubtype                          
One  One     One        2012-01-05       1  Text1
             One        2012-01-06       1  Text1
             One        2012-01-07       2  Text2
             One        2012-01-08       2  Text2
             One        2012-01-09       2  Text2
             One        2012-01-10       3  Text3
             One        2012-01-11       3  Text3
             Two        2012-01-04       4  Text4
             Two        2012-01-05       4  Text4
             Two        2012-01-06       4  Text4
             Two        2012-01-07       4  Text4
             Two        2012-01-08       4  Text4
             Two        2012-01-09       4  Text4
             Two        2012-01-10       4  Text4
             Two        2012-01-11       4  Text4
     Two     One        2012-01-09       5  Text5
             One        2012-01-10       5  Text5
             One        2012-01-11       5  Text5
Two  Three   Four       2012-01-11       6  Text6

I looked through similar questions but could not find anything that solves this. Any help is much appreciated.

【Question Discussion】:

    Tags: python pandas dataframe


    【Solution 1】:

    You can use:


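    # For each (Type, Subtype, Subsubtype) group: index by Date, reindex to a daily
    # range from the group's first listed date to end_date, then forward-fill.
    df = df.groupby(level=[0,1,2]) \
           .apply(lambda x: x.set_index('Date').reindex(pd.date_range(x['Date'].iat[0],
                                                                      end_date))).ffill()
    print(df)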
                                        Number   Text
    Type Subtype Subsubtype                          
    One  One     One        2012-01-05     1.0  Text1
                            2012-01-06     1.0  Text1
                            2012-01-07     2.0  Text2
                            2012-01-08     2.0  Text2
                            2012-01-09     2.0  Text2
                            2012-01-10     3.0  Text3
                            2012-01-11     3.0  Text3
                 Two        2012-01-04     4.0  Text4
                            2012-01-05     4.0  Text4
                            2012-01-06     4.0  Text4
                            2012-01-07     4.0  Text4
                            2012-01-08     4.0  Text4
                            2012-01-09     4.0  Text4
                            2012-01-10     4.0  Text4
                            2012-01-11     4.0  Text4
         Two     One        2012-01-09     5.0  Text5
                            2012-01-10     5.0  Text5
                            2012-01-11     5.0  Text5
    Two  Three   Four       2012-01-11     6.0  Text6
    

    【Discussion】:

    • Thanks! That works great! The only caveat is that the column name 'Date' is lost.
    • You can use df.index.names=['Type','Subtype','Subsubtype','Date'], thanks.
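
The lost 'Date' level name mentioned in the comments can also be avoided at the source. Below is a minimal sketch, assuming the same data as in the question: pd.date_range is given name='Date' so the new index level keeps its name, and Number is cast back to int after forward-filling, because reindexing introduces NaN and turns the column into floats. The helper name upsample and the variable out are illustrative and not part of the original answer.

```python
import pandas as pd

rows = [('One', 'One', 'One', '20120105', 1, 'Text1'),
        ('One', 'One', 'One', '20120107', 2, 'Text2'),
        ('One', 'One', 'One', '20120110', 3, 'Text3'),
        ('One', 'One', 'Two', '20120104', 4, 'Text4'),
        ('One', 'Two', 'One', '20120109', 5, 'Text5'),
        ('Two', 'Three', 'Four', '20120111', 6, 'Text6')]
cols = ['Type', 'Subtype', 'Subsubtype', 'Date', 'Number', 'Text']
df = pd.DataFrame.from_records(rows, columns=cols)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Type', 'Subtype', 'Subsubtype'])
end_date = df['Date'].max()


def upsample(group):
    # Hypothetical helper: daily range from this group's earliest date
    # to the global end date; name='Date' keeps the level name.
    days = pd.date_range(group['Date'].min(), end_date, name='Date')
    # Forward-fill inside the group so values never cross group boundaries.
    return group.set_index('Date').reindex(days).ffill()


out = df.groupby(level=['Type', 'Subtype', 'Subsubtype']).apply(upsample)
# Every gap has been forward-filled, so the float column can go back to int.
out['Number'] = out['Number'].astype(int)
print(out)
```

Using min() instead of iat[0] is a small design choice here: it does not depend on the rows of each group already being sorted by date.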