【Question title】: Calculation of values in columns and indexes in a MultiIndex pandas pivot
【Posted】: 2020-04-13 19:53:42
【Question】:

My MultiIndex pandas pivot df looks like this:

Date      2019-10-01 11:00  2019-10-01 12:00  2019-10-01 13:00  ...  2019-10-29 17:00
ID                      25                24                25  ...                24
H_name
Hospital1               12                15                16  ...                12
Hospital2               10                17                14  ...                12
Hospital3               15                20                12  ...                12

I would like to get:

Date       2019-10-01  2019-10-02  2019-10-03
ID              25.45       24.33       23.71
H_name
Hospital1         253         287         261
Hospital2         212         232         264
Hospital3         221         219         223

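The value for each "H_name" should be the sum over all hours of the day, and "ID" should be the mean over all hours of the day. Thanks for your help =)

My df before pivoting: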

        H_name            Date              ID      Value  
0     Hospital1     2019-10-01  11:00       25        12
1     Hospital2     2019-10-01  11:00       25        10
2     Hospital3     2019-10-01  11:00       25        15
3     Hospital1     2019-10-01  12:00       24        15
4     Hospital2     2019-10-01  12:00       24        17
5     Hospital3     2019-10-01  12:00       24        20
        ....              ....              ...       ...
680   Hospital1     2019-10-30  15:00       20        11
681   Hospital2     2019-10-30  15:00       20        18
682   Hospital3     2019-10-30  15:00       20        17

【Comments】:

    Tags: python pandas dataframe pivot multi-index


    【Solution 1】:

    If I understand correctly, you want to group the data by day (aggregating Value with np.sum and ID with np.mean) and then build a pivot table:

    import numpy as np
    import pandas as pd
    
    h_name = ['Hospital1', 'Hospital2', 'Hospital3', 'Hospital1', 'Hospital2', 'Hospital3',
              'Hospital1', 'Hospital2', 'Hospital3', 'Hospital1', 'Hospital2', 'Hospital3']
    
    date = ['2019-10-01  11:00', '2019-10-01  11:00', '2019-10-01  11:00', '2019-10-01  12:00', '2019-10-01  12:00', '2019-10-01  12:00',
            '2019-10-02  11:00', '2019-10-02  11:00', '2019-10-02  11:00', '2019-10-02  12:00', '2019-10-02  12:00', '2019-10-02  12:00']
    
    ids = [25, 25, 25, 24, 24, 24,
           23, 23, 23, 22, 22, 22]
    
    value = [12, 10, 15, 15, 17, 20,
             15, 16, 17, 14, 13, 22]
    
    df = pd.DataFrame({'H_name': h_name, 'Date': date, 'ID': ids, 'Value': value})
    df['Date'] = pd.to_datetime(df['Date'], utc=False)
    print(df)
    

    The data in df looks like this:

           H_name                Date  ID  Value
    0   Hospital1 2019-10-01 11:00:00  25     12
    1   Hospital2 2019-10-01 11:00:00  25     10
    2   Hospital3 2019-10-01 11:00:00  25     15
    3   Hospital1 2019-10-01 12:00:00  24     15
    4   Hospital2 2019-10-01 12:00:00  24     17
    5   Hospital3 2019-10-01 12:00:00  24     20
    6   Hospital1 2019-10-02 11:00:00  23     15
    7   Hospital2 2019-10-02 11:00:00  23     16
    8   Hospital3 2019-10-02 11:00:00  23     17
    9   Hospital1 2019-10-02 12:00:00  22     14
    10  Hospital2 2019-10-02 12:00:00  22     13
    11  Hospital3 2019-10-02 12:00:00  22     22
    

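    Then:

    # Extract the calendar day from each timestamp
    df['Date_1'] = df.Date.dt.date
    
    # Per hospital and day: mean of ID, sum of Value
    df = df.groupby(['H_name', 'Date_1']).agg({'ID': np.mean, 'Value': np.sum})
    # Pivot so that (Date_1, ID) become the two column levels
    print(df.pivot_table(index='H_name', columns=['Date_1', 'ID'], values='Value'))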
    

    This prints:

    Date_1    2019-10-01 2019-10-02
    ID              24.5       22.5
    H_name                         
    Hospital1         27         29
    Hospital2         27         29
    Hospital3         35         39
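
As a variation on the answer above, here is a minimal sketch that builds the same kind of table but computes the ID label once per day instead of once per hospital and day. It assumes that ID is identical for all hospitals at a given timestamp, as in the sample data. The helper name `daily_pivot` and the `Day` column are hypothetical, and the sample values are made up for illustration.

```python
import pandas as pd

# Small sample in the same long format as the question (illustrative values).
df = pd.DataFrame({
    'H_name': ['Hospital1', 'Hospital2', 'Hospital1', 'Hospital2',
               'Hospital1', 'Hospital2'],
    'Date': ['2019-10-01 11:00', '2019-10-01 11:00',
             '2019-10-01 12:00', '2019-10-01 12:00',
             '2019-10-02 11:00', '2019-10-02 11:00'],
    'ID': [25, 25, 24, 24, 23, 23],
    'Value': [12, 10, 15, 17, 15, 16],
})
df['Date'] = pd.to_datetime(df['Date'])


def daily_pivot(frame):
    """Hypothetical helper: daily sum of Value per hospital,
    with the daily mean ID as a second column level."""
    # Truncate every timestamp to midnight of its day
    frame = frame.assign(Day=frame['Date'].dt.normalize())
    # Sum of Value per hospital (rows) and day (columns)
    sums = frame.pivot_table(index='H_name', columns='Day',
                             values='Value', aggfunc='sum')
    # Assumption: ID is the same for every hospital at a given timestamp,
    # so one row per timestamp is enough to average ID over the hours.
    hourly = frame.drop_duplicates('Date')
    mean_id = hourly.groupby('Day')['ID'].mean()
    # Attach the daily mean ID as the second level of the columns
    sums.columns = pd.MultiIndex.from_arrays(
        [sums.columns, mean_id.reindex(sums.columns).to_numpy()],
        names=['Date', 'ID'],
    )
    return sums


result = daily_pivot(df)
print(result)
```

Computing the ID label per day keeps exactly one column per day. With a per-hospital mean, a hospital that is missing an hour could end up with a different mean ID and therefore a separate column.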
    

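    【Discussion】:

    • Thanks for your reply. This is exactly the kind of thing I was talking about. But do you know how to normalize the time series when my df['Date'] is a pandas._libs.tslibs.timestamps.Timestamp rather than a str?
    • @SimiWien I have updated my answer. You basically do df['Date_1'] = df.Date.dt.date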
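
To illustrate the point in the comments, here is a small sketch comparing `dt.date` with two other ways of reducing timestamps to days, `dt.normalize()` and `dt.floor('D')`. The variable names are made up for this example. The `.dt` accessor needs a datetime column, so a column of strings would first have to go through `pd.to_datetime`.

```python
import pandas as pd

# Strings are converted first; a column that already holds Timestamps
# can skip this step and use .dt directly.
ts = pd.Series(pd.to_datetime(['2019-10-01 11:00',
                               '2019-10-01 12:00',
                               '2019-10-02 11:00']))

# Python datetime.date objects (the result has object dtype)
as_date = ts.dt.date
# Timestamps truncated to midnight (the result keeps a datetime dtype)
as_midnight = ts.dt.normalize()
# Flooring to whole days gives the same result as normalize()
as_floor = ts.dt.floor('D')

print(as_date.tolist())
print(as_midnight.tolist())
```

All three give one label per day for grouping. The `normalize()` and `floor('D')` variants keep a datetime dtype, which may be convenient if further `.dt` operations or resampling are needed afterwards.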