【Question title】: Groupby pandas df and create a column with nested dictionary
【Posted】: 2022-01-22 08:48:32
【Question】:

Given this df:

        dim_date_id closing_type    r_d variable    value   rolling cusum_sample    sample_type
1330    1995-10-27      low         1     low      9.699377  0.039688   1   [sh_dummy_0.5, sh_dummy_1]
1331    1995-10-27      low         1    close    10.340971  0.044784   1   [sh_dummy_0.5, sh_dummy_1]
1330    1995-10-27      high        1    high     10.529675  0.062868   1   [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331    1995-10-27      high        1    close    10.340971  0.044784   1   [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330    1995-10-27      low         5     low      9.699377  0.132976   1   [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331    1995-10-27      low         5   close     10.340971  0.188179   1   [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330    1995-10-27      high        5    high     10.529675  0.184475   1   [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]

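I want to group it by variable and build a nested dictionary in the sample_type column (or some other structure; I don't mind which). As output, I want a df that looks like this: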

       dim_date_id      variable   value      sample_type
1330    1995-10-27       low      9.699377     {'r_d':1,'closing_type':'low','rolling':0.039688,'sample':[sh_dummy_0.5, sh_dummy_1]},
                                           {'r_d':5,'closing_type':'low','rolling':0.132976,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

1331    1995-10-27      close    10.340971  {'r_d':1,'closing_type':'low','rolling':0.044784,'sample':[sh_dummy_0.5, sh_dummy_1]},
                                         {'r_d':1,'closing_type':'high','rolling':0.062868,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
                                         {'r_d':5,'closing_type':'low','rolling':0.188179,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

1330    1995-10-27      high     10.529675    {'r_d':1,'closing_type':'high','rolling':0.062868,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
                                           {'r_d':5,'closing_type':'high','rolling':0.184475,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

It has to be as flexible as possible, because the sample_type column can sometimes hold "n" different variables.

【Question comments】:

    Tags: python pandas dataframe nested pandas-groupby


    【Solution 1】:

    Try this:

    new_df = df.groupby(['dim_date_id','variable','value']).apply(lambda x: x.to_dict()).reset_index(name='sample_type')
    

    Output:

    >>> new_df
      dim_date_id variable      value                                        sample_type
    0  1995-10-27    close  10.340971  {'dim_date_id': {1331: '1995-10-27'}, 'closing...
    1  1995-10-27     high  10.529675  {'dim_date_id': {1330: '1995-10-27'}, 'closing...
    2  1995-10-27      low   9.699377  {'dim_date_id': {1330: '1995-10-27'}, 'closing...
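
The cells above look like `{'dim_date_id': {1331: ...}, ...}` because `DataFrame.to_dict()` defaults to `orient='dict'`, which maps each column to an `{index: value}` dict. With `orient='records'` it returns one flat dict per row instead, which is closer to the shape asked for in the question. A minimal sketch on a two-row frame (the name `demo` and the choice of columns are only for illustration):

```python
import pandas as pd

# Two rows with values taken from the question; `demo` is a hypothetical name.
demo = pd.DataFrame(
    {"r_d": [1, 1], "rolling": [0.039688, 0.044784]},
    index=[1330, 1331],
)

# Default orient='dict': column -> {index label -> value}
by_column = demo.to_dict()

# orient='records': one dict per row, index labels dropped
by_row = demo.to_dict(orient="records")

print(by_column)
print(by_row)
```

Note that `orient='records'` discards the index labels, so they have to be carried separately if they matter.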
    

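    【Comments】:

    • Almost right, but the 'sample_type' column nests too many fields, and it is very important that the index is not reset and stays the same (1330, 1331 and 1330). Thanks!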
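
One possible way to address both points in the comment, sketched here rather than taken from the answer: restrict each group to the columns that should be nested, convert them with `to_dict(orient='records')`, and carry the original index label through the groupby. The names `keys`, `nested_cols`, `work` and `orig_index` are hypothetical, and the sketch assumes every row of a group shares the same original index label, as is the case in the sample data.

```python
import pandas as pd

# Sample rows copied from the question; the index labels repeat on purpose.
short = ["sh_dummy_0.5", "sh_dummy_1"]
full = ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]
df = pd.DataFrame(
    {
        "dim_date_id": ["1995-10-27"] * 7,
        "closing_type": ["low", "low", "high", "high", "low", "low", "high"],
        "r_d": [1, 1, 1, 1, 5, 5, 5],
        "variable": ["low", "close", "high", "close", "low", "close", "high"],
        "value": [9.699377, 10.340971, 10.529675, 10.340971,
                  9.699377, 10.340971, 10.529675],
        "rolling": [0.039688, 0.044784, 0.062868, 0.044784,
                    0.132976, 0.188179, 0.184475],
        "cusum_sample": [1] * 7,
        "sample_type": [short, short, full, full, full, full, full],
    },
    index=[1330, 1331, 1330, 1331, 1330, 1331, 1330],
)

keys = ["dim_date_id", "variable", "value"]                   # grouping columns
nested_cols = ["r_d", "closing_type", "rolling", "sample_type"]  # columns to nest

# Turn the index into a regular column so it survives the groupby.
work = df.rename_axis("orig_index").reset_index()

rows = []
for group_key, group in work.groupby(keys, sort=False):
    record = dict(zip(keys, group_key))
    # Assumption: all rows in a group share one original index label.
    record["orig_index"] = group["orig_index"].iloc[0]
    # One flat dict per source row, holding only the nested columns.
    record["sample_type"] = (
        group[nested_cols]
        .rename(columns={"sample_type": "sample"})
        .to_dict(orient="records")
    )
    rows.append(record)

new_df = pd.DataFrame(rows).set_index("orig_index").rename_axis(None)
print(new_df)
```

Because `nested_cols` is just a list, more columns can be nested by extending it, and `sort=False` keeps the groups in order of first appearance, so the index comes out as 1330, 1331, 1330.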