【Question title】: Python Pandas DataFrame how to pivot
【Posted】: 2014-09-30 17:43:26
【Question】:

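Dear most amazing hackers of the world,

I'm a newbie and don't know which python/pandas function achieves the "transformation" I want. Showing you what I have ("original") and what kind of result I want ("wanted") is better (I think and hope) than a lengthy description.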

import pandas as pd

Input for the original DataFrame:

df_orig = pd.DataFrame()
df_orig["Treatment"] = ["C", "C", "D", "D", "C", "C", "D", "D"]
df_orig["TimePoint"] = [24, 48, 24, 48, 24, 48, 24, 48]
df_orig["AN"] = ["ALF234","ALF234","ALF234","ALF234","XYK987","XYK987","XYK987","XYK987"]
df_orig["Bincode"] = [33,33,33,33,44,44,44,44]
df_orig["BC_all"] = ["33.7","33.7","33.7","33.7","44.9","44.9","44.9","44.9"]
df_orig["RIA_avg"] = [0.202562419159333,0.281521224788666, 0.182828319454333,0.294909088002333,
                  0.105941322218833,0.247949961707,0.1267545610749,0.159711714967666]
df_orig["sum14N_avg"] = [4120031.79121666,3742633.37033333,4659315.47073666,4345668.76408666,
                     26307312.1188333,24089229.9177999,35367286.7322666,34093045.3129]

Display of the original DataFrame

Input for the wanted DataFrame:

df_wanted = pd.DataFrame()
df_wanted["AN"] = ["ALF234","XYK987"]
df_wanted["Bincode"] = [33,44]
df_wanted["BC_all"] = ["33.7","44.9"]
df_wanted["C_24_RIA_avg"] = [0.202562419159333, 0.105941322218833]
df_wanted["C_48_RIA_avg"] = [0.281521224788666,0.247949961707]
df_wanted["D_24_RIA_avg"] = [0.182828319454333,0.1267545610749]
df_wanted["D_48_RIA_avg"] = [0.294909088002333, 0.159711714967666]
df_wanted["C_24_sum14N_avg"] = [4120031.791, 26307312.12]
df_wanted["C_48_sum14N_avg"] = [3742633.37, 24089229.92]
df_wanted["D_24_sum14N_avg"] = [4659315.471, 35367286.73]
df_wanted["D_48_sum14N_avg"] = [4345668.764, 34093045.31]

Display of the wanted DataFrame

Thank you very much for your support!!

【Question discussion】:

    Tags: python pandas pivot transform dataframe


    【Solution 1】:

    I believe you want to use pd.pivot_table to reshape it. See the examples on pivot tables for a better idea of how it works.

    The following should give you what you need.

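    # One row per (AN, Bincode, BC_all); one column per (value, Treatment, TimePoint).
    # The column is named "TimePoint" in df_orig (capital P).
    df_wanted = pd.pivot_table(
        df_orig,
        index=['AN', 'Bincode', 'BC_all'],
        columns=['Treatment', 'TimePoint'],
        values=['RIA_avg', 'sum14N_avg']
    )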
    

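    Note that the column names will not be converted exactly as you stated in your output. Instead, you get a hierarchical index on both the columns and the rows, which should be more convenient to work with.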

    Rows, columns, and values can be retrieved from this format using .loc:

    df_wanted.loc['XYK987', :]
    df_wanted.loc[:, ('sum14N_avg')]
    df_wanted.loc['ALF234', ('RIA_avg', 'C', 24)]
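
    If the flat column names from the question (for example C_24_RIA_avg) are still needed, one possible approach is to join the levels of the hierarchical column index after pivoting. The following is only a sketch that rebuilds the question's data; the names pivoted and flat are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the data from the question.
df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666,
                0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707,
                0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333,
                   4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999,
                   35367286.7322666, 34093045.3129],
})

pivoted = pd.pivot_table(
    df_orig,
    index=["AN", "Bincode", "BC_all"],
    columns=["Treatment", "TimePoint"],
    values=["RIA_avg", "sum14N_avg"],
)

# Each column label is a tuple (value name, Treatment, TimePoint);
# reorder and join the parts into a single string per column.
flat = pivoted.copy()
flat.columns = [
    f"{treatment}_{timepoint}_{value}"
    for value, treatment, timepoint in pivoted.columns
]

# Turn the row index (AN, Bincode, BC_all) back into ordinary columns.
flat = flat.reset_index()
print(flat.columns.tolist())
```

    The trade-off is that the flattened frame loses the convenient .loc selection by level, so it is probably best done as a last step before export.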
    

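    【Discussion】:

      【Solution 2】:

      Your output is not aligned properly, so it is hard to understand. But it looks like a job for df.groupby('AN').mean() or something similar. Read the documentation on Group By.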
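
      As a rough sketch of what "something similar" could look like: grouping on AN alone would collapse the Treatment and TimePoint rows together, so the version below (an assumption, not the answerer's code) groups on all five key columns and then moves Treatment and TimePoint into the columns with unstack. The name grouped is illustrative.

```python
import pandas as pd

# Rebuild the data from the question.
df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666,
                0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707,
                0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333,
                   4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999,
                   35367286.7322666, 34093045.3129],
})

grouped = (
    df_orig
    # Group on every key column so each group holds exactly one measurement.
    .groupby(["AN", "Bincode", "BC_all", "Treatment", "TimePoint"])[
        ["RIA_avg", "sum14N_avg"]
    ]
    .mean()
    # Move Treatment and TimePoint from the row index into the columns.
    .unstack(["Treatment", "TimePoint"])
)
print(grouped)
```

      The result has the same layout as the pivot_table approach: rows indexed by (AN, Bincode, BC_all) and columns indexed by (value, Treatment, TimePoint).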

      【Discussion】:
