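Reshaping a dataframe from vertical to horizontal in Python: answers

【Question title】: Reshaping dataframe from vertical to horizontal in python
【Posted】: 2018-07-31 17:10:11
【Question】:

I have a data sample from a database that I would like to reshape from vertical to horizontal in Python for further data analysis. The dataframe looks like this: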

ID  measured_at  weight 
aa  2017-11-04   78.1
bb  2018-04-08   74.2
bb  2018-04-16   73.2
bb  2018-04-28   72.1
cc  2018-03-02   90.2
cc  2018-03-20   88.9

I would like it to look like this:

id  date1       weight1  date2       weight2  date3       weight3
aa  2017-11-04  78.1     NA          NA       NA          NA
bb  2018-04-08  74.2     2018-04-16  73.2     2018-04-28  72.1
cc  2018-03-02  90.2     2018-03-20  88.9     NA          NA

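Some IDs have more than 3 measurements, so the reshaping needs to generate a new date column and a new weight column for every additional measurement on the same ID.

How can this be done?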


Tags: python pandas dataframe data-analysis

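【Solution 1】:

You can use GroupBy with list once per column, then combine the results with pd.concat.

I will leave renaming the columns and promoting the index to a column as an exercise.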

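import pandas as pd

g = df.groupby('ID')
# Collect each ID's values into a list, then expand each list into its own row
df_dates = pd.DataFrame(g['measured_at'].apply(list).values.tolist(), index=list(g.groups))
df_weights = pd.DataFrame(g['weight'].apply(list).values.tolist(), index=list(g.groups))

# Dates get even column labels and weights get odd ones, so sorting interleaves them
df_dates.columns = df_dates.columns * 2
df_weights.columns = df_weights.columns * 2 + 1

# axis must be passed by keyword in recent pandas versions
res = pd.concat([df_dates, df_weights], axis=1).sort_index(axis=1)

print(res)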

             0     1           2     3           4     5
aa  2017-11-04  78.1        None   NaN        None   NaN
bb  2018-04-08  74.2  2018-04-16  73.2  2018-04-28  72.1
cc  2018-03-02  90.2  2018-03-20  88.9        None   NaN
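The renaming and index promotion left as an exercise could look something like the following. This is only a sketch, not part of the original answer: it rebuilds the sample frame from the question so that it runs on its own, and it takes the `id`, `date{n}` and `weight{n}` names from the desired output shown in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'ID': ['aa', 'bb', 'bb', 'bb', 'cc', 'cc'],
    'measured_at': ['2017-11-04', '2018-04-08', '2018-04-16',
                    '2018-04-28', '2018-03-02', '2018-03-20'],
    'weight': [78.1, 74.2, 73.2, 72.1, 90.2, 88.9],
})

g = df.groupby('ID')
dates = g['measured_at'].apply(list)
weights = g['weight'].apply(list)
df_dates = pd.DataFrame(dates.tolist(), index=dates.index)
df_weights = pd.DataFrame(weights.tolist(), index=weights.index)

# Name the columns directly instead of using even/odd integer labels
df_dates.columns = [f'date{i + 1}' for i in df_dates.columns]
df_weights.columns = [f'weight{i + 1}' for i in df_weights.columns]

# Interleave the date/weight pairs, then promote the ID index to a column
n = df_dates.shape[1]
order = [name for i in range(1, n + 1) for name in (f'date{i}', f'weight{i}')]
res = (pd.concat([df_dates, df_weights], axis=1)[order]
         .reset_index()
         .rename(columns={'ID': 'id'}))
print(res)
```

Building the column order explicitly avoids relying on a sort of the column labels, which would stop interleaving correctly once there are ten or more measurements (`date10` sorts before `date2` as a string).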


【Solution 2】:

IIUC, use cumcount to create a helper key that numbers the measurements within each ID.

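# Number the rows within each ID: 1, 2, 3, ...
df['helpkey']=df.groupby('ID').cumcount()+1
# Move helpkey into the columns, then order the columns by measurement number
newdf=df.set_index(['ID','helpkey']).unstack().sort_index(level=1,axis=1)
# Flatten the two-level column labels, e.g. ('weight', 2) becomes 'weight_2'
newdf.columns=newdf.columns.map('{0[0]}_{0[1]}'.format)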
newdf
Out[608]: 
   measured_at_1  weight_1 measured_at_2  weight_2 measured_at_3  weight_3
ID                                                                        
aa    2017-11-04      78.1          None       NaN          None       NaN
bb    2018-04-08      74.2    2018-04-16      73.2    2018-04-28      72.1
cc    2018-03-02      90.2    2018-03-20      88.9          None       NaN
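Because the helper key is just a running count within each ID, this approach should widen on its own when an ID has more than three measurements, which is the case the question asks about. A minimal sketch under a few assumptions: the function name `to_wide` is hypothetical, and the `dd` rows are made-up data added only to exercise a fourth measurement. Unlike the snippet above, it passes the key as a Series so the input frame is not modified.

```python
import pandas as pd

def to_wide(df, id_col='ID'):
    """Spread each ID's rows across numbered columns (hypothetical helper)."""
    # Number the rows within each ID: 1, 2, 3, ...
    key = df.groupby(id_col).cumcount() + 1
    wide = (df.set_index([id_col, key])
              .unstack()
              .sort_index(level=1, axis=1))
    # Flatten ('weight', 2) into 'weight_2'
    wide.columns = [f'{name}_{n}' for name, n in wide.columns]
    return wide

df = pd.DataFrame({
    'ID': ['aa', 'bb', 'bb', 'bb', 'cc', 'cc', 'dd', 'dd', 'dd', 'dd'],
    'measured_at': ['2017-11-04', '2018-04-08', '2018-04-16', '2018-04-28',
                    '2018-03-02', '2018-03-20',
                    # hypothetical rows for an ID with four measurements
                    '2018-01-01', '2018-01-08', '2018-01-15', '2018-01-22'],
    'weight': [78.1, 74.2, 73.2, 72.1, 90.2, 88.9,
               82.0, 81.5, 81.0, 80.5],
})

wide = to_wide(df)
print(wide.columns.tolist())
```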


【Solution 3】:

Use groupby together with nth:

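import pandas as pd

# Group on the index so that nth(i) returns rows labelled by ID
d2 = df.set_index('ID').groupby(level=0)
# nth(i) takes the i-th row of every group; loop up to the largest group size
parts = [d2.nth(i).add_suffix(str(i + 1)) for i in range(d2.size().max())]
new_df = pd.concat(parts, axis=1)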

This gives:

   measured_at1  weight1 measured_at2  weight2 measured_at3  weight3
aa   2017-11-04     78.1          NaN      NaN          NaN      NaN
bb   2018-04-08     74.2   2018-04-16     73.2   2018-04-28     72.1
cc   2018-03-02     90.2   2018-03-20     88.9          NaN      NaN
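When iterating with nth, the number of passes needed is the size of the largest group, whereas `len()` of a GroupBy object gives the number of groups. For the sample data both happen to be 3, so the difference is easy to miss. A small sketch, assuming a made-up fourth measurement for `bb` that is not in the original data:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['aa', 'bb', 'bb', 'bb', 'bb', 'cc', 'cc'],
    'measured_at': ['2017-11-04', '2018-04-08', '2018-04-16', '2018-04-28',
                    '2018-05-05',  # hypothetical extra measurement for bb
                    '2018-03-02', '2018-03-20'],
    'weight': [78.1, 74.2, 73.2, 72.1, 71.5, 90.2, 88.9],
})

d2 = df.groupby('ID')
n_groups = len(d2)           # number of distinct IDs
max_rows = d2.size().max()   # most measurements recorded for any single ID
print(n_groups, max_rows)    # prints: 3 4
```

A loop bounded by the number of groups would stop after three passes here and silently drop the fourth measurement for `bb`.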
