How to transpose values from the top few rows of a Python DataFrame into new columns (answers)

【Question title】: How to transpose values from top few rows in python dataframe into new columns
【Posted】: 2021-08-10 14:36:30
【Question description】:

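I am trying to take the values from the top 3 records of each group in a sorted pandas DataFrame and put them into new columns. I have a function that processes each group, but I am struggling to find the right way to extract and rename the Series and then combine the results into a single Series to return.

Below is a simplified example of the input DataFrame (df_in) and the expected output (df_out):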

import pandas as pd
data_in = {'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
           'Price': [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0],
           'Qty': [15, 13, 14, 10, 5, 2, 1]}
df_in = pd.DataFrame(data_in, columns=['Product', 'Price', 'Qty'])

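Below are the 2 function examples I have tested. I am looking for a more efficient option, especially since I will have to handle more columns and records. The function best3_prices_v1 works, but every column or variable has to be named explicitly, which becomes a real problem as I add more columns.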

def best3_prices_v1(x):
    d = {}

    # get best 3 records if records available, else set volumes as zeroes
    best_price_lv1 = x.iloc[0].copy()

    rec_with_zeroes = best_price_lv1.copy()
    rec_with_zeroes['Price'] = 0
    rec_with_zeroes['Qty'] = 0

    recs = len(x)  # number of records
    if recs == 1:
        # 2nd and 3rd records not available
        best_price_lv2 = rec_with_zeroes.copy()
        best_price_lv3 = rec_with_zeroes.copy()
    elif recs == 2:
        best_price_lv2 = x.iloc[1]
        # 3rd record not available
        best_price_lv3 = rec_with_zeroes.copy()
    else:
        best_price_lv2 = x.iloc[1]
        best_price_lv3 = x.iloc[2]

    # 1st best
    d['Price_1'] = best_price_lv1['Price']
    d['Qty_1'] = best_price_lv1['Qty']

    # 2nd best
    d['Price_2'] = best_price_lv2['Price']
    d['Qty_2'] = best_price_lv2['Qty']

    # 3rd best
    d['Price_3'] = best_price_lv3['Price']
    d['Qty_3'] = best_price_lv3['Qty']

    # return combined results as a series
    return pd.Series(d, index=['Price_1', 'Qty_1', 'Price_2', 'Qty_2', 'Price_3', 'Qty_3'])

Code that calls the function:

# sort dataframe by Product and Price
df_in.sort_values(by=['Product', 'Price'], ascending=True, inplace=True)
# get best 3 prices and qty as new columns
df_out = df_in.groupby(['Product']).apply(best3_prices_v1).reset_index()

A second attempt at improving the code and reducing the explicit names for each variable... incomplete and not working.

def best3_prices_v2(x):
    d = {}

    # get best 3 records if records available, else set volumes as zeroes
    best_price_lv1 = x.iloc[0].copy()

    rec_with_zeroes = best_price_lv1.copy()
    rec_with_zeroes['Price'] = 0
    rec_with_zeroes['Qty'] = 0

    recs = len(x)  # number of records
    if recs == 1:
        # 2nd and 3rd records not available
        best_price_lv2 = rec_with_zeroes.copy()
        best_price_lv3 = rec_with_zeroes.copy()
    elif recs == 2:
        best_price_lv2 = x.iloc[1]
        # 3rd record not available
        best_price_lv3 = rec_with_zeroes.copy()
    else:
        best_price_lv2 = x.iloc[1]
        best_price_lv3 = x.iloc[2]

    stats_columns = ['Price', 'Qty']

    # get records values for best 3 prices
    d_lv1 = best_price_lv1[stats_columns]
    d_lv2 = best_price_lv2[stats_columns]
    d_lv3 = best_price_lv3[stats_columns]

    # How to rename (keys?) or combine values to return?
    lv1_stats_columns = [c + '_1' for c in stats_columns]
    lv2_stats_columns = [c + '_2' for c in stats_columns]
    lv3_stats_columns = [c + '_3' for c in stats_columns]

    # return combined results as a series
    return pd.Series(d, index=lv1_stats_columns + lv2_stats_columns + lv3_stats_columns)
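
For reference, here is one way the unfinished second attempt could be completed. It is a minimal sketch, not the asker's final code: the helper name best_n_records and its parameters stats_columns and n are hypothetical. It assumes the groups are already sorted by price, pads missing records with zeroes, and builds the Price_1, Qty_1, ... labels from the column list, so adding a column only means extending that list. Because a Series holds a single dtype, the Qty values come back as floats here.

```python
import pandas as pd

def best_n_records(group, stats_columns, n=3):
    """Flatten the first n rows of an already sorted group into one Series."""
    # Keep the first n rows, then reindex to exactly n positions;
    # positions that have no record are filled with zeroes.
    top = (group[stats_columns]
           .head(n)
           .reset_index(drop=True)
           .reindex(range(n), fill_value=0))
    # Labels follow the row-by-row order: Price_1, Qty_1, Price_2, Qty_2, ...
    labels = [f"{col}_{i + 1}" for i in range(n) for col in stats_columns]
    return pd.Series(top.to_numpy().ravel(), index=labels)

df_in = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
                      'Price': [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0],
                      'Qty': [15, 13, 14, 10, 5, 2, 1]})

stats_columns = ['Price', 'Qty']
df_sorted = df_in.sort_values(by=['Product', 'Price'])
# Selecting the stats columns before apply() keeps the grouping column
# out of the frame that is handed to the helper.
df_out = (df_sorted.groupby('Product')[stats_columns]
          .apply(best_n_records, stats_columns=stats_columns)
          .reset_index())
```

With the sample data, product B has a single record, so its Price_2, Qty_2, Price_3 and Qty_3 values are zero.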

【Question discussion】:

Tags: python pandas dataframe pandas-groupby series

【Solution 1】:

Let's use unstack():

df_in=(df_in.set_index([df_in.groupby('Product').cumcount().add(1),'Product'])
             .unstack(0,fill_value=0))
df_in.columns=[f"{x}_{y}" for x,y in df_in]
df_in=df_in.reset_index()
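
The key step above is cumcount(), which numbers the rows inside each group; that number becomes the suffix of the new column names. A small illustration on the question's Product column (the variable name position is only for this example):

```python
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
                   'Price': [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0]})

# cumcount() numbers the rows of each group 0, 1, 2, ...;
# add(1) shifts the numbering so that it starts at 1.
position = df.groupby('Product').cumcount().add(1)
print(position.tolist())  # [1, 2, 3, 4, 1, 1, 2]
```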

Or via pivot():

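df_in=(df_in.assign(key=df_in.groupby('Product').cumcount().add(1))
      # pivot() requires keyword arguments in pandas 2.0 and later
      .pivot(index='Product',columns='key',values=['Price','Qty'])
      # newer pandas releases deprecate downcast; plain .fillna(0) also
      # works but leaves the filled columns as floats
      .fillna(0,downcast='infer'))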
df_in.columns=[f"{x}_{y}" for x,y in df_in]
df_in=df_in.reset_index()

【Discussion】:

One addition: since the OP is only interested in the top three results for each Product (assuming index order), you can use df_in.groupby('Product').head(3) to quickly get that DataFrame.
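
A sketch of how that suggestion could be combined with the unstack() approach so that only three sets of columns are produced. The variable names top3, rank and wide are illustrative, and the sort by Price is an assumption taken from the question:

```python
import pandas as pd

df_in = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
                      'Price': [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0],
                      'Qty': [15, 13, 14, 10, 5, 2, 1]})

# Keep at most the three cheapest rows of each product.
top3 = (df_in.sort_values(by=['Product', 'Price'])
        .groupby('Product').head(3))

# Position of each remaining row within its product (1, 2, 3).
rank = top3.groupby('Product').cumcount().add(1)

# Move the position into the columns; missing records become 0.
wide = top3.set_index(['Product', rank]).unstack(fill_value=0)
wide.columns = [f"{col}_{k}" for col, k in wide.columns]
wide = wide.reset_index()
```

Without the head(3) step, product A, which has four rows, would also produce Price_4 and Qty_4 columns.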

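【Solution 2】:

Building on @AnuragDabas's pivot solution above and the feedback from @ceruler, I can now extend it to the more general problem.

A new DataFrame with more groups and columns: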

data_in = {'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
           'Model': ['A1', 'A1', 'A1', 'A2', 'B1', 'C1', 'C1'],
           'Price': [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0],
           'Qty': [15, 13, 14, 10, 5, 2, 1],
           'Ratings': [9, 7, 8, 10, 6, 7, 8]}
df_in = pd.DataFrame(data_in, columns=['Product', 'Model', 'Price', 'Qty', 'Ratings'])


group_list = ['Product', 'Model']
stats_list = ['Price','Qty', 'Ratings']
df_out = df_in.groupby(group_list).head(3)
df_out=(df_out.assign(key=df_out.groupby(group_list).cumcount().add(1))
  # pivot() requires keyword arguments in pandas 2.0 and later
  .pivot(index=group_list,columns='key',values=stats_list)
  .fillna(0,downcast='infer'))
df_out.columns=[f"{x}_{y}" for x,y in df_out]
df_out = df_out.reset_index()

【Discussion】: