【Question title】: Creating columns in a pandas dataframe based on a column value in another dataframe
【Posted】: 2020-03-21 11:31:51
【Question】:

I have two pandas dataframes:

import pandas as pd 
import numpy as np
import datetime

data = {'group'      :["A","A","B","B"],
        'val': ["AA","AB","B1","B2"],
        'cal1'     :[4,5,7,6],
        'cal2'     :[10,100,100,10]
       } 

df1 = pd.DataFrame(data) 
df1

    group   val    cal1   cal2
0   A       AA     4      10
1   A       AB     5      100
2   B       B1     7      100
3   B       B2     6      10

data = {'group'      :["A","A","A","B","B","B","B", "B", "B", "B"],
        'flag' : [1,0,0,1,0,0,0, 1, 0, 0],
        'var1': [1,2,3,7,8,9,10, 15, 20, 30]
       } 

# Create DataFrame 
df2 = pd.DataFrame(data) 
df2

    group   flag    var1
0   A       1       1
1   A       0       2
2   A       0       3
3   B       1       7
4   B       0       8
5   B       0       9
6   B       0       10
7   B       1       15
8   B       0       20
9   B       0       30

Step 1: Create columns in df2 (with the suffix "_new") based on the unique "val" values in df1, like below:

unique_val = df1['val'].unique().tolist()
new_cols = [t + '_new' for t in unique_val]
for i in new_cols:
    df2[i] = 0
df2
    group   flag    var1    AA_new  AB_new  B1_new  B2_new
0   A       1        1       0      0       0        0
1   A       0        2       0      0       0        0
2   A       0        3       0      0       0        0
3   B       1        7       0      0       0        0
4   B       0        8       0      0       0        0
5   B       0        9       0      0       0        0
6   B       0        10      0      0       0        0
7   B       1        15      0      0       0        0
8   B       0        20      0      0       0        0
9   B       0        30      0      0       0        0
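Step 1 can also be done without a loop. Here is a minimal sketch, assuming the df1/df2 built above. `DataFrame.assign` takes keyword arguments, so you can unpack a dict comprehension of zero-valued columns into a single call.

```python
import pandas as pd

df1 = pd.DataFrame({'group': ["A", "A", "B", "B"],
                    'val': ["AA", "AB", "B1", "B2"],
                    'cal1': [4, 5, 7, 6],
                    'cal2': [10, 100, 100, 10]})
df2 = pd.DataFrame({'group': ["A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
                    'flag': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
                    'var1': [1, 2, 3, 7, 8, 9, 10, 15, 20, 30]})

# One zero-filled '<val>_new' column per unique val, added in a single assign call;
# keyword order is kept, so the new columns follow df1's val order
df2 = df2.assign(**{f"{v}_new": 0 for v in df1['val'].unique()})
print(df2.columns.tolist())
```

`assign` returns a new DataFrame rather than modifying df2 in place, which is why the result is reassigned to df2.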

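Step 2: For rows where flag = 1, AA_new is calculated as var1 (from df2) * the value of 'cal1' from df1 for group "A" and val "AA" * the value of 'cal2' from df1 for group "A" and val "AA". Similarly, AB_new is calculated as var1 (from df2) * the value of 'cal1' from df1 for group "A" and val "AB" * the value of 'cal2' from df1 for group "A" and val "AB". B1_new and B2_new follow the same rule for group "B", and every other cell stays 0.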
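The same rule can be written as a formula. Here $N_{i,v}$ stands for the value of column `<v>_new` in row $i$ of df2. This notation is introduced for illustration only. The zero for rows of a different group is read off the expected output below.

```latex
N_{i,v} =
\begin{cases}
\mathrm{var1}_i \cdot \mathrm{cal1}_v \cdot \mathrm{cal2}_v & \text{if } \mathrm{flag}_i = 1 \text{ and } \mathrm{group}_i = \mathrm{group}_v, \\
0 & \text{otherwise.}
\end{cases}
```

For example, row 0 (group A, var1 = 1) gives $N_{0,\mathrm{AA}} = 1 \cdot 4 \cdot 10 = 40$. Row 3 (group B, var1 = 7) gives $N_{3,\mathrm{B1}} = 7 \cdot 7 \cdot 100 = 4900$.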

My expected output should look like this:

    group   flag    var1    AA_new  AB_new  B1_new   B2_new
0   A       1       1       40.0    500.0   0.0      0.0
1   A       0       2       0.0     0.0     0.0      0.0
2   A       0       3       0.0     0.0     0.0      0.0
3   B       1       7       0.0     0.0     4900.0   420.0
4   B       0       8       0.0     0.0     0.0      0.0
5   B       0       9       0.0     0.0     0.0      0.0
6   B       0       10      0.0     0.0     0.0      0.0
7   B       1       15      0.0     0.0     10500.0  900.0
8   B       0       20      0.0     0.0     0.0      0.0
9   B       0       30      0.0     0.0     0.0      0.0
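Here is one loop-free way to build exactly this table. It is a sketch that assumes each `val` in df1 belongs to a single group, which is true in the sample. It uses NumPy broadcasting to compare every df2 group with every df1 group. It then keeps `var1 * cal1 * cal2` wherever the groups match and `flag == 1`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'group': ["A", "A", "B", "B"],
                    'val': ["AA", "AB", "B1", "B2"],
                    'cal1': [4, 5, 7, 6],
                    'cal2': [10, 100, 100, 10]})
df2 = pd.DataFrame({'group': ["A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
                    'flag': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
                    'var1': [1, 2, 3, 7, 8, 9, 10, 15, 20, 30]})

# cal1 * cal2 for each df1 row, shape (n_vals,)
prod = (df1['cal1'] * df1['cal2']).to_numpy()
# (n_rows, n_vals) mask: True where the df2 row's group equals the df1 row's group
same_group = df2['group'].to_numpy()[:, None] == df1['group'].to_numpy()[None, :]
# (n_rows, 1) mask for flag == 1; broadcasts across the val columns
flagged = df2['flag'].eq(1).to_numpy()[:, None]
# var1 * cal1 * cal2 where both masks hold, 0.0 elsewhere
values = np.where(same_group & flagged,
                  df2['var1'].to_numpy()[:, None] * prod[None, :], 0.0)

new_cols = pd.DataFrame(values, index=df2.index,
                        columns=(df1['val'] + '_new').tolist())
result = df2.join(new_cols)
print(result)
```

The intermediate arrays have one cell per (df2 row, df1 row) pair. That is fine at this size, but memory grows as `len(df2) * len(df1)`.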

The following solution, based on other Stack Overflow questions, partially works:

df2.assign(**df1.assign(mul_cal = df1['cal1'].mul(df1['cal2']))
                .pivot_table(columns='val',
                             values='mul_cal',
                             index = ['group', df2.index])
                .add_suffix('_new')
                .groupby(level=0)
                .apply(lambda x: x.bfill().ffill())
                .reset_index(level='group',drop='group')
                .fillna(0)
                .mul(df2['var1'], axis=0)
                .where(df2['flag'].eq(1), 0)
)
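In the attempt above, the grouping key `df2.index` has 10 entries while df1 has only 4 rows, so it does not line up with df1's rows. The sketch below keeps the same pivot idea and assumes every group in df2 also appears in df1. It pivots df1 into one row of `cal1 * cal2` multipliers per group. It then looks that row up for each df2 row and scales it by `var1` on the flagged rows.

```python
import pandas as pd

df1 = pd.DataFrame({'group': ["A", "A", "B", "B"],
                    'val': ["AA", "AB", "B1", "B2"],
                    'cal1': [4, 5, 7, 6],
                    'cal2': [10, 100, 100, 10]})
df2 = pd.DataFrame({'group': ["A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
                    'flag': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
                    'var1': [1, 2, 3, 7, 8, 9, 10, 15, 20, 30]})

# One row per group, one column per val: cal1 * cal2, or 0 for vals of another group
mult = (df1.assign(mul_cal=df1['cal1'] * df1['cal2'])
           .pivot_table(index='group', columns='val', values='mul_cal', fill_value=0)
           .add_suffix('_new')
           .rename_axis(columns=None))
# Repeat the matching group's row for every df2 row, then reuse df2's row labels
per_row = mult.reindex(df2['group']).set_axis(df2.index)
# var1 on rows with flag == 1, 0 elsewhere
weight = df2['var1'].where(df2['flag'].eq(1), 0)
result = df2.join(per_row.mul(weight, axis=0).astype(float))
print(result)
```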

【Question discussion】:

    Tags: python-3.x pandas


    【Answer 1】:

    Flexible columns

    If you want this to keep working when more rows are added to df1, you can do it like this:

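    # cal3 = cal1 * cal2 (the same helper column the hard-coded version creates)
    df1['cal3'] = df1['cal1'] * df1['cal2']
    # one row per (group, val) pair present in df1
    combinations = df1.groupby(['group','val'])['cal3'].sum().reset_index()
    
    for index_, row_ in combinations.iterrows():
        for index, row in df2.iterrows():
            # fill '<val>_new' only on flagged rows of the matching group
            if row['flag'] == 1:
                if row['group'] == row_['group']:
                    df2.loc[index, row_['val'] + '_new'] = row['var1'] * df1[(df1['group'] == row_['group']) & (df1['val'] == row_['val'])]['cal3'].values[0]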
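The nested `iterrows` loops above handle one (val, row) pair at a time. Here is a loop-free sketch of the same flexible idea. It assumes the sample df1/df2 from the question and needs no `cal3` column. It merges the flagged df2 rows with df1 on `group` and computes `var1 * cal1 * cal2` once per match. It then pivots back to one `<val>_new` column per val.

```python
import pandas as pd

df1 = pd.DataFrame({'group': ["A", "A", "B", "B"],
                    'val': ["AA", "AB", "B1", "B2"],
                    'cal1': [4, 5, 7, 6],
                    'cal2': [10, 100, 100, 10]})
df2 = pd.DataFrame({'group': ["A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
                    'flag': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
                    'var1': [1, 2, 3, 7, 8, 9, 10, 15, 20, 30]})

# Flagged rows only; reset_index keeps the original row labels in a column named 'index'
flagged = df2[df2['flag'].eq(1)].reset_index()
# One row per (flagged df2 row, df1 row of the same group)
m = flagged.merge(df1, on='group')
m['value'] = m['var1'] * m['cal1'] * m['cal2']
# Back to wide form; unflagged rows and other groups' vals become 0.0
wide = (m.pivot(index='index', columns='val', values='value')
         .reindex(index=df2.index, columns=df1['val'].unique())
         .fillna(0.0)
         .add_suffix('_new'))
result = df2.join(wide)
print(result)
```

New groups or vals in df1 are picked up automatically, because nothing in the code names "A", "B" or any val.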
    

    Hard-coded

    You can iterate over the dataframe and set the relevant columns on each iteration, like this (but you first need to add a new column to df1):

    df1['cal3'] = df1['cal1'] * df1['cal2']
    
    for index, row in df2.iterrows():
        if row['flag'] == 1:
            if row['group'] == 'A':
                df2.loc[index, 'AA_new'] = row['var1'] * df1[(df1['group'] == 'A') & (df1['val'] == 'AA')]['cal3'].values[0]
                df2.loc[index, 'AB_new'] = row['var1'] * df1[(df1['group'] == 'A') & (df1['val'] == 'AB')]['cal3'].values[0]
    
            elif row['group'] == 'B':
                df2.loc[index, 'B1_new'] = row['var1'] * df1[(df1['group'] == 'B') & (df1['val'] == 'B1')]['cal3'].values[0]
                df2.loc[index, 'B2_new'] = row['var1'] * df1[(df1['group'] == 'B') & (df1['val'] == 'B2')]['cal3'].values[0]
    

    This is the result I got.

    【Discussion】:

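    • This solution hard-codes 'group' and 'val', e.g. if row['group'] == 'A'. What if there are more group values besides "A" and "B", and more categories in the "val" column? Is there a way to map the two dataframes without hard-coding group and val?
    • I've added some code in case you want it to be more flexible; let me know if that's what you were looking for.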