【Question Title】: Pandas - Updating columns if several conditions are met
【Posted】: 2021-04-30 07:07:48
【Question】:

My goal is the output below.

A     B    C    D    E     F
0000  ZZZ  987  QW1  8     first three-four col and offset
0000  ZZZ  987  QW1  -8    first three-four col and offset
1111  AAA  123  AB1  1     first three-four col and offset
1111  AAA  123  CD1  -1    first three-four col and offset
2222  BBB  456  EF1  -4    first three-four col and offset
2222  BBB  456  GH1  -1    first three-four col and offset
2222  BBB  456  IL1  5     first three-four col and offset
3333  CCC  789  MN1  2     first two col and offset
3333  CCC  101  MN1  -2    first two col and offset
4444  DDD  121  UYT  6     first two col and offset
4444  DDD  131  FB1  -5    first two col and offset
4444  DDD  141  UYT  -1    first two col and offset
5555  EEE  151  CB1  3     first two col and offset
5555  EEE  161  CR1  -3    first two col and offset
6666  FFF  111  CB1  4     first or no match
7777  GGG  222  ZB1  10.5  first three-four col and small offset
7777  GGG  222  ZB1  -10   first three-four col and small offset
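To try the approaches in this thread, the sample input (columns A to E of the expected output) can be rebuilt as a DataFrame. This is a sketch under assumed dtypes, since the question does not state them: A is kept as a string so that 0000 keeps its leading zeros, C is an integer, and E is a float.

```python
import pandas as pd

# Sample input from the question (columns A-E); F is the label to be computed.
# Assumption: A is a string column so "0000" keeps its leading zeros.
rows = [
    ("0000", "ZZZ", 987, "QW1", 8),
    ("0000", "ZZZ", 987, "QW1", -8),
    ("1111", "AAA", 123, "AB1", 1),
    ("1111", "AAA", 123, "CD1", -1),
    ("2222", "BBB", 456, "EF1", -4),
    ("2222", "BBB", 456, "GH1", -1),
    ("2222", "BBB", 456, "IL1", 5),
    ("3333", "CCC", 789, "MN1", 2),
    ("3333", "CCC", 101, "MN1", -2),
    ("4444", "DDD", 121, "UYT", 6),
    ("4444", "DDD", 131, "FB1", -5),
    ("4444", "DDD", 141, "UYT", -1),
    ("5555", "EEE", 151, "CB1", 3),
    ("5555", "EEE", 161, "CR1", -3),
    ("6666", "FFF", 111, "CB1", 4),
    ("7777", "GGG", 222, "ZB1", 10.5),
    ("7777", "GGG", 222, "ZB1", -10),
]
df = pd.DataFrame(rows, columns=list("ABCDE"))
df["E"] = df["E"].astype(float)  # keep E numeric so group sums work
```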

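Rule 1) The first three columns must be equal to each other, regardless of the fourth column (it may or may not be equal). Each combination must offset the associated number (col E) to zero (a combination can consist of 2 to X records).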

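Rule 2) The first two columns must be equal to each other, regardless of the fourth column (it may or may not be equal). Each combination must offset the associated number (col E) to zero (a combination can consist of 2 to X records).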

Rule 3) No match.

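Rule 4) The first three columns must be equal to each other, regardless of the fourth column (it may or may not be equal). The associated numbers (col E) of each combination may differ by AT MOST 0.5 and do NOT offset to zero (a combination can consist of 2 to X records).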
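The four rules can be read as a single classifier. Below is a plain-Python sketch under an assumed reading: all rows sharing the key columns (A, B, C for rules 1 and 4; A, B for rule 2) form one combination, and the rules are tried in the order 1, 2, 4, then 3, which is the order the loops below apply them. `classify` and its `tol` parameter are hypothetical names used only for illustration.

```python
from collections import defaultdict


def classify(rows, tol=0.5):
    """Hypothetical helper: label (A, B, C, D, E) tuples by the four rules.

    Assumes every row sharing the key columns belongs to one combination.
    """
    sums3, n3 = defaultdict(float), defaultdict(int)  # keyed by (A, B, C)
    sums2, n2 = defaultdict(float), defaultdict(int)  # keyed by (A, B)
    for row in rows:
        sums3[row[:3]] += row[4]
        n3[row[:3]] += 1
        sums2[row[:2]] += row[4]
        n2[row[:2]] += 1

    labels = []
    for row in rows:
        k3, k2 = row[:3], row[:2]
        if n3[k3] >= 2 and sums3[k3] == 0:  # rule 1
            labels.append("first three-four col and offset")
        elif n2[k2] >= 2 and sums2[k2] == 0:  # rule 2
            labels.append("first two col and offset")
        elif n3[k3] >= 2 and abs(sums3[k3]) <= tol:  # rule 4 (sum is nonzero here)
            labels.append("first three-four col and small offset")
        else:  # rule 3
            labels.append("first or no match")
    return labels


rows = [
    ("2222", "BBB", 456, "EF1", -4),
    ("2222", "BBB", 456, "GH1", -1),
    ("2222", "BBB", 456, "IL1", 5),
    ("4444", "DDD", 121, "UYT", 6),
    ("4444", "DDD", 131, "FB1", -5),
    ("4444", "DDD", 141, "UYT", -1),
    ("6666", "FFF", 111, "CB1", 4),
    ("7777", "GGG", 222, "ZB1", 10.5),
    ("7777", "GGG", 222, "ZB1", -10),
]
for row, label in zip(rows, classify(rows)):
    print(row[0], row[4], label)
```

The 2222 rows match rule 1 only as a three-record combination, the 4444 rows need rule 2 because C differs on every row, and 10.5 - 10 = 0.5 puts the 7777 rows under rule 4.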

Please see my code below.

I am fully aware that I have not written the code in the most efficient way. Could you suggest a more efficient way to achieve this?

df['F'] = 'first or no match'  # rule 3: default label, kept out of the numeric column E

# Rule 1: same A, B and C, and the two E values cancel out
for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['C'][i] == df['C'][j]) & (df['E'][i] + df['E'][j] == 0):
            df.loc[i, 'F'] = 'first three-four col and offset'
            df.loc[j, 'F'] = 'first three-four col and offset'


# Rule 2: same A and B, the E values cancel out, neither row already labelled by rule 1
for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['E'][i] + df['E'][j] == 0) & (df['F'][i] != 'first three-four col and offset') & (df['F'][j] != 'first three-four col and offset'):
            df.loc[i, 'F'] = 'first two col and offset'
            df.loc[j, 'F'] = 'first two col and offset'


# Rule 4: same A, B and C, the E values differ by at most 0.5 but do not cancel out
for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['C'][i] == df['C'][j]) & (df['E'][i] + df['E'][j] != 0) & (df['E'][i] + df['E'][j] <= 0.5) & (df['E'][i] + df['E'][j] >= -0.5) & (df['F'][i] != 'first three-four col and offset') & (df['F'][j] != 'first three-four col and offset') & (df['F'][i] != 'first two col and offset') & (df['F'][j] != 'first two col and offset'):
            df.loc[i, 'F'] = 'first three-four col and small offset'
            df.loc[j, 'F'] = 'first three-four col and small offset'
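One limitation of the nested loops: they test two rows at a time, while the rules allow combinations of 2 to X records. The 2222 BBB 456 group in the expected output offsets to zero only when all three records are taken together, which a small check with itertools.combinations makes visible:

```python
from itertools import combinations

# E values of the 2222 BBB 456 group from the expected output.
values = [-4, -1, 5]

# The nested loops only ever test pairs (i, j).
pairs = [p for p in combinations(values, 2) if sum(p) == 0]
print(pairs)        # []: no two records cancel out

# Taken together, the three records do offset to zero.
print(sum(values))  # 0
```

A group-based approach (summing E over every row that shares the key columns) covers this case without enumerating subsets.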

Is there a way to get the expected result more efficiently?

I also know that the following code does not work. I tried to update the record with the correct comment, but in vain.

for ... :
    if ... :
        df['col'][index] = 'comment'

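Let us further assume that I want to keep my code in this "not very efficient" form, which seems to work (except for the last line of code). How should I change that last line to make my script work?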
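A minimal sketch of the usual fix, assuming df has a default integer index so that the loop counter is also the row label: write with a single df.loc[row, column] call instead of df['col'][index]. The chained form first extracts the column as a Series and then assigns into it, which may update a temporary object rather than df (pandas flags this as chained assignment, and under Copy-on-Write it never reaches df). Storing a text label in the numeric column E also forces a dtype change and breaks the later E sums, so the label goes into a separate column F here. The DataFrame and the condition below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"E": [8.0, -8.0, 4.0]})
df["F"] = "first or no match"  # labels live in their own column, E stays numeric

for index in range(len(df)):
    if df.loc[index, "E"] != 4.0:  # placeholder condition, for illustration only
        # A single .loc call with (row label, column) writes into df itself;
        # df['F'][index] = ... may only modify a temporary Series.
        df.loc[index, "F"] = "comment"

print(df["F"].tolist())  # ['comment', 'comment', 'first or no match']
```

If the index is not 0..n-1, df.iloc with integer positions (and df.columns.get_loc('F') for the column) is the positional counterpart.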

【Question Discussion】:

Tags: python pandas dataframe


【Solution 1】:

Use groupby + transform together with np.select:

import numpy as np

# transform('sum') broadcasts each group's E total back to every row of the group
m1 = df.groupby(['A', 'B', 'C'])['E'].transform('sum').eq(0)  # Rule 1
m2 = df.groupby(['A', 'B'])['E'].transform('sum').eq(0)  # Rule 2
m3 = df.groupby(['A', 'B', 'C'])['E'].transform('sum').abs().le(0.5)  # Rule 4

# The first true condition wins; rows matching none get the default (Rule 3)
df['new'] = np.select([m1, m2, m3], ['first three-four col and offset',
                      'first two col and offset', 'first three-four col and small offset'], 'first or no match')
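The masks overlap: for a rule-1 group the sum is 0, so m3 (abs(0) <= 0.5) is True as well, and m2 can be True too. That does not cause mislabelling, because np.select takes, for each row, the choice paired with the first condition that is True and falls back to the default only when none is. A toy illustration with hand-written boolean arrays (not the question's data):

```python
import numpy as np

# Row 0 satisfies both conditions, row 1 only the second, row 2 neither.
c1 = np.array([True, False, False])
c2 = np.array([True, True, False])

result = np.select([c1, c2], ["rule 1", "rule 2"], default="no match")
print(result.tolist())  # ['rule 1', 'rule 2', 'no match']
```

So the order of the condition list in the answer encodes the rule precedence 1, 2, 4, with rule 3 as the default.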

       A    B    C    D     E                                      F                                    new
0   0000  ZZZ  987  QW1   8.0        first three-four col and offset        first three-four col and offset
1   0000  ZZZ  987  QW1  -8.0        first three-four col and offset        first three-four col and offset
2   1111  AAA  123  AB1   1.0        first three-four col and offset        first three-four col and offset
3   1111  AAA  123  CD1  -1.0        first three-four col and offset        first three-four col and offset
4   2222  BBB  456  EF1  -4.0        first three-four col and offset        first three-four col and offset
5   2222  BBB  456  GH1  -1.0        first three-four col and offset        first three-four col and offset
6   2222  BBB  456  IL1   5.0        first three-four col and offset        first three-four col and offset
7   3333  CCC  789  MN1   2.0               first two col and offset               first two col and offset
8   3333  CCC  101  MN1  -2.0               first two col and offset               first two col and offset
9   4444  DDD  121  UYT   6.0               first two col and offset               first two col and offset
10  4444  DDD  131  FB1  -5.0               first two col and offset               first two col and offset
11  4444  DDD  141  UYT  -1.0               first two col and offset               first two col and offset
12  5555  EEE  151  CB1   3.0               first two col and offset               first two col and offset
13  5555  EEE  161  CR1  -3.0               first two col and offset               first two col and offset
14  6666  FFF  111  CB1   4.0                      first or no match                      first or no match
15  7777  GGG  222  ZB1  10.5  first three-four col and small offset  first three-four col and small offset
16  7777  GGG  222  ZB1 -10.0  first three-four col and small offset  first three-four col and small offset
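The masks compare group sums with eq(0), which is an exact floating-point comparison. That works for the sample data (10.5 - 10.0 is exactly 0.5 in binary floating point), but values such as 0.1, 0.2 and -0.3 leave rounding residue in the sum. If E can hold such values, one option, which is an assumption and not part of the answer above, is np.isclose with an absolute tolerance. The Series below is a hand-made stand-in for a transform('sum') result, and atol=1e-9 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# A group total that should be 0 but carries floating-point rounding residue.
total = 0.1 + 0.2 - 0.3          # 5.551115123125783e-17, not 0.0
sums = pd.Series([total, 0.5])   # stand-in for a transform('sum') result

print(sums.eq(0).tolist())                              # [False, False]
print(np.isclose(sums, 0, rtol=0, atol=1e-9).tolist())  # [True, False]
```

Applied to the answer, m1 would become np.isclose(df.groupby(['A', 'B', 'C'])['E'].transform('sum'), 0, rtol=0, atol=1e-9), with atol chosen to suit the precision of E.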

【Discussion】:
