Pandas - Updating columns if several conditions are met

【Question title】: Pandas - Updating columns if several conditions are met
【Posted】: 2021-04-30 07:07:48
【Question】:

I am aiming for the output below.

A	B	C	D	E	F
0000	ZZZ	987	QW1	8	first three-four col and offset
0000	ZZZ	987	QW1	-8	first three-four col and offset
1111	AAA	123	AB1	1	first three-four col and offset
1111	AAA	123	CD1	-1	first three-four col and offset
2222	BBB	456	EF1	-4	first three-four col and offset
2222	BBB	456	GH1	-1	first three-four col and offset
2222	BBB	456	IL1	5	first three-four col and offset
3333	CCC	789	MN1	2	first two col and offset
3333	CCC	101	MN1	-2	first two col and offset
4444	DDD	121	UYT	6	first two col and offset
4444	DDD	131	FB1	-5	first two col and offset
4444	DDD	141	UYT	-1	first two col and offset
5555	EEE	151	CB1	3	first two col and offset
5555	EEE	161	CR1	-3	first two col and offset
6666	FFF	111	CB1	4	first or no match
7777	GGG	222	ZB1	10.5	first three-four col and small offset
7777	GGG	222	ZB1	-10	first three-four col and small offset

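Rule 1) The first three columns must be equal to each other, regardless of the fourth column, which may or may not be equal. Within each combination, the associated numbers (col E) must offset to zero (a combination can contain anywhere from 2 to X records).

Rule 2) The first two columns must be equal to each other, regardless of the fourth column, which may or may not be equal. Within each combination, the associated numbers (col E) must offset to zero (a combination can contain anywhere from 2 to X records).

Rule 3) No match.

Rule 4) The first three columns must be equal to each other, regardless of the fourth column, which may or may not be equal. Within each combination, the numbers (col E) may differ by 0.5 AT MOST and do NOT offset to zero (a combination can contain anywhere from 2 to X records).

Please see my code below.

I am fully aware that I have not written this code in the most efficient way. Could you suggest a more efficient way to achieve this?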

for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['C'][i] == df['C'][j]) & (df['E'][i] + df['E'][j] == 0) :
            df['E'][i] = 'first three-four col and offset'
            df['E'][j] = 'first three-four col and offset'


for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['E'][i] + df['E'][j] == 0) & (df['E'][i] != 'first three-four col and offset') & (df['E'][j] != 'first three-four col and offset'):
            df['E'][i] = 'first two col and offset'
            df['E'][j] = 'first two col and offset'


for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['C'][i] == df['C'][j]) & (df['E'][i] + df['E'][j] != 0) & (df['E'][i] + df['E'][j] <= 0.5) & (df['E'][i] + df['E'][j] >= -0.5) & (df['E'][i] != 'first three-four col and offset') & (df['E'][j] != 'first three-four col and offset') & (df['E'][i] != 'first two col and offset') & (df['E'][j] != 'first two col and offset'):
            df['E'][i] = 'first three-four col and small offset'
            df['E'][j] = 'first three-four col and small offset'

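Is there a way to get the expected result more efficiently?

I also know that the following code does not work. I tried to update the record with the correct comment, but to no avail.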

for ... :
  if.... :
     df['col'][index] = 'comment'

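Let us further assume that I want to keep my code in this "not very efficient" form, which seems to work (apart from the last line of code). How should I change the last line to make my script work?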
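On the last line specifically: chained indexing such as `df['col'][index] = 'comment'` is not guaranteed to write into the original DataFrame, because `df['col']` may hand back a temporary object. A minimal sketch of one possible fix follows, using a single `.loc` call with a row label and a column label. It assumes the default integer index (so the loop counters `i` and `j` are also row labels) and writes the comment into a separate column `F`, so that `E` stays numeric. The three-row frame is made up for illustration.

```python
import pandas as pd

# Tiny illustrative frame (not the full data from the question).
df = pd.DataFrame({
    "A": ["0000", "0000", "6666"],
    "B": ["ZZZ", "ZZZ", "FFF"],
    "C": [987, 987, 111],
    "E": [8, -8, 4],
})
df["F"] = "first or no match"  # comments go here, so E remains numeric

for i in range(len(df) - 1):
    for j in range(i + 1, len(df)):
        same_keys = all(df.loc[i, c] == df.loc[j, c] for c in ["A", "B", "C"])
        if same_keys and df.loc[i, "E"] + df.loc[j, "E"] == 0:
            # One .loc call addresses rows and column together,
            # so the assignment lands in df itself.
            df.loc[[i, j], "F"] = "first three-four col and offset"

print(df)
```

`df.at[i, "F"] = ...` is an equivalent choice for a single cell. Note that this pairwise loop only detects groups of exactly two rows that cancel out; larger groups need a per-group sum.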

【Question discussion】:

Have you checked pd.DataFrame.where()? pandas.pydata.org/docs/reference/api/…. It would also help to see the raw data.

Tags: python pandas dataframe

【Solution 1】:

`groupby` + `transform` and `np.select`

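import numpy as np

m1 = df.groupby(['A', 'B', 'C'])['E'].transform('sum').eq(0)  # Rule 1: E sums to zero within each (A, B, C) group
m2 = df.groupby(['A', 'B'])['E'].transform('sum').eq(0)  # Rule 2: E sums to zero within each (A, B) group
m3 = df.groupby(['A', 'B', 'C'])['E'].transform('sum').abs().le(0.5)  # Rule 4: E sums to within 0.5 of zero

# np.select picks the first condition that holds, so Rule 1 takes precedence over Rules 2 and 4
df['new'] = np.select([m1, m2, m3], ['first three-four col and offset',
                      'first two col and offset', 'first three-four col and small offset'], 'first or no match')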

       A    B    C    D     E                                      F                                    new
0   0000  ZZZ  987  QW1   8.0        first three-four col and offset        first three-four col and offset
1   0000  ZZZ  987  QW1  -8.0        first three-four col and offset        first three-four col and offset
2   1111  AAA  123  AB1   1.0        first three-four col and offset        first three-four col and offset
3   1111  AAA  123  CD1  -1.0        first three-four col and offset        first three-four col and offset
4   2222  BBB  456  EF1  -4.0        first three-four col and offset        first three-four col and offset
5   2222  BBB  456  GH1  -1.0        first three-four col and offset        first three-four col and offset
6   2222  BBB  456  IL1   5.0        first three-four col and offset        first three-four col and offset
7   3333  CCC  789  MN1   2.0               first two col and offset               first two col and offset
8   3333  CCC  101  MN1  -2.0               first two col and offset               first two col and offset
9   4444  DDD  121  UYT   6.0               first two col and offset               first two col and offset
10  4444  DDD  131  FB1  -5.0               first two col and offset               first two col and offset
11  4444  DDD  141  UYT  -1.0               first two col and offset               first two col and offset
12  5555  EEE  151  CB1   3.0               first two col and offset               first two col and offset
13  5555  EEE  161  CR1  -3.0               first two col and offset               first two col and offset
14  6666  FFF  111  CB1   4.0                      first or no match                      first or no match
15  7777  GGG  222  ZB1  10.5  first three-four col and small offset  first three-four col and small offset
16  7777  GGG  222  ZB1 -10.0  first three-four col and small offset  first three-four col and small offset
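For reference, the approach above can be reproduced end to end with the sample rows from the question. The sketch below is one possible arrangement rather than a definitive implementation: the helper name `label_rows` and its `tol` parameter are hypothetical, and the group-size check (at least two rows per group) is an added assumption based on the "2 to X records" wording in the rules, not part of the answer as posted.

```python
import numpy as np
import pandas as pd

# Columns A-E copied from the table in the question.
rows = [
    ("0000", "ZZZ", 987, "QW1", 8), ("0000", "ZZZ", 987, "QW1", -8),
    ("1111", "AAA", 123, "AB1", 1), ("1111", "AAA", 123, "CD1", -1),
    ("2222", "BBB", 456, "EF1", -4), ("2222", "BBB", 456, "GH1", -1),
    ("2222", "BBB", 456, "IL1", 5),
    ("3333", "CCC", 789, "MN1", 2), ("3333", "CCC", 101, "MN1", -2),
    ("4444", "DDD", 121, "UYT", 6), ("4444", "DDD", 131, "FB1", -5),
    ("4444", "DDD", 141, "UYT", -1),
    ("5555", "EEE", 151, "CB1", 3), ("5555", "EEE", 161, "CR1", -3),
    ("6666", "FFF", 111, "CB1", 4),
    ("7777", "GGG", 222, "ZB1", 10.5), ("7777", "GGG", 222, "ZB1", -10),
]
df = pd.DataFrame(rows, columns=list("ABCDE"))


def label_rows(frame, tol=0.5):
    """Return one label per row (hypothetical helper, not from the answer)."""
    g3 = frame.groupby(["A", "B", "C"])["E"]
    g2 = frame.groupby(["A", "B"])["E"]
    sum3, n3 = g3.transform("sum"), g3.transform("size")
    sum2, n2 = g2.transform("sum"), g2.transform("size")

    # Assumption: a group must hold at least two rows to count as a match.
    m1 = sum3.eq(0) & n3.ge(2)            # Rule 1
    m2 = sum2.eq(0) & n2.ge(2)            # Rule 2
    m3 = sum3.abs().le(tol) & n3.ge(2)    # Rule 4 (exact zero is caught by m1 first)

    # np.select returns the choice for the first condition that is True.
    return np.select(
        [m1, m2, m3],
        ["first three-four col and offset",
         "first two col and offset",
         "first three-four col and small offset"],
        "first or no match",
    )


df["F"] = label_rows(df)
print(df)
```

The comparison `eq(0)` is exact, which is fine for the values shown here. If `E` holds floats produced by earlier arithmetic, `np.isclose` may be a safer test.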
