Check if a column contains a value from another column and fill a third column (True or False)

[Question title]: Check if a column contains value from other column and fill third column (True or False)
[Posted]: 2019-09-16 19:20:04
[Question description]:

I want to check whether one column contains the value from another column, and fill a third column with True or False.

Input df:

id | name  | account
-------------------
01 | John  | AB01
02 | Emma  | AB03
03 | Alice | AB03

Output df:

id | name  | account | match
----------------------------
01 | John  | AB01    | True
02 | Emma  | AB03    | False
03 | Alice | AB03    | True

I tried:

df['match'] = np.where(df['account'].contains(df['id']), 'True','False')

Error: AttributeError: 'Series' object has no attribute 'contains'

df['match'] = np.where(df['account'].str.contains(df['id']), 'True','False')

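Error: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Any help is much appreciated!

[Question discussion]:

str.contains() takes a single string, not a Series. The way I know is to use a for loop over all the ids you have... but I'm pretty sure there is a better way to do this.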
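
As the comment notes, str.contains() compares every row against one pattern, so it cannot pair each id with the account on the same row. A minimal sketch of a row-wise alternative to an explicit for loop, assuming both columns hold strings with no missing values (the DataFrame below just rebuilds the sample data from the question):

```python
import pandas as pd

# Sample data from the question; ids are kept as strings to preserve the leading zero.
df = pd.DataFrame({
    "id": ["01", "02", "03"],
    "name": ["John", "Emma", "Alice"],
    "account": ["AB01", "AB03", "AB03"],
})

# Walk the two columns in lockstep and apply Python's substring test per row.
df["match"] = [i in a for i, a in zip(df["id"], df["account"])]

print(df)
```

The list comprehension produces real booleans; wrap it in np.where(..., 'True', 'False') if strings are wanted instead.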

Tags: python python-3.x pandas

[Solution 1]:

To test, row by row, whether account contains id, use apply with the in operator.

For boolean True, False:

df['match'] =  df.apply(lambda x: x['id'] in x['account'], axis=1)

For strings 'True', 'False':

df['match'] =  np.where(df.apply(lambda x: x['id'] in x['account'], axis=1), 'True','False')


print (df)
   id   name account  match
0  01   John    AB01   True
1  02   Emma    AB03  False
2  03  Alice    AB03   True

Edit:

If there are missing values, a possible solution relies on the fact that np.nan == np.nan is False, so an if-else expression is added:

print (df)
   id   name account
0  01   John    AB01
1  02   Emma     NaN
2  03  Alice    AB03

For boolean True, False:

df['match'] = df.apply(lambda x: x['id'] in x['account'] 
                                 if x['account'] == x['account'] 
                                 else False, axis=1)

For strings 'True', 'False':

df['match'] = np.where(df.apply(lambda x: x['id'] in x['account'] 
                                          if x['account'] == x['account'] 
                                          else False, axis=1), 'True','False')
print (df)
   id   name account  match
0  01   John    AB01   True
1  02   Emma     NaN  False
2  03  Alice    AB03   True
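
The missing-value case can also be handled with an explicit type check rather than the self-comparison trick. This is only a sketch under the assumption that every non-missing account is a string; id_in_account is a hypothetical helper name, not part of the answer:

```python
import numpy as np
import pandas as pd

# Same sample data, with one missing account.
df = pd.DataFrame({
    "id": ["01", "02", "03"],
    "name": ["John", "Emma", "Alice"],
    "account": ["AB01", np.nan, "AB03"],
})

def id_in_account(row):
    """Hypothetical helper: True only when account is a string containing id."""
    account = row["account"]
    # NaN is a float, so it fails the isinstance check and counts as no match.
    return isinstance(account, str) and row["id"] in account

df["match"] = df.apply(id_in_account, axis=1)

print(df)
```

The isinstance test says directly which values are searchable, at the cost of also treating any other non-string value as a non-match.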

Another idea is to use a custom function with a try-except statement:

def test(x):
    try:
        return x['id'] in x['account']
    except Exception:
        return False

For boolean True, False:

df['match'] = df.apply(test, axis=1)

For strings 'True', 'False':

df['match'] = np.where(df.apply(test, axis=1), 'True','False')

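[Discussion]:

I got an error with my data: TypeError: ("argument of type 'float' is not iterable", 'occurred at index 776'). I changed it like this, though I am not sure it is correct: np.where(df.apply(lambda x: str(x['id']) in str(x['account']), axis=1), 'True','False'). Also, is it possible to search for 1, 2 & 3 instead of 01, 02, 03?
You should add this NaN test to this answer of yours, so that next time you can mark such questions as duplicates.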
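
On the first comment: the 'float' in that TypeError is most likely a NaN in account, and wrapping both sides in str() avoids the exception because str(np.nan) is the text 'nan'. For the 1, 2 & 3 part, one possible reading is that the ids are stored as integers while the account still carries the zero-padded form. A sketch under that assumption, where padded_id_in_account and its width parameter are hypothetical names chosen for illustration:

```python
import numpy as np
import pandas as pd

# Assumed variant of the data: ids stored as integers, one account missing.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["John", "Emma", "Alice"],
    "account": ["AB01", np.nan, "AB03"],
})

def padded_id_in_account(row, width=2):
    """Hypothetical helper: zero-pad the id, then look for it inside account."""
    account = row["account"]
    if pd.isna(account):
        # A missing account can never contain the id.
        return False
    # 1 -> "01", so it matches "AB01" but not, for example, "AB10".
    return str(row["id"]).zfill(width) in str(account)

df["match"] = np.where(df.apply(padded_id_in_account, axis=1), "True", "False")

print(df)
```

Searching for the unpadded text instead (str(row["id"]) in str(account)) would also run, but '1' is then found in accounts such as 'AB10' or 'AB11', so padding keeps the match stricter.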

[Solution 2]:

Something like this?

# with axis=1, x.account is a plain str (it has no .str accessor), so use the in operator
df['match'] = df.apply(lambda x: str(x.id) in str(x.account), axis=1)

[Discussion]:

My bad, it should be x.id rather than df.id!