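【Posted】: 2019-03-27 03:39:39
【Question】:
I have a DataFrame and want to create a third column, col3, based on a condition: 'Yes' if the col2 value appears in col1 for that row, otherwise 'No'.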
import pandas as pd

data = [[[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)],76546],[[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)],876546]]
test = pd.DataFrame(data, columns=['col1','col2'])
col1 col2
0 [(330420, 0.9322496056556702), (76546, 0.93220... 76546
1 [(330420, 0.9322496056556702), (500826, 0.9322... 876546
Desired result:
data = [[[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)],76546, 'Yes'],[[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)],876546,'No']]
test = pd.DataFrame(data, columns=['col1','col2', 'col3'])
col1 col2 col3
0 [(330420, 0.9322496056556702), (76546, 0.93220... 76546 Yes
1 [(330420, 0.9322496056556702), (500826, 0.9322... 876546 No
My attempt:
test['col3'] = [entry for tag in test['col2'] for entry in test['col1'] if tag in entry]
This raises an error: ValueError: Length of values does not match length of index
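For context, the nested comprehension above pairs every col2 value with every col1 entry and keeps only the pairs that pass the filter, so the length of its result is not tied to the number of rows. In addition, `tag in entry` compares an int against whole tuples, which never matches for this data. A minimal sketch of a row-wise check follows. It assumes the first element of each tuple is the ID to compare, and that the string IDs in col1 should match the integer in col2 after converting both to strings. The helper name `has_id` is hypothetical.

```python
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])


def has_id(tuples, value):
    # Hypothetical helper. Assumption: the first tuple element is the ID,
    # and IDs are compared as strings because col1 stores str and col2 stores int.
    return any(str(key) == str(value) for key, _score in tuples)


# Walk the two columns in parallel so exactly one value is produced per row.
test['col3'] = [
    'Yes' if has_id(tuples, value) else 'No'
    for tuples, value in zip(test['col1'], test['col2'])
]

print(test['col3'].tolist())  # ['Yes', 'No']
```

Zipping the two columns keeps the result the same length as the index, which is what the assignment to `test['col3']` requires.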
【Discussion】:
Tags: python pandas dataframe tuples