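Use a custom lambda function with a list comprehension that splits each value on `, `, flattens the resulting nested lists, converts them to a set to remove duplicates, and finally converts the set back to a list: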
import pandas as pd

target_table = pd.DataFrame({'user_id': [1, 2, 1, 2, 1, 2],
                             'target_type': [2, 8, 2, 8, 8, 8],
                             'constraints': ['aaa, dd', 'ss, op', 'ja, ss',
                                             'dd, su, per', 'a', 'uu, ss']})
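# For each group: split every string on ', ', flatten the nested lists,
# wrap each value in single quotes, and drop duplicates via set
f = lambda x: list(set(["'" + z + "'" for y in x.str.split(', ') for z in y]))
grouped_targets = (target_table.groupby(['user_id', 'target_type'])['constraints']
                   .apply(f)
                   .reset_index())
# sets are unordered, so the order inside each list may differ between runs
print (grouped_targets['constraints'].tolist())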
[["'ss'", "'aaa'", "'dd'", "'ja'"], ["'a'"],
["'ss'", "'per'", "'uu'", "'su'", "'op'", "'dd'"]]
Without the added quotes:
f = lambda x: list(set([z for y in x.str.split(', ') for z in y]))
grouped_targets = (target_table.groupby(['user_id', 'target_type'])['constraints']
                   .apply(f)
                   .reset_index())
print (grouped_targets['constraints'].tolist())
[['ss', 'dd', 'aaa', 'ja'], ['a'],
['ss', 'su', 'uu', 'per', 'op', 'dd']]
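A possible alternative, not part of the original answer, is to let pandas do the flattening with `DataFrame.explode` (available in pandas 0.25 and later) and then de-duplicate per group with `SeriesGroupBy.unique`. This is a sketch under those assumptions; unlike `set`, `unique` keeps values in order of first appearance, so the output order is deterministic:

```python
import pandas as pd

target_table = pd.DataFrame({'user_id': [1, 2, 1, 2, 1, 2],
                             'target_type': [2, 8, 2, 8, 8, 8],
                             'constraints': ['aaa, dd', 'ss, op', 'ja, ss',
                                             'dd, su, per', 'a', 'uu, ss']})

# Replace each string with a list of its parts, then give every part its own row
exploded = (target_table
            .assign(constraints=target_table['constraints'].str.split(', '))
            .explode('constraints'))

# unique() returns one array per group in first-seen order; convert each to a plain list
grouped_targets = (exploded.groupby(['user_id', 'target_type'])['constraints']
                   .unique()
                   .apply(list)
                   .reset_index())
print(grouped_targets['constraints'].tolist())
```

This trades the custom lambda for two built-in pandas steps, which can be easier to read, at the cost of temporarily creating one row per split value.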
Edit:
I think the most complicated part is the custom function. You can test how it works on a plain list:
L = ['aaa, dd','ss, op','ja, ss', 'dd, su, per', 'a', 'uu, ss']
If you only split the values, you get a list of lists (a nested list):
a = [x.split(', ') for x in L]
print (a)
[['aaa', 'dd'], ['ss', 'op'], ['ja', 'ss'], ['dd', 'su', 'per'], ['a'], ['uu', 'ss']]
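To see why the flattening step is needed: a nested list cannot go straight into `set`, because its elements are lists and lists are unhashable. A minimal check on the same list `L`:

```python
L = ['aaa, dd', 'ss, op', 'ja, ss', 'dd, su, per', 'a', 'uu, ss']
nested = [x.split(', ') for x in L]

try:
    set(nested)           # fails: each element of nested is itself a list
except TypeError as exc:
    message = str(exc)

print(message)            # unhashable type: 'list'
```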
To flatten the values, add a second `for` clause over the split parts in the same comprehension:
a = [z for x in L for z in x.split(', ')]
print (a)
['aaa', 'dd', 'ss', 'op', 'ja', 'ss', 'dd', 'su', 'per', 'a', 'uu', 'ss']
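The flattened list still contains duplicates (`'ss'` three times, `'dd'` twice); wrapping it in `set` and then `list` removes them, which is the last step of the custom function. The sketch below also spells out the comprehension as the equivalent nested loops. Because sets are unordered, the result order can vary; if a stable order matters, `dict.fromkeys` (dicts keep insertion order from Python 3.7) is one option, a substitution not used in the original answer:

```python
L = ['aaa, dd', 'ss, op', 'ja, ss', 'dd, su, per', 'a', 'uu, ss']
a = [z for x in L for z in x.split(', ')]

# The comprehension above is equivalent to these nested loops
b = []
for x in L:
    for z in x.split(', '):
        b.append(z)

unique_any_order = list(set(a))              # what the lambda does; order not guaranteed
unique_first_seen = list(dict.fromkeys(a))   # same values, kept in first-seen order
print(unique_first_seen)
# ['aaa', 'dd', 'ss', 'op', 'ja', 'su', 'per', 'a', 'uu']
```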