python dataframe select rows based on cell which is a list from a list (answer)

【Question Title】: python dataframe select rows based on cell which is a list from a list
【Posted】: 2019-09-12 02:51:05
【Question Description】:

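I want to select rows of a DataFrame based on a column whose cells are lists. A row should be selected when its list intersects a separate, independent list. I'm hoping there is a more elegant way to do this, because I've already spent hours researching and my solution is still incomplete.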

import pandas as pd  

# initialize list of lists  
data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']], 
        ['DS', 'Stack', 9, ['d', 'e', 'c']], 
        ['DS', 'Queue', 7, ['f', 'g', 'i']], 
        ['Algo', 'Greedy', 8, ['r', 's', 'c']], 
        ['Algo', 'DP', 6, ['t', 'r', 'g']], 
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']], ]  

# Create the pandas DataFrame  
df = pd.DataFrame(data, columns = ['Category', 'Name', 'Marks', 'Alpha'])  

print(df)

# how do I select rows from the dataframe that match multiple values?
# doing this with a column of single values is easy
desired_name = ['DP', 'Greedy']
small_set = df[df['Name'].isin(desired_name)]
print(small_set)

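# what I really want to do is something like this
# (isin compares each whole list cell against 'c' and 'i', so this does not
# select the rows whose lists contain 'c' or 'i')
desired_alpha = ['c', 'i']
small_set = df[df['Alpha'].isin(desired_alpha)]
print(small_set)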

# The only thing I've been able to figure out is below, but it's ugly
# and painful so guessing there is a better way

set_mask = df['Alpha'].apply(lambda x: list(filter(lambda y: y in x, desired_alpha))  )
set_mask = set_mask.to_frame()

# convert the non-empty arrays to True and other to False

set_mask = set_mask.mask(set_mask['Alpha'].str.len() != 0, True)
set_mask = set_mask.mask(set_mask['Alpha'].str.len() == 0, False)

# Then use the set_mask as a mask like df[set_mask] but that doesn't work since
# the values in set_mask are not boolean -- which is a different problem
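For comparison, the attempt above can be completed without the two mask() calls: a row matches when its filtered list is non-empty, so comparing each list's length to zero already yields a boolean Series. This is a minimal sketch on the question's data; the list comprehension is equivalent to the question's list(filter(...)) call, and the name filtered is introduced here only for illustration.

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])

desired_alpha = ['c', 'i']

# Same filtering step as the question: keep the desired letters found in each list
filtered = df['Alpha'].apply(lambda x: [y for y in desired_alpha if y in x])

# A non-empty filtered list means at least one desired letter matched,
# so comparing the lengths to 0 gives a boolean Series usable as a mask
set_mask = filtered.map(len) > 0

small_set = df[set_mask]
print(small_set)
```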

【Question Discussion】:

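small_set = df[df['Name'].isin(desired_alpha)]? Did you mean df["Alpha"]?
Could you give an example of the output you expect?
Yes, that's what I meant, good catch. That is the code I want to get working, and @Grismar posted a solution below. I've fixed the example above for people reading this later.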

Tags: python dataframe arraylist rows

【Solution 1】:

This seems less "painful" to me:

# This replaces "what I really want to do is something like" section, entire solution
desired_alpha = ['c', 'i']
small_set = df[df['Alpha'].apply(lambda x: any([y in x for y in desired_alpha]))]
print(small_set)

Explanation:

any([y in x for y in desired_alpha]) takes a value x from the 'Alpha' column and checks whether any of the values in desired_alpha appears in x.
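To see what this check does for a single cell, it can be evaluated on plain Python lists outside pandas. The two lists below are copied from the 'Alpha' cells of the Linked_list and DP rows; the variable names are illustrative only.

```python
desired_alpha = ['c', 'i']

linked_list_alpha = ['a', 'b', 'c']  # 'Alpha' cell of the Linked_list row
dp_alpha = ['t', 'r', 'g']           # 'Alpha' cell of the DP row

# any() is True as soon as one desired letter is found in the cell's list
print(any([y in linked_list_alpha for y in desired_alpha]))  # True ('c' is present)
print(any([y in dp_alpha for y in desired_alpha]))           # False (neither 'c' nor 'i')
```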

By applying it as a function to the 'Alpha' column of df, i.e. df['Alpha'].apply(lambda x: any([y in x for y in desired_alpha])), you get a Series of bool values, which can then be used to select the rows you need.
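A variant of the same idea, offered as an alternative rather than as the answer's own code, uses a Python set: set.isdisjoint() returns True when the set shares no elements with the given iterable, so negating it gives the same "any overlap" test. With desired_alpha stored as a set, membership checks are hashed instead of repeated linear scans over a list. A minimal sketch on the question's data:

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])

desired_alpha = {'c', 'i'}  # a set, so membership checks are hashed

# True when the cell's list shares at least one element with desired_alpha
mask = df['Alpha'].apply(lambda x: not desired_alpha.isdisjoint(x))
small_set = df[mask]
print(small_set)
```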

Written out in long form:

import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']], 
        ['DS', 'Stack', 9, ['d', 'e', 'c']], 
        ['DS', 'Queue', 7, ['f', 'g', 'i']], 
        ['Algo', 'Greedy', 8, ['r', 's', 'c']], 
        ['Algo', 'DP', 6, ['t', 'r', 'g']], 
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']], ]  

df = pd.DataFrame(data, columns = ['Category', 'Name', 'Marks', 'Alpha'])  

desired = ['c', 'i']


def contains_desired(x):
    # reading the module-level `desired` needs no `global` statement
    return any([y in x for y in desired])


selection = df['Alpha'].apply(contains_desired)
small_set = df[selection]
print(small_set)

Output:

  Category         Name  Marks      Alpha
0       DS  Linked_list     10  [a, b, c]
1       DS        Stack      9  [d, e, c]
2       DS        Queue      7  [f, g, i]
3     Algo       Greedy      8  [r, s, c]
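If the per-row lambda becomes a bottleneck on large frames, a different technique (not part of the answer above) is to flatten the lists with Series.explode() (available since pandas 0.25), run the ordinary isin() from the question on the flattened values, and collapse back to one boolean per original row with groupby(level=0).any(). A sketch, assuming the DataFrame index has no duplicate labels:

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])

desired = ['c', 'i']

# One entry per list element; each element keeps its original row's index label
exploded = df['Alpha'].explode()

# isin() works here because each value is now a single string, not a list
hits = exploded.isin(desired)

# Collapse back to one boolean per original row: True if any element matched
mask = hits.groupby(level=0).any()
small_set = df[mask]
print(small_set)
```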

Note: if you only want to match rows that contain both i and c, change any() to all(), but judging from your example data I don't think that's what you want.
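To illustrate the all() variant from the note: with the sample data no list contains both 'c' and 'i', so that selection is empty, while a pair such as 'b' and 'c' (chosen here only for illustration) matches just the Linked_list row. The helper rows_containing_all is hypothetical, introduced only for this sketch.

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])


def rows_containing_all(frame, letters):
    # Hypothetical helper: keep rows whose 'Alpha' list contains every letter
    return frame[frame['Alpha'].apply(lambda x: all(y in x for y in letters))]


print(rows_containing_all(df, ['c', 'i']))  # empty: no list has both letters
print(rows_containing_all(df, ['b', 'c']))  # only the Linked_list row
```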

【Discussion】:

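Well done, works great! Thanks also for the detailed explanation, it helped my understanding... I learned something, which is a good way to start the day.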