【Question title】: python dataframe select rows based on cell which is a list from a list
【Posted】: 2019-09-12 02:51:05
【Question】:

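I want to select rows of a dataframe based on a column of lists. Specifically, I want to select a row based on its intersection with a separate, independent list. I'm hoping there is a more elegant way to do this, because I've already spent hours researching and my solution is still incomplete.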

import pandas as pd  

# initialize list of lists  
data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']], 
        ['DS', 'Stack', 9, ['d', 'e', 'c']], 
        ['DS', 'Queue', 7, ['f', 'g', 'i']], 
        ['Algo', 'Greedy', 8, ['r', 's', 'c']], 
        ['Algo', 'DP', 6, ['t', 'r', 'g']], 
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']], ]  

# Create the pandas DataFrame  
df = pd.DataFrame(data, columns = ['Category', 'Name', 'Marks', 'Alpha'])  

print(df)

# how do I select rows from the dataframe that match multiple values?
# doing this when each cell holds a single value is easy
desired_name = ['DP', 'Greedy']
small_set = df[df['Name'].isin(desired_name)]
print(small_set)

# what I really want to do is something like
desired_alpha = ['c', 'i']
small_set = df[df['Alpha'].isin(desired_alpha)]
print(small_set)

# The only thing I've been able to figure out is below, but it's ugly
# and painful so guessing there is a better way

set_mask = df['Alpha'].apply(lambda x: list(filter(lambda y: y in x, desired_alpha)))
set_mask = set_mask.to_frame()

# convert the non-empty arrays to True and other to False

set_mask = set_mask.mask(set_mask['Alpha'].str.len() != 0, True)
set_mask = set_mask.mask(set_mask['Alpha'].str.len() == 0, False)

# Then use the set_mask as a mask like df[set_mask] but that doesn't work since
# the values in set_mask are not boolean -- which is a different problem
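One way to finish this approach (a sketch, not part of the original thread), assuming the same example data as above: `Series.str.len()` also accepts list elements, so comparing each list's length to zero yields a boolean Series directly, with no need for `to_frame()` or `mask()`.

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])

desired_alpha = ['c', 'i']

# Same first step as the question: keep the desired letters found in each list
matches = df['Alpha'].apply(lambda x: list(filter(lambda y: y in x, desired_alpha)))

# .str.len() gives the length of each list; non-empty means at least one match
set_mask = matches.str.len() > 0

small_set = df[set_mask]
print(small_set)
```

Because `set_mask` is a boolean Series aligned with `df`'s index, `df[set_mask]` works as a row filter.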

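【Question comments】:

  • small_set = df[df['Name'].isin(desired_alpha)]? Did you mean df["Alpha"]?
  • Can you give an example of the output?
  • Yes, that's what I meant, good catch. That was (and is) the code I want to get working, and @Grismar posted a solution below. I've fixed the example above for anyone reading later.

Tags: python dataframe arraylist rows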


【Solution 1】:

Seems less "painful" to me:

# This replaces "what I really want to do is something like" section, entire solution
desired_alpha = ['c', 'i']
small_set = df[df['Alpha'].apply(lambda x: any([y in x for y in desired_alpha]))]
print(small_set)

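Explanation:

any([y in x for y in desired_alpha]) takes a value x from the 'Alpha' column and checks whether any of the values in desired_alpha appear in x.

Applying that as a function to the 'Alpha' column of df, df['Alpha'].apply(lambda x: any([y in x for y in desired_alpha])), gives you a Series of bool values, which you can then use to select the rows you need.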
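A variant of the same idea that swaps the per-element membership loop for a set test (an alternative, not the answer's code): `set.isdisjoint` returns `True` when two collections share no element, so its negation is an "any overlap" check. This sketch assumes the list elements are hashable, which holds for the single-letter strings in the example.

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])

desired_alpha = ['c', 'i']
desired_set = set(desired_alpha)  # build the set once, outside the per-row function

# True where the cell's list shares at least one letter with desired_set
mask = df['Alpha'].apply(lambda x: not desired_set.isdisjoint(x))

small_set = df[mask]
print(small_set)
```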

Or, written out in long form:

import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']], 
        ['DS', 'Stack', 9, ['d', 'e', 'c']], 
        ['DS', 'Queue', 7, ['f', 'g', 'i']], 
        ['Algo', 'Greedy', 8, ['r', 's', 'c']], 
        ['Algo', 'DP', 6, ['t', 'r', 'g']], 
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']], ]  

df = pd.DataFrame(data, columns = ['Category', 'Name', 'Marks', 'Alpha'])  

desired = ['c', 'i']


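def contains_desired(x):
    # True if any value from `desired` appears in the cell's list x;
    # reading the module-level `desired` needs no `global` statement
    return any([y in x for y in desired])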


selection = df['Alpha'].apply(contains_desired)
small_set = df[selection]
print(small_set)

Output:

  Category         Name  Marks      Alpha
0       DS  Linked_list     10  [a, b, c]
1       DS        Stack      9  [d, e, c]
2       DS        Queue      7  [f, g, i]
3     Algo       Greedy      8  [r, s, c]
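Another option (not from the original answer) keeps the per-element test inside pandas instead of a Python lambda: flatten the lists with `Series.explode()` (available since pandas 0.25), test each element with `isin`, then collapse back to one bool per original row with `groupby(level=0).any()`. This sketch assumes the DataFrame index is unique, as the default RangeIndex here is.

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])

desired = ['c', 'i']

# One row per list element; each element keeps the index label of its original row
exploded = df['Alpha'].explode()

# Test every element, then mark a row True if any of its elements matched
selection = exploded.isin(desired).groupby(level=0).any()

small_set = df[selection]
print(small_set)
```

Swapping `.any()` for `.all()` in this version would require every letter in the cell to be desired, which is a different condition from requiring every desired value to be present in the cell.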

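Note: if you only want to match rows that contain both i and c, change any() to all(); judging from your example data, though, I don't think that's what you want.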
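A sketch of the `all()` variant from the note, using the example data. With `['c', 'i']` no row matches, because no list contains both letters, so the sketch also tries `['b', 'c']` (a hypothetical choice) to show a non-empty result. The helper name `contains_all` is illustrative only.

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])


def contains_all(x, desired):
    # True only if every desired value appears in the cell's list x
    return all(y in x for y in desired)


# Series.apply forwards the extra positional argument via args=
both_c_i = df[df['Alpha'].apply(contains_all, args=(['c', 'i'],))]
both_b_c = df[df['Alpha'].apply(contains_all, args=(['b', 'c'],))]

print(len(both_c_i))  # 0: no list holds both 'c' and 'i'
print(both_b_c)       # only the Linked_list row
```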

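【Comments】:

  • Nice work, that works great! Thanks also for the detailed explanation, it helped my understanding... I learned something, which is a good way to start the day.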