Find unique values for each category: answers

【Question title】: Find unique value for each category
【Posted】: 2021-09-29 06:38:21
【Question description】:

I have two columns:

category     names
vegetables   [broccoli, ginger]
fruit        [apple, grapes, dragonfruit]
vegetables   [pine]
vegetables   [bottleguord, pumpkin]
fruit        [mango, guava]

I need to find the unique values contained in each category. This is how you can create the df:

import numpy as np
import pandas as pd    
df = pd.DataFrame({'category':['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
                       'Names':['[broccoli, ginger]','[apple, grapes, dragonfruit]','[pine]','[bottleguord, pumpkin]', '[mango, guava]']})

This is what I tried:

g = df.groupby('category')['names'].apply(lambda x: list(np.unique(x)))

Expected output:

index = ['Vegetables' ,'Fruits']
new_df = pd.DataFrame(index=index)

This is how I adjusted the code:

print(df.assign(Names=df['Names'].str[1:-1].str.split(', ')).explode('Names').groupby('Category')['Names'].apply(lambda x: len(set(x))))

category    len_unique_val
Vegetables   5
Fruits       4

【Question discussion】:

Please share the expected output format.
@Abdul I have added the expected output. I am getting a KeyError.

Tags: python python-3.x dataframe data-analysis

【Solution 1】:

You can combine explode, groupby, and apply to collect each category's values into a Python set.

Assuming the names column holds lists:

(df.explode('names')
   .groupby('category')
   ['names']
   .apply(set)
)

Output:

category
fruit             {dragonfruit, guava, grapes, apple, mango}
vegetables    {ginger, pine, broccoli, bottleguord, pumpkin}
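
For reference, here is a minimal self-contained sketch of the list-input case. The frame `df_lists` is a hypothetical stand-in (it is not in the question) whose `names` column holds real Python lists rather than strings. Sets are unordered, so the element order you see may differ from the output above.

```python
import pandas as pd

# Hypothetical sample data: the 'names' column holds real Python lists.
df_lists = pd.DataFrame({
    'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
    'names': [['broccoli', 'ginger'],
              ['apple', 'grapes', 'dragonfruit'],
              ['pine'],
              ['bottleguord', 'pumpkin'],
              ['mango', 'guava']],
})

# explode() yields one row per list element; set() drops duplicates per group.
unique_names = (df_lists.explode('names')
                        .groupby('category')['names']
                        .apply(set))

# Sort each set before printing so the display is stable between runs.
for category, names in unique_names.items():
    print(category, sorted(names))
```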

Assuming the Names column holds strings:

(df.assign(Names=df['Names'].str[1:-1].str.split(', '))
   .explode('Names')
   .groupby('category')
   ['Names']
   .apply(lambda x: '['+', '.join(set(x))+']')
)

Output:

category
fruit             [dragonfruit, guava, grapes, apple, mango]
vegetables    [ginger, pine, broccoli, bottleguord, pumpkin]
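
The string-input case can be sketched end to end with the DataFrame from the question. Two things here are additions rather than part of the answer: `sorted()` is used so the joined string is deterministic, and `nunique()` is shown as one possible way to get the per-category count that the question's adjusted code was aiming for. The variable names `exploded`, `unique_str`, and `unique_count` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
    'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]', '[pine]',
              '[bottleguord, pumpkin]', '[mango, guava]'],
})

# Strip the surrounding brackets, split on ', ', then emit one row per name.
exploded = (df.assign(Names=df['Names'].str[1:-1].str.split(', '))
              .explode('Names'))

# sorted() makes the joined string deterministic, unlike iterating a raw set.
unique_str = (exploded.groupby('category')['Names']
                      .apply(lambda x: '[' + ', '.join(sorted(set(x))) + ']'))

# nunique() counts the distinct names within each category.
unique_count = exploded.groupby('category')['Names'].nunique()

print(unique_str)
print(unique_count)
```

Note that both `groupby` calls use the lowercase label `category`, exactly as it is spelled in the DataFrame constructor.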

【Discussion】:

@chixy I saw that your input might be strings, so I updated the answer.
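
On the KeyError mentioned in the question comments: judging only from the code shown in the question, a likely cause is a mismatch in column labels. The DataFrame is built with `category` and `Names`, while the attempts select `names` and group by `Category`. pandas column labels are case-sensitive, which this small check illustrates (the one-row frame is a cut-down stand-in for the question's data):

```python
import pandas as pd

# Cut-down stand-in for the question's DataFrame, same column labels.
df = pd.DataFrame({'category': ['fruit'], 'Names': ['[mango, guava]']})

print(list(df.columns))  # ['category', 'Names']

# Membership tests show which spellings actually exist as columns.
for label in ('names', 'Category', 'Names', 'category'):
    print(label, label in df.columns)

# Grouping by a label that does not exist raises KeyError.
try:
    df.groupby('Category')
    raised = False
except KeyError:
    raised = True

print('KeyError raised:', raised)
```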