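【Question title】: Capture all unique information by group
【Posted】: 2022-11-23 19:21:15
【Question】:

I want to build a dataset with one row per unique fruit. I don't know in advance all the types that can appear under each fruit (e.g. colour, store, price). Within each type there may also be duplicate rows. Is there a fully generic way to detect all possible duplicates and capture all the unique information?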

   type    val       detail
0 fruit    apple
1 colour   green     greenish
2 colour   yellow    
3 store    walmart    usa
4 price    10
5 NaN
6 fruit    banana
7 colour   yellow
8 fruit    pear
9 fruit    jackfruit
...
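
For anyone who wants to reproduce the problem, the visible rows could be rebuilt roughly as follows. This is only a sketch: it assumes the blank cells are NaN and that every value (including the price) is stored as a string, neither of which the question states. The rows hidden behind `...` are not reconstructed.

```python
import numpy as np
import pandas as pd

# Rows 0-9 as shown in the question; blank cells are assumed to be NaN.
df = pd.DataFrame(
    {
        "type": ["fruit", "colour", "colour", "store", "price", np.nan,
                 "fruit", "colour", "fruit", "fruit"],
        # Assumption: all values, including the price, are strings.
        "val": ["apple", "green", "yellow", "walmart", "10", np.nan,
                "banana", "yellow", "pear", "jackfruit"],
        "detail": [np.nan, "greenish", np.nan, "usa", np.nan, np.nan,
                   np.nan, np.nan, np.nan, np.nan],
    }
)
print(df)
```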

Expected output

   fruit      colour           store      price  detail           ...
0  apple      [green, yellow]  [walmart]  [10]   [greenish, usa]
1  banana     [yellow]         NaN        NaN
2  pear       NaN              NaN        NaN
3  jackfruit  NaN              NaN        NaN

I tried the following, but it doesn't come close to the expected output, and it doesn't show the column names either.

df.groupby("type")["val"].agg(size=len, set=lambda x: set(x))
0 fruit   {"apple",...}
1 colour  ...
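
One likely reason the attempt falls short is that grouping by `type` alone pools values across every fruit, so the link between a colour and its fruit is lost. The sketch below illustrates this on a cut-down, assumed version of the sample (only the `type` and `val` columns, values as strings).

```python
import numpy as np
import pandas as pd

# Assumed, reduced copy of the sample data from the question.
df = pd.DataFrame(
    {
        "type": ["fruit", "colour", "colour", "store", "price", np.nan,
                 "fruit", "colour", "fruit", "fruit"],
        "val": ["apple", "green", "yellow", "walmart", "10", np.nan,
                "banana", "yellow", "pear", "jackfruit"],
    }
)

# Grouping by type only: each set mixes values from all fruits.
pooled = df.groupby("type")["val"].agg(lambda s: set(s))
print(pooled)
```

Here the `colour` group holds both green and yellow, with nothing to say which fruit each colour came from. Any solution needs a per-row fruit label first.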

【Comments】:

    Tags: python python-3.x pandas


    【Solution 1】:

    Use:

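    # Flag the rows that start a new fruit
    m = df['type'].eq('fruit')
    
    # Carry each fruit name down to the rows that follow it
    df['fruit'] = df['val'].where(m).ffill()
    
    # One row per fruit, one column per (value column, type) pair;
    # dict.fromkeys drops duplicates while keeping first-seen order
    df1 = (df.pivot_table(index='fruit',columns='type', 
                          aggfunc=lambda x: list(dict.fromkeys(x.dropna())))
            .drop('fruit', axis=1, level=1))
    # Flatten the two-level column index into names such as val_colour
    df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')
    print(df1)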
              detail_colour detail_price detail_store       val_colour val_price  
    fruit                                                                          
    apple        [greenish]           []        [usa]  [green, yellow]      [10]   
    banana               []          NaN          NaN         [yellow]       NaN   
    jackfruit           NaN          NaN          NaN              NaN       NaN   
    pear                NaN          NaN          NaN              NaN       NaN   
    
               val_store  
    fruit                 
    apple      [walmart]  
    banana           NaN  
    jackfruit        NaN  
    pear             NaN  
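
The output above splits `detail` by type and sorts the fruits alphabetically, whereas the question asks for one pooled `detail` column with fruits in their original order. A possible variation, not part of the answer above, is sketched here. It reuses the same `where`/`ffill` labelling step but swaps `pivot_table` for `groupby` plus `unstack`. The sample data, the helper `unique_list`, and the names `attrs`, `wide`, `detail` and `out` are all assumptions made for illustration. It also assumes no type is itself called `fruit` or `detail`.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample data (blank cells taken as NaN).
df = pd.DataFrame(
    {
        "type": ["fruit", "colour", "colour", "store", "price", np.nan,
                 "fruit", "colour", "fruit", "fruit"],
        "val": ["apple", "green", "yellow", "walmart", "10", np.nan,
                "banana", "yellow", "pear", "jackfruit"],
        "detail": [np.nan, "greenish", np.nan, "usa", np.nan, np.nan,
                   np.nan, np.nan, np.nan, np.nan],
    }
)

# Label every row with the most recent fruit seen above it.
is_fruit = df["type"].eq("fruit")
df["fruit"] = df["val"].where(is_fruit).ffill()


def unique_list(s):
    """Unique non-null values in first-seen order, or NaN if there are none."""
    vals = list(dict.fromkeys(s.dropna()))
    return vals if vals else np.nan


# Attribute rows only: drop the fruit rows and rows with a missing type.
attrs = df[~is_fruit & df["type"].notna()]

# One column per type, whatever types happen to occur, in first-seen order.
wide = attrs.groupby(["fruit", "type"])["val"].agg(unique_list).unstack("type")
wide = wide.reindex(columns=attrs["type"].unique())

# Details pooled per fruit, regardless of the type they were attached to.
detail = attrs.groupby("fruit")["detail"].agg(unique_list).rename("detail")

# Restore the original fruit order and include fruits with no attributes.
fruits = df.loc[is_fruit, "val"].drop_duplicates().tolist()
out = wide.join(detail).reindex(fruits).rename_axis("fruit").reset_index()
out.columns.name = None
print(out)
```

Because the type columns come from whatever values occur in `type`, nothing has to be known about them in advance. Fruits with no attribute rows, such as pear and jackfruit, end up as all-NaN rows.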
    

    【Comments】:
