[Question title]: How to assign minimum value based on lookup values in two other columns in pandas?
[Posted]: 2018-02-14 14:08:39
[Question description]:

Goal: programmatically match combinations of values across two columns in order to find the minimum value of another column.

Suppose I have this:

import pandas as pd

d = {'Part_1': [91, 201, 201],
     'Part_2': [201,111,91], 
     'Result': [3,3, 3], 
     'Sub-Score': [0.60, 0.8,0.9], 
     'Final-Score': [0,0,0]}
df = pd.DataFrame(data=d)
df

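I want to find the minimum value in the Sub-Score column and assign it to the Final-Score column. The selection has to be based on matching Part_1 and Part_2 values (the two parts can appear in either column):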

d_new = {'Part_1': [91, 201, 201],
         'Part_2': [201,111,91], 
         'Result': [3,3, 3], 
         'Sub-Score': [0.60, 0.8,0.9], 
         'Final-Score': [0.6,.8,.6]}
df_new = pd.DataFrame(data=d_new)
df_new

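Here we can see that row 0 and row 2 have the same values in Part_1 and Part_2, just in swapped order. We can also see that row 0 has a Sub-Score of 0.60 and row 2 has a Sub-Score of 0.9.

I want to take the Sub-Score from row 0 (because it is the lower of the two values in row 0 and row 2) and assign it to the Final-Score column of both row 0 and row 2. Row 1 has no counterpart: it does not share the same pair of parts as row 0 or row 2, so its Sub-Score is simply carried over as its Final-Score.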

Any help would be greatly appreciated.

(Edit):

Input:

   Final-Score  Part_1  Part_2  Result  Sub-Score
0            0      91     201       3        0.6
1            0     201     111       3        0.8
2            0     201      91       3        0.9

Desired output:

   Final-Score  Part_1  Part_2  Result  Sub-Score
0          0.6      91     201       3        0.6
1          0.8     201     111       3        0.8
2          0.6     201      91       3        0.9
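The matching rule can be sketched in plain Python, without pandas, to make the logic explicit. This is only an illustration: it assumes each row is reduced to a `(part_1, part_2, sub_score)` tuple, and the function name `final_scores` is hypothetical. Sorting each pair gives an order-independent key, and a dictionary keeps the lowest Sub-Score seen per key.

```python
def final_scores(rows):
    """Return the Final-Score for each (part_1, part_2, sub_score) row.

    Rows whose part pairs match in either order share the lowest Sub-Score.
    """
    best = {}
    for part_1, part_2, sub_score in rows:
        # Sorting makes (91, 201) and (201, 91) produce the same key
        key = tuple(sorted((part_1, part_2)))
        best[key] = min(sub_score, best.get(key, sub_score))
    # Second pass: every row receives the minimum stored for its key
    return [best[tuple(sorted((p1, p2)))] for p1, p2, _ in rows]


rows = [(91, 201, 0.60), (201, 111, 0.8), (201, 91, 0.9)]
print(final_scores(rows))  # [0.6, 0.8, 0.6]
```

The pandas answers below express the same two steps (normalize the pair, then take a per-group minimum) with vectorized operations.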

[Question discussion]:

    Tags: python python-3.x pandas dataframe


    [Answer 1]:

    Sort the values within each row, group by the resulting ngroup ids, and transform with the minimum, i.e.

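    import numpy as np
    
    # Sort each row's (Part_1, Part_2) pair so (91, 201) and (201, 91) become identical
    temp = pd.DataFrame(np.sort(df[['Part_1', 'Part_2']].to_numpy(), axis=1), index=df.index)
    # Assign one integer group id per distinct sorted pair
    grps = temp.groupby(temp.columns.tolist()).ngroup()
    
    # Broadcast each group's minimum Sub-Score back to every row of that group
    df['new'] = df.groupby(grps)['Sub-Score'].transform('min')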
    
       Final-Score  Part_1  Part_2  Result  Sub-Score  new
    0            0      91     201       3        0.6  0.6
    1            0     201     111       3        0.8  0.8
    2            0     201      91       3        0.9  0.6
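To see what each step of this answer produces, the intermediate values can be printed on the example data. This is a sketch that rebuilds only the columns it needs; naming the sorted columns `lo` and `hi` is an assumption added for readability and does not change the grouping.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Part_1': [91, 201, 201],
                   'Part_2': [201, 111, 91],
                   'Sub-Score': [0.60, 0.8, 0.9]})

# Row-wise sort: each pair becomes (smaller part, larger part)
temp = pd.DataFrame(np.sort(df[['Part_1', 'Part_2']].to_numpy(), axis=1),
                    index=df.index, columns=['lo', 'hi'])
print(temp.values.tolist())  # [[91, 201], [111, 201], [91, 201]]

# ngroup() numbers the distinct sorted pairs (in sorted key order by default)
grps = temp.groupby(['lo', 'hi']).ngroup()
print(grps.tolist())  # [0, 1, 0]

# transform('min') returns a Series aligned with df's index, so it can be assigned directly
print(df.groupby(grps)['Sub-Score'].transform('min').tolist())  # [0.6, 0.8, 0.6]
```

Rows 0 and 2 receive the same group id, which is why they end up sharing the minimum 0.6.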
    

    [Comments]:

      [Answer 2]:

      I found a (somewhat hacky) way that seems to work:

      import pandas as pd
      
      d = {'Part_1': [91, 201, 201],
           'Part_2': [201, 111, 91],
           'Result': [3, 3, 3],
           'Sub-Score': [0.60, 0.8, 0.9],
           'Final-Score': [0, 0, 0]}
      df = pd.DataFrame(data=d)
      
      # Find lowest part-number of part-pair and add as new column
      df["min_part"] = df[["Part_1", "Part_2"]].min(axis=1)
      # Find highest part-number of part-pair and add as new column
      df["max_part"] = df[["Part_1", "Part_2"]].max(axis=1)
      print(df)
      

      The dataset now looks like:

         Final-Score  Part_1  Part_2  Result  Sub-Score  min_part  max_part
      0            0      91     201       3        0.6        91       201
      1            0     201     111       3        0.8       111       201
      2            0     201      91       3        0.9        91       201
      

      Then do:

      # Group together rows with the same min_part, max_part pair, and assign
      # their lowest "Sub-Score" value to the "Final-score" column
      df["Final-Score"] = df.groupby(["min_part", "max_part"])["Sub-Score"].transform("min")
      print(df)
      

      Final result:

         Final-Score  Part_1  Part_2  Result  Sub-Score  min_part  max_part
      0          0.6      91     201       3        0.6        91       201
      1          0.8     201     111       3        0.8       111       201
      2          0.6     201      91       3        0.9        91       201
      

      (Optional) Keep only the original columns:

      df = df[["Final-Score", "Part_1", "Part_2", "Result", "Sub-Score"]]
      print(df)
      

      Result:

         Final-Score  Part_1  Part_2  Result  Sub-Score
      0          0.6      91     201       3        0.6
      1          0.8     201     111       3        0.8
      2          0.6     201      91       3        0.9
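A variant of this answer can pass the row-wise minimum and maximum to `groupby` as Series instead of storing them as `min_part`/`max_part` columns, which makes the optional cleanup step unnecessary. This is a sketch on the example data, not the answerer's code; it relies on `groupby` accepting a list of Series aligned to the frame's index.

```python
import pandas as pd

df = pd.DataFrame({'Part_1': [91, 201, 201],
                   'Part_2': [201, 111, 91],
                   'Result': [3, 3, 3],
                   'Sub-Score': [0.60, 0.8, 0.9],
                   'Final-Score': [0, 0, 0]})

# Smaller and larger part of each pair, computed on the fly (not added to df)
low = df[['Part_1', 'Part_2']].min(axis=1)
high = df[['Part_1', 'Part_2']].max(axis=1)

# Group by the two Series directly; df keeps only its original columns
df['Final-Score'] = df.groupby([low, high])['Sub-Score'].transform('min')
print(df)
```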
      

      [Comments]:

        [Answer 3]:

        I would also go via a temporary table. First generate a key, then group by that key and apply min():

        # Generate a key that does not depend on the order of the values in
        # Part_1 and Part_2: sorting each pair maps both (91, 201) and (201, 91)
        # to "[91, 201]" (a set has no guaranteed iteration order, so str(set(...))
        # is not a reliable key)
        df['key'] = [str(sorted(pair)) for pair in df[['Part_1', 'Part_2']].values.tolist()]
        
        # Generate a temporary table (a Series indexed by key) with each key's minimal Sub-Score
        tmp = df.groupby('key')['Sub-Score'].min()
        
        # Place the minimal values in the original table by looking up each row's key
        df['Final-Score'] = df['key'].map(tmp)
        
        # Finally, delete what you don't need
        del df['key'], tmp
        
        df
        >   Final-Score  Part_1  Part_2  Result  Sub-Score
        >0          0.6      91     201       3        0.6
        >1          0.8     201     111       3        0.8
        >2          0.6     201      91       3        0.9
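The "temporary table" idea from this answer can also be written as an explicit left join instead of a key lookup. This is a sketch under assumptions: the `"91-201"` string key format and the `min_score` column name are hypothetical choices for illustration. A left merge keeps the rows in the order of the left frame, so the joined minimums line up with the original rows.

```python
import pandas as pd

df = pd.DataFrame({'Part_1': [91, 201, 201],
                   'Part_2': [201, 111, 91],
                   'Result': [3, 3, 3],
                   'Sub-Score': [0.60, 0.8, 0.9],
                   'Final-Score': [0, 0, 0]})

# Order-independent key: the sorted pair joined as a string, e.g. "91-201"
df['key'] = ['-'.join(map(str, sorted(pair)))
             for pair in df[['Part_1', 'Part_2']].values.tolist()]

# Temporary table: one row per key with its minimal Sub-Score
tmp = (df.groupby('key', as_index=False)['Sub-Score'].min()
         .rename(columns={'Sub-Score': 'min_score'}))

# Left-join the temporary table onto the original rows (left row order is preserved)
merged = df.merge(tmp, on='key', how='left')
# Assign by position so a non-default index on df would not cause misalignment
df['Final-Score'] = merged['min_score'].to_numpy()
df = df.drop(columns='key')
print(df)
```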
        

        [Comments]:
