Join column names in a new pandas column conditional on value: answers

[Question title]: Join column names in a new pandas column conditional on value
[Posted]: 2020-09-09 08:28:57
[Question description]:

I have the following dataset:

import pandas as pd

# Note: every value is a string, not a number
data = {'Environment': ['0', '0', '0'],
        'Health': ['1', '0', '1'],
        'Labor': ['1', '1', '1'],
        }

df = pd.DataFrame(data, columns=['Environment', 'Health', 'Labor'])

I want to create a new column df['Keyword'] whose value is the names of the columns with a value > 0, joined by ', '.

Expected result:

data = {'Environment': ['0', '0', '0'],
        'Health': ['1', '0', '1'],
        'Labor': ['1', '1', '1'],
        'Keyword': ['Health, Labor', 'Labor', 'Health, Labor']}

df_test = pd.DataFrame(data, columns=['Environment', 'Health', 'Labor', 'Keyword'])
df_test

How can I do this?

[Question discussion]:

Tags: python pandas join

[Solution 1]:

Another version, using .apply():

# For each row, join the names of the columns whose cell equals the string '1'
df['Keyword'] = df.apply(lambda x: ', '.join(b for a, b in zip(x, x.index) if a == '1'), axis=1)
print(df)

Output:

  Environment Health Labor        Keyword
0           0      1     1  Health, Labor
1           0      0     1          Labor
2           0      1     1  Health, Labor
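
The question asks for values > 0, while the snippet above tests for equality with the string '1'. Below is a minimal sketch of a numeric variant, assuming every cell can be parsed by pd.to_numeric. The helper name positive_columns is hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Environment": ["0", "0", "0"],
    "Health": ["1", "0", "1"],
    "Labor": ["1", "1", "1"],
})


def positive_columns(row):
    """Return the comma-separated labels of the cells in `row` that are > 0."""
    # Convert the string cells to numbers so that "> 0" is a numeric test
    values = pd.to_numeric(row)
    # Keep only the positive cells; their index labels are the column names
    return ", ".join(values[values > 0].index)


df["Keyword"] = df.apply(positive_columns, axis=1)
keywords = df["Keyword"].tolist()
print(df)
```

Row-wise apply runs Python code once per row, so on large frames the vectorized approaches in the other solutions are likely to be faster.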

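[Discussion]:

[Solution 2]:

Another approach uses mask and stack, then groupby to aggregate the items.

stack drops NA values by default, so the cells hidden by mask (values below 1) disappear and only the column names of the remaining cells are aggregated.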

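# Assumes numeric cells (e.g. after df.astype(int)); with string cells df.lt(1) raises a TypeError.
# reset_index(1) moves the stacked column names into a column called "level_1".
df['keyword'] = df.mask(
               df.lt(1)).stack().reset_index(1)\
                        .groupby(level=0)["level_1"].agg(list)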

print(df)

   Environment  Health  Labor          keyword
0            0       1      1  [Health, Labor]
1            0       0      1          [Labor]
2            0       1      1  [Health, Labor]
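
The output above holds Python lists, and the snippet presupposes numeric cells. As a sketch only, starting from the string-valued frame in the question, the same mask/stack/groupby idea can be adapted to produce the comma-separated string that was asked for. The intermediate names (numeric, stacked, names) and the level names "row" and "column" are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Environment": ["0", "0", "0"],
    "Health": ["1", "0", "1"],
    "Labor": ["1", "1", "1"],
})

# Convert the string cells to integers so the "< 1" comparison is valid
numeric = df.astype(int)

# mask() replaces every cell where the condition holds (value < 1) with NaN
masked = numeric.mask(numeric.lt(1))

# stack() moves the column labels into an inner index level; the explicit
# dropna() removes masked cells regardless of the stack() implementation in use
stacked = masked.stack().dropna()

# Name the two index levels, then pull the column labels out as values
names = stacked.rename_axis(["row", "column"]).reset_index(level="column")["column"]

# Join the labels that belong to the same original row
df["Keyword"] = names.groupby(level=0).agg(", ".join)
keywords = df["Keyword"].tolist()
print(df)
```

A row without any positive value has no entry in the grouped result, so it would receive NaN in the new column after index alignment.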

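[Discussion]:

[Solution 3]:

The first problem is that the values in the sample data are strings, so to compare them with "greater than", convert them to numbers first: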

df = df.astype(float).astype(int)

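Or:

df = df.replace({'0':0, '1':1})

Then use DataFrame.dot to matrix-multiply the boolean mask by the column names with the separator appended, and finally strip the trailing separator from the right: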

df['Keyword'] = df.gt(0).dot(df.columns + ', ').str.rstrip(', ')
print(df)
   Environment  Health  Labor        Keyword
0            0       1      1  Health, Labor
1            0       0      1          Labor
2            0       1      1  Health, Labor
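
Why the dot product yields text: multiplying a Python string by True returns the string, multiplying it by False returns an empty string, and the "sum" of the products is string concatenation. The sketch below spells the steps out on integer data; the variable names (flags, labels, keyword) are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Environment": [0, 0, 0],
    "Health": [1, 0, 1],
    "Labor": [1, 1, 1],
})

# Boolean mask: True where the cell is greater than 0
flags = df.gt(0)

# Column names with the separator appended, e.g. "Health, "
labels = df.columns + ", "

# The building blocks of the product: a string times a boolean
piece_kept = True * "Health, "      # the string itself
piece_dropped = False * "Health, "  # an empty string

# Row-wise dot product concatenates the labels of the True cells;
# rstrip(", ") then removes trailing commas and spaces
keyword = flags.dot(labels).str.rstrip(", ")
print(keyword)
```

Note that str.rstrip(', ') strips any trailing run of commas and spaces rather than the exact two-character separator, which only matters if a column name itself ends with one of those characters.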

Or compare the strings directly, for example not equal to '0' or equal to '1':

df['Keyword'] = df.ne('0').dot(df.columns + ', ').str.rstrip(', ')

df['Keyword'] = df.eq('1').dot(df.columns + ', ').str.rstrip(', ')

[Discussion]: