[Question title]: pandas: drop duplicate rows while keeping dummy variable values
[Posted]: 2019-12-03 20:25:26
[Question]:

I have the following example dataframe:

child_id   feature_1   feature_2   feature_3   feature_4   feature_5
   10          1           0           0          0            0
   10          0           0           1          0            0
   10          0           1           0          0            0
   10          0           0           0          1            0
   20          0           0           0          0            1
   20          1           0           0          0            0
   20          0           1           1          0            0
   20          0           0           0          0            0

However, I want to collapse ("stack") this dataframe so that each child_id appears on only one row:

child_id   feature_1   feature_2   feature_3   feature_4   feature_5
   10          1           1           1           1           0
   20          1           1           1           0           1

Since every row is different, I can't simply drop duplicates. Any ideas? Thanks a lot!
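To see why dropping duplicates does not help here: no two full rows are identical, so `drop_duplicates()` removes nothing, and deduplicating on `child_id` alone keeps only the first row per child, discarding the flags set in the other rows. A small sketch using the question's data (the column names are taken from the question):

```python
import pandas as pd

# The example data from the question
df = pd.DataFrame({
    'child_id':  [10, 10, 10, 10, 20, 20, 20, 20],
    'feature_1': [1, 0, 0, 0, 0, 1, 0, 0],
    'feature_2': [0, 0, 1, 0, 0, 0, 1, 0],
    'feature_3': [0, 1, 0, 0, 0, 0, 1, 0],
    'feature_4': [0, 0, 0, 1, 0, 0, 0, 0],
    'feature_5': [0, 0, 0, 0, 1, 0, 0, 0],
})

# Every full row is distinct, so nothing is removed
full_dedup = df.drop_duplicates()
print(len(full_dedup))  # 8

# Deduplicating on child_id keeps only the first row of each child,
# so child 10 loses its feature_2..feature_4 flags
first_only = df.drop_duplicates(subset='child_id')
print(first_only)
```

What is needed instead is an aggregation per `child_id`, not a row filter.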

[Comments]:

  • df.groupby('child_id').sum()
  • ^ You can append .clip(upper=1) if you need to make sure the results stay dummies (0/1), or use .any().astype(int) instead
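The comments suggest summing per child and then capping the counts at 1, or checking whether any row for the child has the flag set. A sketch of both, assuming the question's column names. In this sample no flag is set twice for the same child, so plain `.sum()` already yields 0/1; the `.clip(upper=1)` only matters when a flag repeats within a group:

```python
import pandas as pd

# The example data from the question
df = pd.DataFrame({
    'child_id':  [10, 10, 10, 10, 20, 20, 20, 20],
    'feature_1': [1, 0, 0, 0, 0, 1, 0, 0],
    'feature_2': [0, 0, 1, 0, 0, 0, 1, 0],
    'feature_3': [0, 1, 0, 0, 0, 0, 1, 0],
    'feature_4': [0, 0, 0, 1, 0, 0, 0, 0],
    'feature_5': [0, 0, 0, 0, 1, 0, 0, 0],
})

# Option 1: count the 1s per child, then cap the counts at 1 so the result stays a dummy
summed = df.groupby('child_id').sum().clip(upper=1)

# Option 2: True if any row for the child has the flag set, converted back to 0/1
anyed = df.groupby('child_id').any().astype(int)

print(summed)
# Element-wise comparison works even if the integer dtypes differ by platform
print((summed == anyed).all().all())  # True
```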

Tags: pandas dataframe stack pivot-table


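[Answer 1]:
Since the features are 0/1 dummies, taking the maximum per child gives 1 whenever any of that child's rows has a 1:

import pandas as pd

# Rebuild the example data from the question
child_id  = [10, 10, 10, 10, 20, 20, 20, 20]
feature_1 = [1, 0, 0, 0, 0, 1, 0, 0]
feature_2 = [0, 0, 1, 0, 0, 0, 1, 0]
feature_3 = [0, 1, 0, 0, 0, 0, 1, 0]
feature_4 = [0, 0, 0, 1, 0, 0, 0, 0]
feature_5 = [0, 0, 0, 0, 1, 0, 0, 0]

df = pd.DataFrame(
    list(zip(child_id, feature_1, feature_2, feature_3, feature_4, feature_5)),
    columns=['child_id', 'feature_1', 'feature_2', 'feature_3', 'feature_4', 'feature_5'],
)

# Collapse to one row per child: the max of a 0/1 column is 1 if any row has a 1
print(df.groupby('child_id').max())

#           feature_1  feature_2  feature_3  feature_4  feature_5
# child_id
# 10                1          1          1          1          0
# 20                1          1          1          0          1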
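The desired output in the question shows `child_id` as an ordinary column rather than as the index. A sketch of one way to get that layout, assuming the question's column names, is to pass `as_index=False` to `groupby` (calling `.reset_index()` on the grouped result would work too):

```python
import pandas as pd

# The example data from the question
df = pd.DataFrame({
    'child_id':  [10, 10, 10, 10, 20, 20, 20, 20],
    'feature_1': [1, 0, 0, 0, 0, 1, 0, 0],
    'feature_2': [0, 0, 1, 0, 0, 0, 1, 0],
    'feature_3': [0, 1, 0, 0, 0, 0, 1, 0],
    'feature_4': [0, 0, 0, 1, 0, 0, 0, 0],
    'feature_5': [0, 0, 0, 0, 1, 0, 0, 0],
})

# as_index=False keeps child_id as a regular column instead of moving it to the index
collapsed = df.groupby('child_id', as_index=False).max()
print(collapsed)
```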

[Comments]:
