【Posted】: 2020-05-13 04:17:55
【Question】:
How can I aggregate the results after applying one-hot encoding? Here is my sample data:
import pandas as pd

df = pd.DataFrame([
['apple','sweet'],
['apple','affordable'],
['apple','fruit'],
['orange','fruit'],
['orange','soup'],
['orange','cheap'],
['orange','sweet'],
['soda','sweet'],
['soda','cheap'],
['soda','softdrinks']
])
df = df.rename(columns={0: "productName", 1: "itemFeatures"})
This is what I tried:
df_ohe = pd.get_dummies(df['itemFeatures'])
df_ohe_merged = pd.concat([df, df_ohe],axis='columns')
df_final = df_ohe_merged.drop(['itemFeatures'],axis='columns')
How can I get the desired output shown below, with one row per product? Or is there a better approach?
desired_output = pd.DataFrame([
['apple',1,0,0,1,0,0,1],
['orange',0,1,0,1,0,1,1],
['soda',0,0,1,0,1,0,1]
])
desired_output = desired_output.rename(columns={0: "productName",
1: "affordable",
2: "cheap",
3: "famous",
4: "fruit",
5: "softdrinks",
6: "sour",
7: "sweet",
})
Thank you very much.
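For reference, here is a minimal sketch of two possible ways to collapse the one-hot rows into one row per product. It assumes the sample data exactly as posted, so the feature columns are derived from the values that actually occur in `itemFeatures` (`famous` and `sour` from the desired output do not appear in the sample, while `soup` does). The names `by_max` and `by_crosstab` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["apple", "sweet"],
        ["apple", "affordable"],
        ["apple", "fruit"],
        ["orange", "fruit"],
        ["orange", "soup"],
        ["orange", "cheap"],
        ["orange", "sweet"],
        ["soda", "sweet"],
        ["soda", "cheap"],
        ["soda", "softdrinks"],
    ],
    columns=["productName", "itemFeatures"],
)

# Option 1: one-hot encode, then take the per-product maximum of each
# indicator column, so a feature is 1 if any row of that product has it.
df_ohe = pd.get_dummies(df["itemFeatures"], dtype=int)
by_max = (
    pd.concat([df[["productName"]], df_ohe], axis="columns")
    .groupby("productName")
    .max()
    .reset_index()
)

# Option 2: count product/feature pairs directly, then cap the counts at 1
# so repeated pairs still yield a 0/1 indicator.
by_crosstab = (
    pd.crosstab(df["productName"], df["itemFeatures"])
    .clip(upper=1)
    .reset_index()
)
by_crosstab.columns.name = None  # drop the leftover "itemFeatures" axis label

print(by_max)
```

If a fixed set of feature columns is required, including ones that never occur in the data, `reindex(columns=[...], fill_value=0)` on the result would add them as all-zero columns.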
【Comments】:
Tags: python pandas dataframe cosine-similarity one-hot-encoding