【Posted】:2020-09-23 01:00:51
【Question】:
How can I plot K-means clusters for the following data?
no,store_id,revenue,profit,state,country
0,101,779183,281257,WD,India
1,101,144829,838451,WD,India
2,101,766465,757565,AL,Japan
My code is below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.spatial.distance import euclidean
from sklearn.cluster import KMeans

# df holds the data shown above; one-hot encode the categorical columns
df1 = pd.get_dummies(df, columns=['state', 'country'])
feature_cols = list(df1.columns)  # columns used for clustering
clusters = 8  # number of clusters; matches the 8 colours in the palette below
km = KMeans(n_clusters=clusters).fit(df1[feature_cols])
df1['cluster_id'] = km.labels_

def distance_to_centroid(row, centroid):
    # compare on the same columns the model was fitted on
    return euclidean(row[feature_cols].astype(float), centroid)

df1['distance_to_center0'] = df1.apply(lambda r: distance_to_centroid(r, km.cluster_centers_[0]), axis=1)
df1['distance_to_center1'] = df1.apply(lambda r: distance_to_centroid(r, km.cluster_centers_[1]), axis=1)
dummies_df = df1[['distance_to_center0', 'distance_to_center1', 'cluster_id']]
test = {0: "Blue", 1: "Red", 2: "Green", 3: "Black", 4: "Orange", 5: "Yellow", 6: "Violet", 7: "Grey"}
sns.scatterplot(x="distance_to_center0", y="distance_to_center1", data=dummies_df, hue="cluster_id", palette=test)
plt.show()
Below is the code that finds the data point closest to each cluster center:
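from sklearn.metrics import pairwise_distances_argmin_min
# dummies is the one-hot-encoded feature DataFrame
km = KMeans(n_clusters=7).fit(dummies)
# closest[k] is the row index of the sample nearest to centroid k
closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, dummies)
closest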
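To make clear what `closest` holds, here is a minimal sketch on made-up toy data (the array `X` and the choice of 2 clusters are assumptions for illustration, not the store data above). `pairwise_distances_argmin_min` returns, for each centroid, the row index of the nearest sample and the distance to it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Toy data (hypothetical): two tight groups of 2-D points.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [10.0, 10.0], [10.2, 9.9], [9.9, 10.3]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# For each centroid: index of the nearest row of X, and the distance to it.
closest, dists = pairwise_distances_argmin_min(km.cluster_centers_, X)

for k, (idx, d) in enumerate(zip(closest, dists)):
    print(f"cluster {k}: nearest sample is row {idx} at distance {d:.3f}")
```

So `closest` identifies the most central sample of each cluster, which is the opposite of what is needed for finding points far from the clusters.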
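How do I draw a scatter plot of the clusters?
How do I print the points that are far from the clusters?
With outlier-detection methods, at least, a label of -1 marks an outlier (scikit-learn). kmeans.labels_ only prints 1 and 0, so how do I get the outliers?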
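One possible approach, sketched under assumptions: KMeans assigns every point to some cluster, so `labels_` never contains -1 (that convention belongs to estimators such as DBSCAN or IsolationForest). A common workaround is to measure each point's distance to its own centroid with `km.transform` and flag the largest distances. The 0.95 quantile cut-off, the `StandardScaler` step, the PCA projection for plotting, the helper names `flag_far_points` and `plot_clusters`, and the synthetic data are all illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def flag_far_points(X, n_clusters=2, quantile=0.95, random_state=0):
    """Fit KMeans and flag points unusually far from their own centroid.

    The quantile cut-off is an illustrative choice, not a scikit-learn default.
    """
    # Scale first so large-valued columns (e.g. revenue) do not dominate.
    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(X_scaled)
    # transform() gives the distance from every point to every centroid;
    # keep the column of the centroid each point was assigned to.
    dist_all = km.transform(X_scaled)
    dist_own = dist_all[np.arange(len(X_scaled)), km.labels_]
    is_outlier = dist_own > np.quantile(dist_own, quantile)
    return X_scaled, km.labels_, dist_own, is_outlier


def plot_clusters(X_scaled, labels, is_outlier):
    """Project to 2-D with PCA, colour by cluster, circle the flagged points."""
    import matplotlib.pyplot as plt
    xy = PCA(n_components=2).fit_transform(X_scaled)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="tab10", s=30)
    plt.scatter(xy[is_outlier, 0], xy[is_outlier, 1], facecolors="none",
                edgecolors="red", s=120, label="far from centroid")
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.legend()
    plt.show()


# Synthetic stand-in for the one-hot-encoded store data (assumption):
# two groups of 50 points plus one point placed far from both.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)),
               rng.normal(8, 1, (50, 3)),
               [[4.0, 20.0, -10.0]]])
X_scaled, labels, dist_own, is_outlier = flag_far_points(X)
print(np.flatnonzero(is_outlier))  # row indices of the flagged points
```

Calling `plot_clusters(X_scaled, labels, is_outlier)` draws the scatter plot. Note that a quantile rule always flags roughly the top 5% of points, whether or not they are real outliers, so the threshold needs tuning for the actual data.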
【Discussion】:
-
Which variables are you running KMeans on? In general, though, you can use plt.scatter(x, y). What do you mean by printing points far from the clusters?
-
I did a very similar project. You can see how I plotted the data on lines 166 to 171. github.com/moe-assal/Machine-Learning/blob/master/…
-
@moeassal I don't need to predict anything; I just want to draw a plot and find the points that are far from the clusters.
-
@moeassal You are only considering two variables.
-
Sorry, what are you trying to do? How many variables do you want to use in your plot?
Tags: python scikit-learn data-science k-means