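Kmean clustering in ggplot

【Question title】: Kmean clustering in ggplot
【Posted】: 2020-03-23 09:16:11
【Question】:

I am using the K-means algorithm in R to separate variables. I want to plot the results in ggplot, which I was able to manage; however, the results from ggplot and cluster::clusplot seem to differ.

So I would like to ask what I am missing: for example, I know the scaling is different, but I wonder why all the data points lie inside the boundaries when using clusplot, while they do not when using ggplot.

Is it just because of the scaling?

So are the two results below exactly the same?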

library(cluster)
library(ggfortify)


x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)

A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)
autoplot(kmeans(x, centers = 3, nstart = 50, iter.max = 500), data = x, frame.type = 'norm')

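【Question comments】:

Tags: r ggplot2 k-means ggfortify

【Solution 1】:

For me, I get the same plot using either clusplot or ggplot. But to use ggplot, you first have to run a PCA on your data in order to get the same plot as clusplot. Maybe that is where your problem lies.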
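
To make the PCA step concrete, here is a minimal sketch that is not part of the original answer. It uses only base R and checks that princomp (used in this answer) and prcomp give the same component scores up to the sign of each axis. The names pca_a, pca_b and same_scores are illustrative, and the fixed seed is an assumption added for reproducibility.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Same simulated data as in the question
x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)

pca_a <- princomp(x)  # eigen decomposition of the covariance matrix
pca_b <- prcomp(x)    # SVD of the centred data (no scaling by default)

# The sign of a principal axis is arbitrary, so compare absolute values
same_scores <- all.equal(abs(unname(pca_a$scores)), abs(unname(pca_b$x)))
print(same_scores)
```

If the two sets of scores agree, the points are projected the same way by either function, and any remaining visual difference between the plots would more likely come from how the axes are scaled or how the ellipses are drawn.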

Here, taking your example, I did:

library(cluster)
library(ggplot2)

x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)

A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = TRUE, shade = TRUE)

# Project the data onto its principal components, as clusplot does
pca_x <- princomp(x)
# The resulting columns are named Comp.1, Comp.2 and A.cluster
x_cluster <- data.frame(pca_x$scores, A$cluster)
ggplot(x_cluster, aes(x = Comp.1, y = Comp.2,
                      color = as.factor(A.cluster), fill = as.factor(A.cluster))) +
  geom_point() +
  stat_ellipse(type = "t", geom = "polygon", alpha = 0.4)

The plot using clusplot

And the one using ggplot:

Hope this helps you figure out why your plots differ.

【Comments】:

Is there also a way to see the "bubbles" around each cluster in ggplot?
You can do it with stat_ellipse. You can read more about it here: ggplot2.tidyverse.org/reference/stat_ellipse.html. I edited my answer accordingly.
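
A note on why points can fall outside these "bubbles": stat_ellipse draws a confidence ellipse at a given level (0.95 by default), so a few points are expected to lie outside it, whereas clusplot by default (span = TRUE) draws the smallest ellipse containing all points of a cluster. The sketch below is not from the thread; it only shows how type and level can be set explicitly. The names scores and p and the fixed seed are illustrative assumptions.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Same simulated data as in the question
x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)

A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)

# PCA scores plus the cluster label as a factor
scores <- data.frame(princomp(x)$scores, cluster = factor(A$cluster))

# type = "norm" assumes a multivariate normal distribution;
# level sets the confidence level of the drawn ellipse
p <- ggplot(scores, aes(x = Comp.1, y = Comp.2, colour = cluster)) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.95)
print(p)
```

Raising level (for example to 0.99) widens the ellipses so that they cover more points, but they remain confidence ellipses rather than hulls, so they are not guaranteed to enclose every point.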