kmeans bug when specifying starting cluster centers in R?

【Question title】: kmeans bug when specifying starting cluster centers in R?
【Posted】: 2019-08-06 19:41:19
【Question】:

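I am trying to step through kmeans in R one iteration at a time. When I set iter.max = 1 and pass a matrix of starting cluster centers instead of k, the algorithm appears to run until it converges rather than stopping after the single iteration I asked for.

Can anyone confirm whether this is a known bug? If not, what am I missing?

Here is my code for reference: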

# Set up data
data <- data.frame(names = c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2"), 
                   x = c(2, 2, 8, 5, 7, 6, 1, 4),
                   y = c(10, 5, 4, 8, 5, 4, 2, 9))

initial_centers <- matrix(c(2, 5, 1, 10, 8, 2), ncol=2)

# Run k means for 1 iteration
model <- kmeans(data[,-1], initial_centers, iter.max=1)
model$centers

# Actual Output:
#          x        y
# 1 3.666667 9.000000
# 2 7.000000 4.333333
# 3 1.500000 3.500000

# Expected Output:
#          x        y
# 1 2.000000 10.00000
# 2 6.000000 6.000000
# 3 1.500000 3.500000

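【Comments】:

Why do you think this is a bug rather than convergence within 1 iteration? If you set iter.max to 10 and then look at summary(model), it still converges after running only 1 iteration.
I see it now. Thanks! My understanding was that the algorithm works by assigning each point to the nearest cluster and iterating. If that were the case, it would converge on the 4th iteration. However, it seems to be smarter than that, as Anony-Mousse clarifies in the accepted answer below.

Tags: r cluster-analysis k-means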

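【Solution 1】:

The default k-means algorithm in R is smarter than the one you learned in class. It is the Hartigan and Wong algorithm, not the textbook assign-then-update procedure.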
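
To see the difference, here is a minimal sketch that assumes the data and initial_centers from the question. The algorithm argument of kmeans accepts "Lloyd", which selects the textbook assign-then-update procedure. With iter.max = 1 it should stop after a single pass, and R is expected to warn that it did not converge.

```r
# Data and starting centers copied from the question
data <- data.frame(names = c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2"),
                   x = c(2, 2, 8, 5, 7, 6, 1, 4),
                   y = c(10, 5, 4, 8, 5, 4, 2, 9))
initial_centers <- matrix(c(2, 5, 1, 10, 8, 2), ncol = 2)

# Textbook (Lloyd) k-means, limited to a single iteration.
# suppressWarnings() hides the expected "did not converge" warning.
lloyd_model <- suppressWarnings(
  kmeans(data[, -1], initial_centers, iter.max = 1, algorithm = "Lloyd")
)
lloyd_model$centers

# Default algorithm (Hartigan-Wong) for comparison
hw_model <- kmeans(data[, -1], initial_centers, iter.max = 1)
hw_model$centers
```

If this behaves as assumed, lloyd_model$centers matches the "Expected Output" in the question, while hw_model$centers matches the "Actual Output".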

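If all you want is to assign each point to the nearest predefined center, don't abuse kmeans for that. Instead, just compute the distances and take the argmin.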
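
The distance-plus-argmin approach can be sketched as follows, again assuming the data and initial_centers from the question. The names X, d2, assignment and new_centers are illustrative only. The last step adds one centroid update so the result can be compared with the output the question expected.

```r
# Data and starting centers copied from the question
data <- data.frame(names = c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2"),
                   x = c(2, 2, 8, 5, 7, 6, 1, 4),
                   y = c(10, 5, 4, 8, 5, 4, 2, 9))
initial_centers <- matrix(c(2, 5, 1, 10, 8, 2), ncol = 2)

X <- as.matrix(data[, -1])

# Squared Euclidean distance from every point (rows) to every center (columns)
d2 <- sapply(seq_len(nrow(initial_centers)), function(j) {
  rowSums(sweep(X, 2, initial_centers[j, ])^2)
})

# argmin over the centers for each point
assignment <- apply(d2, 1, which.min)

# One centroid update: per-cluster column sums divided by cluster sizes.
# Assumes every center receives at least one point (true for this data).
new_centers <- rowsum(X, assignment) / as.vector(table(assignment))
new_centers
```

Because nothing here calls kmeans, each step stays under your control, and you can repeat the assign and update steps as many times as you like to follow the textbook algorithm one iteration at a time.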
