K-means algorithm for k=2 that gives equal cluster sizes

[Question title]: Kmeans algorithm for k=2 which gives equal cluster size outputs
[Posted]: 2017-10-13 18:33:29
[Question description]:

I am using a modified Lloyd's algorithm to get equal cluster sizes from k-means with k=2. Here is the pseudocode:

- Randomly choose 2 points as initialization for the 2 clusters (denoted as c1, c2)
- Repeat below steps until convergence
    - Sort all points xi according to ascending values of ||xi-c1|| - ||xi-c2||, i.e. differences in distances to the first and the second cluster
    - Put the top 50% of points in cluster 1, the others in cluster 2
    - Recalculate centroids as average of the allocated points (as usual in Lloyd's)
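
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the asker's actual code: the function name `balanced_two_means` and the parameters `max_iter` and `seed` are hypothetical, "convergence" is assumed to mean that the assignments stop changing, and for an odd number of points cluster 2 is assumed to receive the extra point.

```python
import numpy as np

def balanced_two_means(X, max_iter=100, seed=0):
    """Equal-size 2-means following the pseudocode above (hypothetical helper)."""
    X = np.asarray(X, dtype=float)  # expected shape: (n_points, n_features)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Randomly choose 2 distinct points as the initial centroids c1, c2.
    centroids = X[rng.choice(n, size=2, replace=False)].copy()
    half = n // 2  # assumption: for odd n, cluster 2 gets the extra point
    labels = None
    for _ in range(max_iter):
        # ||xi - c1|| - ||xi - c2|| for every point.
        d1 = np.linalg.norm(X - centroids[0], axis=1)
        d2 = np.linalg.norm(X - centroids[1], axis=1)
        # Ascending order: points relatively closest to c1 come first.
        order = np.argsort(d1 - d2, kind="stable")
        new_labels = np.ones(n, dtype=int)   # label 1 = cluster 2
        new_labels[order[:half]] = 0         # label 0 = cluster 1 (top 50%)
        # Assumed stopping rule: assignments did not change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recalculate centroids as the mean of the allocated points.
        centroids = np.array([X[labels == 0].mean(axis=0),
                              X[labels == 1].mean(axis=0)])
    return labels, centroids
```

Calling `labels, centroids = balanced_two_means(X)` on an `(n, d)` array returns 0/1 labels with `n // 2` points in the first cluster. Note that the sketch sorts by the difference of plain distances, exactly as the pseudocode states; sorting by the difference of squared distances instead would be the variant whose assignment step exactly minimizes the within-cluster sum of squares for fixed centroids under the equal-size constraint.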

Empirically, the algorithm above works well for me:

- It gives balanced clusters
- It always decreases the objective

Has such an algorithm ever been proposed or analyzed in the literature? Could you point me to some references?

[Question comments]:

Tags: algorithm machine-learning cluster-analysis k-means spherical-kmeans

[Solution 1]:

A more general version, for more than 2 clusters, is explained here:

https://elki-project.github.io/tutorial/same-size_k_means

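I have seen k-means with various size constraints several times in the literature, but I don't have any references at hand. I'm not convinced by the approach: forcing clusters to have the same size contradicts, IMHO, the k-means idea of finding the least-squares best approximation, because it means deliberately choosing a worse approximation.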

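[Comments]:

Thanks for the reference! It seems to me there is one key difference between my algorithm and the one in the reference: for k=2, the point assignment step can be solved exactly as described above, whereas for general k>2 that does not seem to be the case. That is why the linked tutorial uses a local point-swapping procedure, which is unnecessary when k=2. I wonder whether a proof for the k=2 case exists somewhere.
I don't think the k=2 case is of any special interest, because people are usually looking for more clusters. I have definitely seen this kind of k=2 operation used in metric indexing.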