Deciding the best 'k' for the k-means algorithm in Weka: answers

[Question title]: Deciding the best 'k' in the k-means algorithm in Weka
[Posted]: 2016-07-15 08:53:53
[Question]:

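I am using the k-means algorithm for clustering, but I am not sure how to decide the best value of k from the results. For example, I applied k-means with k=10 to my dataset: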

kMeans
======

Number of iterations: 16
Within cluster sum of squared errors: 38.47923197081721
Missing values globally replaced with mean/mode

Cluster centroids:
                                                         Cluster#
Attribute                          Full Data                    0                    1                    2                    3                    4                    5                    6                    7                    8                    9
                                       (214)                 (16)                  (9)                 (13)                 (23)                 (46)                 (12)                 (11)                 (40)                 (15)                 (29)
==============================================================================================================================================================================================================================================================
RI                                    1.5184               1.5181               1.5175               1.5189               1.5178               1.5172                1.519               1.5255               1.5175               1.5222               1.5171
Na                                   13.4079              12.9988              14.6467              12.8277              13.2148              13.1896                13.63              12.6318              13.0518              13.9107              14.4421
Mg                                    2.6845               3.4894               1.3056               0.7738               3.4261               3.4987               3.4917               0.2145               3.4958               3.8273               0.5383
Al                                    1.4449               1.1844               1.3667               2.0338               1.3552               1.4898               1.3308               1.1891               1.2617                0.716               2.1228
Si                                   72.6509               72.785              73.2067              72.3662              72.6526              72.6989                72.07              72.0709              72.9532              71.7467              72.9659
K                                     0.4971               0.4794                    0                 1.47                0.527                 0.59               0.4108               0.2345                0.547               0.1007               0.3252
Ca                                     8.957               8.8069               9.3567              10.1238               8.5648               8.3041                 8.87              13.1291               8.5035               9.5887               8.4914
Ba                                     0.175                0.015                    0               0.1877                0.023                0.003               0.0667               0.2864                    0                    0                 1.04
Fe                                     0.057               0.2238                    0               0.0608               0.2013               0.0104               0.0167               0.1109                0.011               0.0313               0.0134
Type                    build wind non-float     build wind float            tableware           containers build wind non-float build wind non-float     build wind float build wind non-float     build wind float     build wind float            headlamps

[Question discussion]:

Tags: cluster-analysis weka k-means

[Answer 1]:

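There are several ways to determine the best value of 'k' for the k-means algorithm: a rule of thumb, the elbow method, the silhouette method, and others. In my own work I followed the result of the elbow method and it worked well for me; I did all of the analysis in R. Here is a link describing these methods: link. Follow the sub-links from that page, write the code for whichever method you choose, and apply it to your data.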
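To make the three methods concrete, here is a minimal pure-Python sketch, assuming numeric data given as tuples. It is not Weka's SimpleKMeans and not the R code mentioned above. For each k it runs plain Lloyd's k-means and reports the within-cluster sum of squared errors (WCSS, the same kind of quantity Weka prints as "Within cluster sum of squared errors"), which is what the elbow method plots, and the mean silhouette coefficient, where the k with the highest value wins. It also includes the common rule of thumb k ≈ sqrt(n/2). The farthest-first seeding, the function names, and the toy data are illustrative choices of this sketch, not part of the question or answer.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k, rng):
    """Illustrative seeding choice: a random first centre, then repeatedly the
    point whose distance to its nearest chosen centre is largest."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first(points, k, random.Random(seed))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def wcss(points, centroids, labels):
    """Within-cluster sum of squared errors (what the elbow method plots)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean distance); needs >= 2 clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c):
            # Mean distance from point i to the members of cluster c (excluding i).
            ds = [math.dist(p, q) for j, q in enumerate(points)
                  if labels[j] == c and j != i]
            return sum(ds) / len(ds) if ds else 0.0
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = mean_dist(own)  # cohesion
        b = min(mean_dist(c) for c in clusters if c != own)  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def rule_of_thumb_k(n):
    """Common rule of thumb: k is roughly sqrt(n / 2)."""
    return round(math.sqrt(n / 2))


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 3x3 grids of points.
    data = [(cx + dx, cy + dy)
            for cx, cy in [(0, 0), (10, 0), (0, 10)]
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    for k in range(1, 7):
        cents, labs = kmeans(data, k)
        sil = silhouette(data, labs) if k > 1 else float("nan")
        print(f"k={k}  WCSS={wcss(data, cents, labs):8.2f}  silhouette={sil:.3f}")
    print("rule of thumb for n=214:", rule_of_thumb_k(214))
```

WCSS never gets worse as k grows when clustering is done well (at k = n it reaches 0), so the elbow method looks for the k after which adding clusters stops buying a large drop, not for the minimum. On the toy data the big drop happens up to k=3 and the silhouette also peaks there. For the 214 instances in the question, the rule of thumb gives k ≈ 10, so it is best treated as a starting point to check with the other two methods.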

I hope this helps; if not, sorry.

Good luck with your work.

[Discussion]: