【Title】: k-means algorithm not working
【Posted】: 2017-04-10 08:14:55
【Question】:

I am trying to implement the k-means algorithm in Python 3 using NumPy. My input data is a simple n x 2 matrix of points:

[[1, 2],
 [3, 4],
   ...
 [7, 13]]

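For some reason, at every step of the iteration none of my labels stay the same: every single label changes. Does anyone see what I am doing wrong? I have added some comments to the code so that the individual steps are easier to follow.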

def kmeans(X,k):

    # Initialize by choosing k random data points as centroids
    num_features = X.shape[1]
    centroids = X[np.random.randint(X.shape[0], size=k), :] # find k centroids
    iterations = 0
    old_labels, labels = [], []

    while not should_stop(old_labels, labels, iterations):
        iterations += 1

        clusters = [[] for i in range(0,k)]
        for i in range(k):
            clusters[i].append(centroids[i])

        # Label points
        old_labels = labels
        labels = []
        for point in X:
            distances = [np.linalg.norm(point-centroid) for centroid in centroids]
            max_centroid = np.argmax(distances)
            labels.append(max_centroid)
            clusters[max_centroid].append(point)

        # Compute new centroids
        centroids = np.empty(shape=(0,num_features))
        for cluster in clusters:
            avgs = sum(cluster)/len(cluster)
            centroids = np.append(centroids, [avgs], axis=0)

    return labels

def should_stop(old_labels, labels, iterations):
    count = 0
    if len(old_labels) == 0:
        return False
    for i in range(len(labels)):
        count += (old_labels[i] != labels[i])
    print(count)
    if old_labels == labels or iterations == 2000:
        return True
    return False

【Comments】:

    Tags: python algorithm k-means


    【Answer 1】:
    max_centroid = np.argmax(distances)
    

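    You want the centroid that minimizes the distance to the point, not the one that maximizes it, so this line should call np.argmin rather than np.argmax.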
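
To illustrate the fix, here is a minimal sketch of the labelling step using np.argmin. The helper name assign_labels and the sample data are hypothetical and not part of the original code. The distances are computed with broadcasting instead of the asker's per-point loop, which is a stylistic choice rather than a required change.

```python
import numpy as np


def assign_labels(X, centroids):
    """Return, for each row of X, the index of its nearest centroid.

    Hypothetical helper for illustration; not part of the original question.
    """
    # distances[i, j] = Euclidean distance from point i to centroid j
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # argmin selects the closest centroid; argmax would select the farthest
    return np.argmin(distances, axis=1)


# Made-up sample data: two points near (1, 2) and two near (9, 9)
X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 2.0], [9.0, 9.0]])

labels = assign_labels(X, centroids)
print(labels)  # [0 0 1 1]
```

In the loop from the question, the equivalent one-line change is to compute the index with np.argmin(distances) (and ideally to rename max_centroid to something like nearest_centroid).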

    【Comments】:
