Using NumPy for K-means clustering: answers

【Question title】: Using NumPy for Kmean Clustering
【Posted】: 2021-11-29 03:08:04
【Question description】:

I'm new to machine learning and want to build a K-means algorithm with k = 2, but I'm struggling to compute the new centroids. Here is my kmeans code:

def euclidean_distance(x: np.ndarray, y: np.ndarray):
    # x shape: (N1, D)
    # y shape: (N2, D)
    # output shape: (N1, N2)
    dist = []
    for i in x:
        for j in y:
            new_list = np.sqrt(sum((i - j) ** 2))
            dist.append(new_list)
    distance = np.reshape(dist, (len(x), len(y)))
    return distance

def kmeans(x, centroids, iterations=30):
    assignment = None
    for i in iterations:
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)

    for c in range(len(y)):
        centroids[c] = np.mean(x[assignment == c], 0) #error here
    
        return centroids, assignment

I passed in x = [[1., 0.], [0., 1.], [0.5, 0.5]] and y = [[1., 0.], [0., 1.]], and distance is an array that looks like this:

[[0.         1.41421356]
 [1.41421356 0.        ]
 [0.70710678 0.70710678]]

When I run kmeans(x, y), it raises this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_40086/2170434798.py in
      5
      6 for c in range(len(y)):
----> 7     centroids[c] = (x[classes == c], 0)
      8     print(centroids)

TypeError: only integer scalar arrays can be converted to a scalar index

Does anyone know how to fix this or improve my code? Thanks in advance!

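【Comments on the question】:

Please post the full traceback and specify the line where the error occurs.
The error occurs on the line centroids[c] = np.mean(x[assignment == c], 0) (as marked by the comment in my code).
I agree with Arav R's answer below. Upvoted.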

Tags: python numpy k-means

【Solution 1】:

Converting the inputs to NumPy arrays should get rid of the error:

x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
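
To see why the conversion matters, here is a minimal sketch. The label array `assignment` below is made-up example data, not output from the question's code. A plain Python list cannot be indexed with a boolean NumPy array, whereas an `ndarray` treats the same expression as a row mask:

```python
import numpy as np

# Hypothetical cluster labels, one per point (illustrative values only).
assignment = np.array([0, 1, 0])

# Indexing a plain list with a boolean array fails with a TypeError.
x_list = [[1., 0.], [0., 1.], [0.5, 0.5]]
list_error = None
try:
    x_list[assignment == 0]
except TypeError as err:
    list_error = err
print("list indexing failed:", list_error)

# The same expression on an ndarray selects the rows where the mask is True.
x_arr = np.array(x_list)
members = x_arr[assignment == 0]
mean0 = np.mean(members, axis=0)  # per-column mean of the selected rows
print(members)
print(mean0)
```

Calling `np.asarray(x)` at the top of `kmeans` would be another way to accept either kind of input.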

It also looks like you need to change for i in iterations to for i in range(iterations) in the kmeans function.
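
Putting both fixes together, one possible version of the whole routine is sketched below. It goes beyond the two changes named in this answer, and the extra changes are assumptions on my part: the centroid update is moved inside the iteration loop, the loop runs over `len(centroids)` instead of the global `y`, the `return` is moved after the loops, the centroids are copied so the caller's array is not overwritten, and an empty cluster keeps its previous centroid. The distance function is also rewritten with broadcasting instead of nested loops.

```python
import numpy as np

def euclidean_distance(x, y):
    # x: (N1, D), y: (N2, D) -> pairwise distances of shape (N1, N2).
    diff = x[:, None, :] - y[None, :, :]      # (N1, N2, D) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))

def kmeans(x, centroids, iterations=30):
    x = np.asarray(x, dtype=float)
    # Copy so the caller's initial centroids are left untouched (assumption).
    centroids = np.array(centroids, dtype=float)
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for c in range(len(centroids)):
            members = x[assignment == c]
            if len(members) > 0:  # empty cluster: keep old centroid (assumption)
                centroids[c] = members.mean(axis=0)
    return centroids, assignment

x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
centroids, assignment = kmeans(x, y)
print(centroids)
print(assignment)
```

In the question's data the third point is equally far from both starting centroids. `np.argmin` returns the first minimum, so that point is assigned to cluster 0.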

【Comments】:

Oh, thanks. That was another error. But I've fixed the main error now.