Posted: 2017-11-30 16:32:24
Question:
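I'm trying to cluster my data using lat/lon as the X/Y axes and DaysUntilDueDate as the Z axis. I also want to keep the index column ('PM') so that I can later use the clustering results to build a schedule. The tutorial I found here is great, but I can't tell whether it handles a Z axis, and searching around has turned up nothing but errors. I think the key part of the code is the argument to iloc in this line: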
kmeans_model = KMeans(n_clusters=k, random_state=1).fit(A.iloc[:, :])
I tried changing this to iloc[1:4] (intending to use only columns 1-3), but that produced the following error:
ValueError: n_samples=3 should be >= n_clusters=4
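A minimal sketch of what the two iloc forms select, using a small hypothetical DataFrame (the values are made up; only the column layout matches the question). With `index_col=['PM']`, PM is already moved into the index, so it is not one of the data columns. A single slice such as `iloc[1:4]` selects rows 1-3, which would explain why KMeans sees only 3 samples, while `iloc[:, 0:3]` keeps every row and the three feature columns.

```python
import pandas as pd

# Hypothetical sample data; the real contents of point_data_test.csv are not shown.
df = pd.DataFrame(
    {
        "PM": [101, 102, 103, 104, 105],
        "Longitude": [-97.1, -97.2, -96.9, -97.0, -97.3],
        "Latitude": [32.7, 32.8, 32.6, 32.9, 32.7],
        "DaysUntilDueDate": [5, 12, 30, 7, 21],
    }
).set_index("PM")  # PM becomes the index, like read_csv(..., index_col=['PM'])

rows_only = df.iloc[1:4]        # one slice -> ROWS 1..3, all columns
all_features = df.iloc[:, 0:3]  # all rows, COLUMNS 0..2 (Longitude, Latitude, Days)

print(rows_only.shape)     # (3, 3): only 3 samples reach KMeans
print(all_features.shape)  # (5, 3): every project, 3 features
print(list(all_features.columns), all_features.index.name)
```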
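So my question is: how do I set up my code to run the cluster analysis in 3 dimensions while keeping the index ('PM') column?
Here is my Python file. Thanks for your help: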
from sklearn.cluster import KMeans
import csv
import pandas as pd
# Import csv file with data in following columns:
# [PM (index)] [Longitude] [Latitude] [DaysUntilDueDate]
df = pd.read_csv('point_data_test.csv',index_col=['PM'])
numProjects = len(df)
K = numProjects // 3 # Around three projects can be worked per day
print("Number of projects: ", numProjects)
print("K-clusters: ", K)
for k in range(1, K):
    # Create a kmeans model on our data, using k clusters.
    # Random_state helps ensure that the algorithm returns the
    # same results each time.
    kmeans_model = KMeans(n_clusters=k, random_state=1).fit(df.iloc[:, :])
    # These are our fitted labels for clusters --
    # the first cluster has label 0, and the second has label 1.
    labels = kmeans_model.labels_
    # Sum of distances of samples to their closest cluster center
    SSE = kmeans_model.inertia_
    print("k:",k, " SSE:", SSE)
# Add labels to df
df['Labels'] = labels
#print(df)
df.to_csv('test_KMeans_out.csv')
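A minimal sketch of the same workflow with all three dimensions named explicitly and PM kept as the index. It assumes hypothetical inline data in place of point_data_test.csv, loops over `range(1, K + 1)` so that K itself is also tried (the script above stops at K - 1), and, as an arbitrary choice, refits with k = K before attaching labels; in practice you would pick k from the SSE values.

```python
from io import StringIO

import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for point_data_test.csv (same column layout as the question).
csv_text = """PM,Longitude,Latitude,DaysUntilDueDate
101,-97.10,32.70,3
102,-97.12,32.72,4
103,-97.08,32.69,5
104,-96.80,32.90,14
105,-96.82,32.91,15
106,-96.79,32.88,16
107,-97.40,32.50,29
108,-97.42,32.52,30
109,-97.38,32.49,31
"""

df = pd.read_csv(StringIO(csv_text), index_col="PM")  # PM stays out of the features
features = df[["Longitude", "Latitude", "DaysUntilDueDate"]]  # X, Y, Z

num_projects = len(df)
K = num_projects // 3  # around three projects per day, as in the question

sse = {}
for k in range(1, K + 1):
    model = KMeans(n_clusters=k, random_state=1, n_init=10).fit(features)
    sse[k] = model.inertia_  # sum of squared distances to the nearest center
    print("k:", k, " SSE:", model.inertia_)

# Assumption: use k = K for the final labels; choose k from the SSE curve instead if preferred.
final = KMeans(n_clusters=K, random_state=1, n_init=10).fit(features)
df["Labels"] = final.labels_  # one label per row, aligned with the PM index
print(df)
# df.to_csv('test_KMeans_out.csv')  # writes PM, the three features and Labels
```

Because PM is the index rather than a column, it never enters the distance computation, and `df["Labels"]` lines up with it row by row. The three features have different units (degrees vs. days), so DaysUntilDueDate dominates the Euclidean distance here; scaling the columns first (for example with scikit-learn's `StandardScaler`) is one option if that weighting is not intended.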
Tags: python pandas multidimensional-array scikit-learn k-means