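【Posted】:2018-08-25 12:58:51
【Question】:
I am trying to cluster dialogs using sklearn tf-idf and k-means. I computed the optimal number of clusters using the silhouette score, but the score increases almost linearly with the number of clusters. Is there another way to choose the number of clusters, or am I doing something wrong?
Code: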
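from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score
tfidfV = TfidfVectorizer(max_features=40000, ngram_range=(1, 3), sublinear_tf=True)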
...
X = tfidfV.fit_transform(docm2)
...
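for numb in nn:
    # Fit k-means with the candidate number of clusters and get a label per document
    km = KMeans(n_clusters=numb)
    clabels = km.fit_predict(X)
    # Mean silhouette coefficient over all documents for this clustering
    silhouette_avg = silhouette_score(X, clabels)
    print("For n_clusters = ", numb, "The average silhouette_score is: ", silhouette_avg)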
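For anyone who wants to reproduce the loop above end to end, here is a minimal self-contained sketch. It assumes a tiny made-up corpus in place of the elided `docm2` and a small range in place of `nn`. The helper name `silhouette_by_k` is hypothetical, and `n_init` and `random_state` are added only so that repeated runs give the same labels; neither is in the original code. Scores on a corpus this small say nothing about the real data.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus standing in for the elided `docm2`.
docs = [
    "how do I reset my password",
    "I forgot my password again",
    "password reset link not working",
    "what time does the store open",
    "store opening hours on sunday",
    "is the store open today",
    "track my order status",
    "where is my order",
    "order has not arrived yet",
]


def silhouette_by_k(texts, candidate_ks, seed=0):
    """Return {k: average silhouette score} for each candidate k (hypothetical helper)."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 3), sublinear_tf=True)
    X = vectorizer.fit_transform(texts)
    scores = {}
    for k in candidate_ks:
        # Fixed seed and explicit n_init are assumptions made for repeatability.
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = km.fit_predict(X)
        # silhouette_score needs between 2 and n_samples - 1 distinct labels.
        scores[k] = silhouette_score(X, labels)
    return scores


scores = silhouette_by_k(docs, range(2, 6))
for k, s in scores.items():
    print(f"For n_clusters = {k} the average silhouette_score is: {s:.3f}")
```

Collecting the scores in a dict rather than only printing them makes it easier to plot or compare them across candidate values afterwards.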
【Discussion】:
Tags: python optimization scikit-learn cluster-analysis tf-idf