【Posted】:2015-03-09 12:06:00
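【Question】:
I need to use scikit-learn's kMeans to cluster text documents. The example code works fine as is, but it takes some 20newsgroups data as input. I want to use the same code to cluster a list of documents like the one below: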
documents = ["Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management system",
"System and human system engineering testing of EPS",
"Relation of user perceived response time to error measurement",
"The generation of random binary unordered trees",
"The intersection graph of paths in trees",
"Graph minors IV Widths of trees and well quasi ordering",
"Graph minors A survey"]
What changes do I need to make to the kMeans example code so that it uses this list as input? (Simply setting "dataset = documents" does not work.)
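For reference, here is a minimal sketch of one way a plain list of strings could be fed to the same kind of pipeline. It assumes the example code reads its texts from `dataset.data` and its ground-truth labels from `dataset.target`, which would explain why assigning the list directly fails: a Python list has neither attribute. The list is passed straight to the vectorizer instead, and the evaluation metrics that need true labels are left out. The value `true_k = 2` and the `random_state` are assumptions chosen for illustration, not values from the original example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

# Assumed number of clusters; there are no ground-truth labels to derive it from.
true_k = 2

# Turn the raw strings into a sparse TF-IDF matrix (one row per document).
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Cluster the TF-IDF rows; random_state is fixed only to make runs repeatable.
km = KMeans(n_clusters=true_k, init="k-means++", max_iter=100,
            n_init=10, random_state=0)
km.fit(X)

# km.labels_ holds one cluster index per input document, in input order.
labels = km.labels_
for label, doc in zip(labels, documents):
    print("%d: %s" % (label, doc))
```

With only nine short documents the assignment can change with the seed and the chosen `true_k`, so the printed grouping should be read as a sanity check rather than a meaningful result.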
【Comments】:
-
The link you provided does not work.
Tags: python python-2.7 scikit-learn cluster-analysis k-means