【Posted】: 2017-03-28 15:16:37
【Question】:
I am comparing strings and trying to determine the optimal number of clusters. I have the following dataset:
d <- structure(list(Fund = structure(c(8L, 9L, 11L, 10L, 2L, 3L, 1L,
4L, 5L, 7L, 6L), .Label = c("Branch April China", "Branch April Europe",
"Branch April US", "Branch Emerging Markets EUR", "Branch Emerging Markets GBP",
"Branch Emerging Markets JPY", "Branch Emerging Markets USD",
"Branch EUR", "Branch GBP", "Branch JPY", "Branch USD"), class = "factor")), .Names = "Fund", class = "data.frame", row.names = c(NA,
-11L))
I computed the Levenshtein distances and performed hierarchical clustering as follows:
dist <- adist(d$Fund)
rownames(dist) <- d$Fund
colnames(dist) <- d$Fund
hc <- hclust(as.dist(dist))
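Now I would like to determine the optimal number of clusters. At the moment I cut the tree at a fixed number of clusters (2 here) with the following command: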
df <- data.frame(d$Fund,cutree(hc,2))
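I have read a bit about finding the optimal number of clusters, but what I found mostly relates to the kmeans command. How can I find the optimal number of clusters in my example? Many thanks for your help.
【Comments】: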
-
Take a look at the NbClust package.
-
Thanks. Do you know how to apply it to strings?
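-
One possible way to handle strings, sketched here as an alternative to NbClust rather than as the commenter's suggestion: since the Levenshtein distances are already available, the average silhouette width can be computed for each candidate cut of the tree, and the cut with the largest value kept. The sketch below assumes the `cluster` package is installed and uses its `silhouette()` function, which accepts a cluster vector and a dissimilarity object. The names `funds`, `lev`, `d_lev`, `avg_sil`, `best_k` and `clusters` are illustrative and not from the question.

```r
library(cluster)  # provides silhouette(); assumed to be installed

# Fund names from the question, as a plain character vector
funds <- c("Branch EUR", "Branch GBP", "Branch USD", "Branch JPY",
           "Branch April Europe", "Branch April US", "Branch April China",
           "Branch Emerging Markets EUR", "Branch Emerging Markets GBP",
           "Branch Emerging Markets USD", "Branch Emerging Markets JPY")

# Levenshtein distance matrix and hierarchical clustering, as in the question
lev <- adist(funds)
dimnames(lev) <- list(funds, funds)
d_lev <- as.dist(lev)
hc <- hclust(d_lev)

# Candidate numbers of clusters: the silhouette is undefined for k = 1
# and for k = n, so only 2 .. n - 1 are tried
ks <- 2:(length(funds) - 1)

# Average silhouette width for each cut of the tree
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k), d_lev)
  mean(sil[, "sil_width"])
})

# Keep the k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
print(data.frame(k = ks, avg_sil = round(avg_sil, 3)))

# Cluster assignment at the selected k
clusters <- data.frame(Fund = funds, cluster = cutree(hc, best_k))
print(clusters)
```

The silhouette is only one of several criteria, and with just 11 strings the result may be sensitive to the linkage method passed to `hclust`, so it is worth checking the chosen cut against the dendrogram from `plot(hc)`.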
Tags: r machine-learning tree