【Question title】: dendextend: color_branches not working for certain hclust methods
【Posted】: 2017-07-13 09:41:39
【Question】:

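I am using the R dendextend package to plot the hclust tree objects produced by each of the methods in hclust{stats}: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

I noticed that the color coding from color_branches fails when I use method = "median" or "centroid".

I tested this with a randomly generated matrix, and the error is reproduced for the "median" and "centroid" methods. Is there a specific reason for this?

Please see the link to the output plots: fig1. hclust methods (a) ward.D2, (b) median, (c) centroid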

library(dendextend)
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
df.names <- rep(c("black", "red", "blue", "green", "cyan"), 2)
df.col <- rep(c("black", "red", "blue", "green", "cyan"), 2)
colnames(df) <- df.names
df.dist <- dist(t(df), method = "euclidean")

# plotting works for "ward.D", "ward.D2", "single", "complete", "average", "mcquitty"
dend <- as.dendrogram(hclust(df.dist, method = "ward.D2"), labels = df.names)
labels_colors(dend) <- df.col[order.dendrogram(dend)]
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

# color_branches fails for "median" or "centroid"
dend <- as.dendrogram(hclust(df.dist, method = "median"), labels = df.names)
labels_colors(dend) <- df.col[order.dendrogram(dend)]
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

dend <- as.dendrogram(hclust(df.dist, method = "centroid"), labels = df.names)
labels_colors(dend) <- df.col[order.dendrogram(dend)]
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

I am using dendextend_1.4.0. The session info is as follows:

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

Thanks.

【Comments】:

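  • It works fine for me. What is your exact output? Please paste it.
  • OK, I see what you mean now. The problem is that this code produces clusters with "weird" tree heights. In that case it is not clear to me how to fix it, because what a "cut" means is not well defined.
  • Hi Tal, yes, I suspected it had to do with the "weird" tree heights produced by my data, but since I could reproduce it with a random matrix, I am curious whether it is related to the clustering method, that is, whether these methods tend to produce this kind of tree. The color coding of the labels works... Is there a way for me to edit the code so that it flags when the cut is unclear and assigns the branch colors according to label order?
  • I have given an example of how to deal with it, but it is not "pretty".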
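
The "weird" tree heights mentioned above are usually called inversions: a later merge happens at a lower height than an earlier one. The ?hclust help page notes that "median" and "centroid" do not lead to a monotone distance measure, so this is probably what makes the cut ambiguous here. A minimal sketch for flagging it, using only base R; has_inversion and the three-point toy distance matrix are made up for illustration and are not part of dendextend:

```r
# Hypothetical helper: TRUE when the merge heights are not non-decreasing,
# i.e. the tree contains at least one inversion.
has_inversion <- function(hc) is.unsorted(hc$height)

# Toy example (made up): three points forming a nearly equilateral triangle.
m <- matrix(c(0.0, 1.0, 1.1,
              1.0, 0.0, 1.1,
              1.1, 1.1, 0.0),
            nrow = 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
toy.dist <- as.dist(m)

hc.methods <- c("ward.D", "ward.D2", "single", "complete",
                "average", "mcquitty", "median", "centroid")

# A and B merge at height 1.0; for "median" and "centroid" the merged
# cluster then sits at distance 0.85 from C, which is below 1.0.
toy.inversions <- sapply(hc.methods, function(method) {
  has_inversion(hclust(toy.dist, method = method))
})
print(toy.inversions)

# The same check on the data from the question.
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
df.dist <- dist(t(df), method = "euclidean")
q.inversions <- sapply(hc.methods, function(method) {
  has_inversion(hclust(df.dist, method = method))
})
print(q.inversions)
```

A check like this could be run before calling color_branches, falling back to a label-based coloring whenever it returns TRUE.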

Tags: r hclust dendextend


【Answer 1】:

You can work around this with branches_attr_by_clusters (though it can be a bit tricky, see the example below):

library(dendextend)
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
df.names <- rep(c("black", "red", "blue", "green", "cyan"), 2)
df.col <- rep(c("black", "red", "blue", "green", "cyan"), 2)
colnames(df) <- df.names
df.dist <- dist(t(df), method = "euclidean")

# plotting works for "ward.D", "ward.D2", "single", "complete", "average", "mcquitty"
dend <- as.dendrogram(hclust(df.dist, method = "ward.D2"), labels = df.names)
labels_colors(dend) <- df.col[order.dendrogram(dend)]
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

# color_branches fails for "median" or "centroid"
dend <- as.dendrogram(hclust(df.dist, method = "median"), labels = df.names)
aa <- df.col[order.dendrogram(dend)]
labels_colors(dend) <- aa
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

aa <- factor(aa, levels = unique(aa))
dend %>% branches_attr_by_clusters(aa, value = levels(aa)) %>% plot
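
If the goal is only to give each leaf's branch the color of its label, as asked in the comments, one alternative that avoids cutting the tree altogether is to set edgePar on the leaves with base R's dendrapply. This is a sketch under assumptions, not dendextend's own method: color_leaf_edges is a hypothetical helper, and it only colors the edge leading to each leaf, not the internal branches.

```r
# Hypothetical helper (not part of dendextend): color the edge leading to each
# leaf, looking the color up by the leaf's label. No tree cutting is involved,
# so inversions in the merge heights do not matter.
color_leaf_edges <- function(dend, cols) {
  stopifnot(inherits(dend, "dendrogram"), !is.null(names(cols)))
  dendrapply(dend, function(node) {
    if (is.leaf(node)) {
      edge_par <- attr(node, "edgePar")
      if (is.null(edge_par)) edge_par <- list()
      edge_par$col <- cols[[attr(node, "label")]]
      attr(node, "edgePar") <- edge_par
    }
    node
  })
}

# Same data as in the question; here each label is also its own color name.
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
df.col <- rep(c("black", "red", "blue", "green", "cyan"), 2)
colnames(df) <- df.col
df.dist <- dist(t(df), method = "euclidean")

dend <- as.dendrogram(hclust(df.dist, method = "median"))
dend.leafCol <- color_leaf_edges(dend, setNames(df.col, df.col))
plot(dend.leafCol, horiz = TRUE)
```

Because the lookup is by label, the result does not depend on order.dendrogram; with labels that are not color names, pass a named vector mapping each label to its color.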
