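【Question title】: Connecting points with ggplot in R
【Posted】: 2019-10-06 15:24:34
【Question】:

I'm looking for a way to connect some points using ggplot in R. I want to connect each point to its nearest point. Here is what my data looks like as a scatterplot.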

x <- c(0.81,0.82,0.82,0.82,0.83,0.83,0.83,0.84,0.84,0.84,0.85,0.85,0.85,0.86,0.86,0.86,0.87,0.87,0.87,0.88,0.88,0.88,0.89,0.89,0.89,0.9,0.9,0.9,0.91,0.91,0.91,0.92,0.92,0.92,0.93,0.93,0.93,0.93,0.93,0.94,0.94,0.94,0.94,0.94,0.95,0.95,0.95,0.95,0.95,0.96,0.96,0.96,0.96,0.96,0.97,0.97,0.97,0.97,0.97,0.98,0.98,0.98,0.98,0.98,0.99,0.99,0.99,0.99,1,1,1,1,1.01,1.01,1.01,1.01,1.02,1.02,1.02,1.02,1.03,1.03,1.03,1.03,1.04,1.04,1.04,1.04,1.05,1.05,1.05,1.05,1.06,1.06,1.06,1.06,1.07,1.07,1.07,1.07,1.08,1.08,1.08,1.08,1.09,1.09,1.09,1.09,1.1,1.1,1.1,1.1,1.11,1.11,1.11,1.11,1.12,1.12,1.12,1.12,1.13,1.13,1.13,1.13,1.14,1.14,1.15,1.15,1.16,1.16,1.17,1.17,1.18,1.18,1.19,1.19,1.2,1.2,1.21,1.21,1.22,1.22,1.23,1.23,1.24,1.24,1.25,1.25,1.26,1.26,1.27)

y <- c(-1.295,-0.535,-1.575,-1.295,-0.525,-1.575,-1.295,-0.515,-1.575,-1.285,-0.515,-1.575,-1.285,-0.505,-1.575,-1.275,-0.495,-1.575,-1.275,-0.485,-1.575,-1.265,-0.485,-1.575,-1.265,-0.475,-1.575,-1.255,-0.465,-1.575,-1.255,-0.455,-1.575,-1.245,-0.445,1.285,1.545,-1.575,-1.245,-0.435,1.165,1.675,-1.575,-1.235,-0.425,1.085,1.765,-1.575,-1.235,-0.405,1.015,1.845,-1.575,-1.225,-0.395,0.965,1.905,-1.575,-1.215,-0.385,0.915,1.965,-1.575,-1.215,-0.375,0.865,-1.575,-1.205,-0.355,0.825,-1.575,-1.205,-0.345,0.785,-1.565,-1.195,-0.325,0.745,-1.565,-1.185,-0.305,0.705,-1.565,-1.185,-0.285,0.665,-1.565,-1.175,-0.265,0.625,-1.565,-1.165,-0.245,0.585,-1.565,-1.165,-0.225,0.545,-1.565,-1.155,-0.195,0.495,-1.555,-1.145,-0.165,0.455,-1.555,-1.145,-0.135,0.405,-1.555,-1.135,-0.0849999999999999,0.345,-1.555,-1.125,-0.035,0.275,-1.545,-1.115,0.0850000000000001,0.145,-1.545,-1.115,-1.545,-1.105,-1.545,-1.095,-1.535,-1.085,-1.535,-1.085,-1.535,-1.075,-1.525,-1.065,-1.525,-1.055,-1.525,-1.045,-1.515,-1.045,-1.515,-1.035,-1.505,-1.025,-1.505,-1.015,-1.495,-1.005,-1.495)

library(tibble)
library(ggplot2)

example_df <- tibble(x = x, y = y)

ggplot(example_df, aes(x = x, y = y)) + 
  geom_point()

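The default behavior of geom_line is to connect coordinates according to the order in which they appear in the data frame. Is there an easy way to connect points based on the Euclidean distance between them?

【Comments on the question】:

  • In your full problem, do you have some way of marking which group a point belongs to, or is that part of the problem? In this case there would be 3 groups, since (I assume) you expect 3 paths through these points.

Tags: r ggplot2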


【Solution 1】:

Here is a solution to the question you asked, although I suspect it's not what you really want. It might help, though...

distmat <- as.matrix(dist(example_df[, c("x", "y")]))  #matrix of Euclidean distances between rows (x and y only)
diag(distmat) <- Inf                      #remove zeros on diagonal
nearest <- apply(distmat, 1, which.min)   #find index of nearest point to each point
example_df$xend <- example_df$x[nearest]  #set end point of segment from each point
example_df$yend <- example_df$y[nearest]

ggplot(example_df, aes(x = x, y = y, xend = xend, yend = yend)) + 
  geom_point() +
  geom_segment(colour = "blue")
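
One property of this approach is worth seeing on a tiny example: when two points are each other's nearest neighbour, both of their segments run between the same pair, so nothing links that pair to the rest of the series. The sketch below uses made-up points (`pts` is not the question's data) and base R only.

```r
# Made-up points on a line with uneven spacing (illustration only)
pts <- data.frame(x = c(0, 1, 3, 4), y = c(0, 0, 0, 0))

d <- as.matrix(dist(pts))   # pairwise Euclidean distances
diag(d) <- Inf              # a point is not its own neighbour
nearest <- unname(apply(d, 1, which.min))

# TRUE where a point's nearest neighbour points straight back at it
mutual <- nearest[nearest] == seq_along(nearest)

print(data.frame(point = seq_along(nearest), nearest = nearest, mutual = mutual))
```

Here points 1 and 2 pair up, as do points 3 and 4, so no segment is drawn across the wider gap between points 2 and 3.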

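【Comments】:

  • This helps, but I'm looking to connect all the points. Ideally, I'd like to use something like geom_path with a grouping variable.
  • Why are there gaps between some of the points? Looking at the code, I don't see why that would happen.
  • @camille This only connects each point to its nearest point. If A is nearest to B and B is nearest to A, then B will not be connected to C, even if that looks like the visually sensible next step. For continuous lines along the three perceived series of points, some points would need to connect to their second-nearest neighbour (or, conceivably, one even further away). I think this is a non-trivial problem!
  • I think a solution might involve iteratively removing points from the candidate set at each step of choosing the nearest point. For example, if B is chosen as the nearest point to A, then A should not be an option when looking for the nearest point to B.
  • You're right, this may be a non-trivial problem!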
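
The iterative idea from the comments (drop each point from the candidate set once it has been visited) can be sketched as a greedy walk. This is only a heuristic sketch, not a tested solution for the question's data: the function name `greedy_chain`, the `start` index, and the `max_jump` threshold for starting a new path are all assumptions introduced here, and the result depends heavily on where the walk starts.

```r
# Hypothetical helper: walk from `start` to the nearest unvisited point,
# starting a new group whenever the next step is longer than `max_jump`.
greedy_chain <- function(x, y, start = 1L, max_jump = Inf) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))    # pairwise Euclidean distances
  visited <- logical(n)
  ord <- integer(n)                    # visiting order (row indices)
  grp <- integer(n)                    # path id for each visited point
  current <- start
  g <- 1L
  for (i in seq_len(n)) {
    ord[i] <- current
    grp[i] <- g
    visited[current] <- TRUE
    if (i == n) break
    cand <- d[current, ]
    cand[visited] <- Inf               # visited points are no longer candidates
    nxt <- unname(which.min(cand))
    if (cand[nxt] > max_jump) g <- g + 1L  # long jump: treat as a new path
    current <- nxt
  }
  data.frame(x = x[ord], y = y[ord], group = factor(grp))
}

# Made-up data: two horizontal rows of three points, in shuffled order
toy_x <- c(2, 0, 1, 1, 0, 2)
toy_y <- c(5, 0, 0, 5, 5, 0)
chained <- greedy_chain(toy_x, toy_y, start = 2L, max_jump = 2)
print(chained)
```

The returned data frame is already in drawing order, so it could be passed to `ggplot(chained, aes(x = x, y = y, group = group)) + geom_path()`. On real data, a walk that begins in the middle of a curve will run to one end and then jump back, so choosing `start` and `max_jump` would need care.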
【Solution 2】:

Another answer: this will work for this data, but not in general.

example_df$group <- cut(example_df$y, 
                        breaks = c(Inf, -0.8, -1.4, -Inf))     #breaks determined 'by eye'
example_df <- example_df[order(example_df$y), ]                #sort by y
ggplot(example_df, aes(x = x, y = y, group = group)) + 
  geom_point() +
  geom_path(colour = "blue")

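【Comments】:

  • Thanks. This solves the problem as stated in the question. It would be nice to be able to connect the points without having to group by splits along the y-axis. This solution won't work when the groups of points can't be separated by horizontal lines. Hopefully someone will develop a package that adds this functionality to ggplot.
【Solution 3】:

This differs from Andrew Gustar's cut-based answer only in how the 3 paths are separated. I wanted something more like a scalable process, so I tried hierarchical clustering to put the points into 3 clusters based on their distances from one another. In this case they are easy to separate. With other data it may be trickier, and you may need a different clustering algorithm. Then, following the other answer (+1 to them), arrange each cluster by y value so the paths are drawn in the right order.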

library(dplyr)
library(ggplot2)

example_df <- tibble(x = x, y = y)
clust <- hclust(dist(example_df), method = "single")

df_clustered <- example_df %>%
  mutate(cluster = as.factor(cutree(clust, k = 3))) %>%
  arrange(cluster, y)

ggplot(df_clustered, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  geom_path()
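
As a side note on the choice of `method = "single"`: single linkage merges clusters by their closest pair of points, which is why it tends to keep a long, thin chain of closely spaced points together. A minimal sketch on made-up data (two parallel rows of points, not the question's data; the names `toy` and `tree` are illustrative) shows the idea using base R only.

```r
# Made-up data: two parallel rows of points, 0.1 apart within a row, 1 apart between rows
line_x <- seq(0, 2, by = 0.1)
toy <- data.frame(
  x = c(line_x, line_x),
  y = rep(c(0, 1), each = length(line_x))
)

# Single linkage joins neighbours along each row long before it joins the two rows
tree <- hclust(dist(toy), method = "single")
toy$cluster <- cutree(tree, k = 2)

# Cross-tabulate row (y value) against assigned cluster
print(table(row = toy$y, cluster = toy$cluster))
```

This only works when the gaps between curves are larger than the gaps between consecutive points on a curve, so it is worth checking that assumption before relying on it for other data.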

【Comments】:

  • I hadn't considered a cluster-based solution. This is useful.