【Question title】: R: filter dataframe by pairs of neighbors?
【Posted】: 2021-05-05 11:55:40
【Question】:

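I have a data frame of neighbors, in which one cell (say, a pixel) a is adjacent to cells b, c, d, and so on. It works like a moving window: there is a central_id column and a neighbors column, and each central cell has its own set of neighbors. I also have a second data frame that holds each cell's value at a given time. I need to compare each central cell's value with the values of its neighbors, and see how that difference changes over time.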

Here is an example:

set.seed(3)
nbrs <- data.frame(central_id = c("a", "a", "a",
                                  "b", "b", "b", 
                                  "c", "c", "d", 
                                  "e"),
                   nbrs_id    = c("b", "c", "d",
                                  "a", "c", "e",
                                  "a", "b", "e", "d"))


# Generate data with values
df <- data.frame(year = rep(c(1, 2, 3), each = 5),
                 id = c("a", "b", "c", "d", "e"),
                 vals = 10+ rnorm(15))

The data frame I want looks like this, keeping it clear which cell is the neighbor:

  year central_id central_val nbrs_id nbrs_val
1    1          a   10.074955       b 8.354045
2    1          a   10.074955       c 11.774009
3    1          a   10.074955       d 10.765968
4    1 ...............

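How can I efficiently look up the values by id in the values data frame and assemble them into this table? I have about 10 million rows, so I am looking for something efficient. So far I have only used simple filtering to pull out individual values, e.g. df %>% filter(year == 1 & id == 'a') to get my vals, but that takes a very long time. Surely there is a more efficient way?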

【Question comments】:

    Tags: r filter dplyr


    【Solution 1】:

    Is this what you want?

    set.seed(3)
    nbrs <- data.frame(central_id = c("a", "a", "a",
                                      "b", "b", "b", 
                                      "c", "c", "d", 
                                      "e"),
                       nbrs_id    = c("b", "c", "d",
                                      "a", "c", "e",
                                      "a", "b", "e", "d"))
    
    
    # Generate data with values
    df <- data.frame(year = rep(c(1, 2, 3), each = 5),
                     id = c("a", "b", "c", "d", "e"),
                     vals = 10+ rnorm(15))
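    library(dplyr)
    
    # First join: attach every neighbor of each cell (one row per cell/neighbor/year).
    # Second join: look up the neighbor's value for the same year.
    df %>% left_join(nbrs, by = c('id' = 'central_id')) %>%
      left_join(df, by = c('year' = 'year', 'nbrs_id' = 'id'),
                suffix = c('', '_nbrs'))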
    #>    year id      vals nbrs_id vals_nbrs
    #> 1     1  a  9.038067       b  9.707474
    #> 2     1  a  9.038067       c 10.258788
    #> 3     1  a  9.038067       d  8.847868
    #> 4     1  b  9.707474       a  9.038067
    #> 5     1  b  9.707474       c 10.258788
    #> 6     1  b  9.707474       e 10.195783
    #> 7     1  c 10.258788       a  9.038067
    #> 8     1  c 10.258788       b  9.707474
    #> 9     1  d  8.847868       e 10.195783
    #> 10    1  e 10.195783       d  8.847868
    #> 11    2  a 10.030124       b 10.085418
    #> 12    2  a 10.030124       c 11.116610
    #> 13    2  a 10.030124       d  8.781143
    #> 14    2  b 10.085418       a 10.030124
    #> 15    2  b 10.085418       c 11.116610
    #> 16    2  b 10.085418       e 11.267369
    #> 17    2  c 11.116610       a 10.030124
    #> 18    2  c 11.116610       b 10.085418
    #> 19    2  d  8.781143       e 11.267369
    #> 20    2  e 11.267369       d  8.781143
    #> 21    3  a  9.255218       b  8.868781
    #> 22    3  a  9.255218       c  9.283642
    #> 23    3  a  9.255218       d 10.252652
    #> 24    3  b  8.868781       a  9.255218
    #> 25    3  b  8.868781       c  9.283642
    #> 26    3  b  8.868781       e 10.152046
    #> 27    3  c  9.283642       a  9.255218
    #> 28    3  c  9.283642       b  8.868781
    #> 29    3  d 10.252652       e 10.152046
    #> 30    3  e 10.152046       d 10.252652
    

    Created on 2021-05-05 by the reprex package (v2.0.0)
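
The question also asks for specific column names and for the difference between each central cell and its neighbors over time. The following is only a sketch of how the two joins could be extended to cover that, not part of the original answer. The names pairs, val_diff, diff_by_year and mean_diff are made up for illustration, and the per-year mean is just one possible summary. On recent dplyr versions the first join may warn about a many-to-many relationship, which is expected here because every cell appears once per year and once per neighbor.

```r
library(dplyr)

# Same example data as in the question
set.seed(3)
nbrs <- data.frame(
  central_id = c("a", "a", "a", "b", "b", "b", "c", "c", "d", "e"),
  nbrs_id    = c("b", "c", "d", "a", "c", "e", "a", "b", "e", "d")
)
df <- data.frame(year = rep(c(1, 2, 3), each = 5),
                 id   = rep(c("a", "b", "c", "d", "e"), times = 3),
                 vals = 10 + rnorm(15))

# Start from the neighbor table so the id columns keep the names
# requested in the question, then rename the value columns.
pairs <- nbrs %>%
  inner_join(df, by = c("central_id" = "id")) %>%             # central cell's value, all years
  rename(central_val = vals) %>%
  inner_join(df, by = c("year" = "year", "nbrs_id" = "id")) %>% # neighbor's value, same year
  rename(nbrs_val = vals) %>%
  mutate(val_diff = central_val - nbrs_val) %>%               # hypothetical column name
  select(year, central_id, central_val, nbrs_id, nbrs_val, val_diff) %>%
  arrange(year, central_id, nbrs_id)

# One possible "over time" summary: mean difference per central cell and year
diff_by_year <- pairs %>%
  group_by(central_id, year) %>%
  summarise(mean_diff = mean(val_diff), .groups = "drop")
```

Both steps are vectorised joins, so the whole table is built at once instead of filtering row by row. How well this scales to about 10 million rows has not been measured here.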

    【Comments】:

    • Thanks @AnilGoyal!! I thought there had to be some simple solution, but I didn't expect it to be this simple! :-D