【Question title】: cumsum is.na with rle, ignoring consecutive NAs
【Posted】: 2019-10-15 15:09:59
【Question description】:

A simple question. Suppose I have the following data:

library(tidyverse)
df <- data.frame(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                 variable = c(NA, "a", NA, "b", "c", NA, NA, NA, NA, "a", NA, "c", NA, NA, "d", NA, NA, "a"))
df
   group variable
1      1     <NA>
2      1        a
3      1     <NA>
4      1        b
5      1        c
6      1     <NA>
7      1     <NA>
8      1     <NA>
9      1     <NA>
10     1        a
11     1     <NA>
12     1        c
13     1     <NA>
14     1     <NA>
15     1        d
16     2     <NA>
17     2     <NA>
18     2        a

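I want to count the missing values with cumsum(is.na(variable)), but a run of consecutive missing values should be counted only once, so my desired output looks like this: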

   group variable newvariable
1      1     <NA>           1
2      1        a           1
3      1     <NA>           2
4      1        b           2
5      1        c           2
6      1     <NA>           3
7      1     <NA>           3
8      1     <NA>           3
9      1     <NA>           3
10     1        a           3
11     1     <NA>           4
12     1        c           4
13     1     <NA>           5
14     1     <NA>           5
15     1        d           5
16     2     <NA>           1
17     2     <NA>           1
18     2        a           1

I think I need to incorporate rle into my code:

df %>%
  group_by(group, na_group = {na_group = rle(variable); rep(seq_along(na_group$lengths), na_group$lengths)}) %>%
  mutate(newvariable = cumsum((is.na(variable)))) #?

Maybe a grouped map would work. Any suggestions?

References: "Identify sets of NA in a vector"; "Count consecutive values in groups with condition with dplyr and rle"

【Comments】:

Tags: r dplyr sequence seq run-length-encoding


【Solution 1】:
df %>%
    group_by(group) %>%
    mutate(new = with(rle(is.na(variable)), rep(cumsum(values), lengths))) %>%
    ungroup()
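
A minimal base-R sketch of why this approach works, using group 1 of the question's data as a plain vector. The helper name `count_na_runs` is hypothetical and only for illustration. Note that `rle()` is applied to `is.na(variable)` rather than to `variable` itself: `rle()` treats every NA as a run of its own, so consecutive NAs in the raw values would not be merged.

```r
# Group 1 of the question's data as a plain character vector.
x <- c(NA, "a", NA, "b", "c", NA, NA, NA, NA, "a", NA, "c", NA, NA, "d")

# rle() on the raw values keeps every NA as a separate run of length 1 ...
raw_runs <- rle(x)

# ... while rle() on is.na(x) merges consecutive NAs into a single TRUE run.
na_runs <- rle(is.na(x))

# Hypothetical helper: cumsum(values) numbers the TRUE (NA) runs, and the
# FALSE runs that follow inherit the same number; rep() then expands each
# run number back to one entry per element.
count_na_runs <- function(v) {
  r <- rle(is.na(v))
  rep(cumsum(r$values), r$lengths)
}

new <- count_na_runs(x)
```

Inside `group_by(group) %>% mutate(...)` the same computation is simply carried out once per group, which is why the counter restarts in group 2.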

【Comments】:

    【Solution 2】:

    Another option is to use diff and cumsum on the logical vector:

    library(data.table)
    setDT(df)[, new := cumsum(c(TRUE, diff(is.na(variable)) > 0) ), group ]
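
For readers without data.table or dplyr, the same per-group computation can be sketched in base R with `ave()`. This is only an illustration under the assumption that the rows of each group are already in the desired order; the name `df2` is hypothetical and is used so the original `df` is left untouched.

```r
# Rebuild the question's data (base R only, no packages needed).
df2 <- data.frame(
  group    = c(rep(1, 15), rep(2, 3)),
  variable = c(NA, "a", NA, "b", "c", NA, NA, NA, NA, "a",
               NA, "c", NA, NA, "d", NA, NA, "a")
)

# ave() splits the NA indicator by group, applies the function to each
# piece, and puts the results back in the original row order.
# diff(z) > 0 is TRUE exactly where a non-NA value is followed by an NA,
# i.e. at the start of a new NA run; the leading TRUE starts each group at 1.
df2$new <- ave(
  as.integer(is.na(df2$variable)),
  df2$group,
  FUN = function(z) cumsum(c(TRUE, diff(z) > 0))
)
```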
    

    Or with dplyr:

    library(dplyr)
    df %>%
       group_by(group) %>%
       mutate(new = cumsum(c(TRUE, diff(is.na(variable)) > 0)))
    # A tibble: 18 x 3
    # Groups:   group [2]
    #   group variable   new
    #   <dbl> <fct>    <int>
    # 1     1 <NA>         1
    # 2     1 a            1
    # 3     1 <NA>         2
    # 4     1 b            2
    # 5     1 c            2
    # 6     1 <NA>         3
    # 7     1 <NA>         3
    # 8     1 <NA>         3
    # 9     1 <NA>         3
    #10     1 a            3
    #11     1 <NA>         4
    #12     1 c            4
    #13     1 <NA>         5
    #14     1 <NA>         5
    #15     1 d            5
    #16     2 <NA>         1
    #17     2 <NA>         1
    #18     2 a            1
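
The two solutions agree on the question's data, where every group starts with an NA. They can differ when a group starts with a non-NA value, a case the question does not specify. The sketch below compares them on two small vectors; the helper names `by_rle`, `by_diff`, and `by_diff0` are hypothetical, and `by_diff0` is just one possible variant, not part of either answer.

```r
# Solution 1 and Solution 2 rewritten as plain functions on a vector.
by_rle  <- function(v) with(rle(is.na(v)), rep(cumsum(values), lengths))
by_diff <- function(v) cumsum(c(TRUE, diff(is.na(v)) > 0))

# Variant of the diff approach that starts the counter at 0 when the
# first element is not NA, so that it matches the rle approach.
by_diff0 <- function(v) cumsum(c(is.na(v[1]), diff(is.na(v)) > 0))

x <- c(NA, NA, "a", NA, "b")   # starts with NA
y <- c("a", NA, NA, "b", NA)   # starts with a non-NA value

res_x <- list(rle = by_rle(x), diff = by_diff(x))   # both give 1 1 1 2 2
res_y <- list(rle = by_rle(y), diff = by_diff(y))   # 0 1 1 1 2 vs 1 2 2 2 3
```

With the rle approach, leading non-NA values get 0 because no NA run has been seen yet; with the diff approach the counter always starts at 1. Which of the two is wanted depends on how leading non-missing values should be labelled.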
    

    【Comments】:
