【Question title】: Find rows with NA between 0s and 1s
【Posted】: 2020-09-29 06:40:43
【Question】:

I want to identify rows that contain NA and sit between a zero and a one. Consider this data.table:

library(data.table)
DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))

# DT
# a
# 1:  0
# 2: NA
# 3: NA
# 4:  0
# 5: NA
# 6:  1
# 7:  1
# 8: NA
# 9:  0
# 10: NA
# 11:  1
# 12: NA
# 13: NA
# 14: NA
# 15:  0
# 16:  1
# 17:  1
# 18:  0
# 19: NA
# 20:  0

How can I identify row numbers 5, 8, 10, and 12:14?

    Tags: r indexing data.table


    【Solution 1】:

    You can try using approx:

    DT[,b := approx((1:.N)[!is.na(a)],na.omit(a),1:.N)$y]
    

    Then apply either of the following, which are equivalent here:

    DT[, which(is.na(a) & b>0 & b<1)]
    

    DT[, which(is.na(a) & between(b, 0, 1, FALSE))]
    

    which gives

    [1]  5  8 10 12 13 14
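To see why interpolation separates the cases, here is a minimal base R sketch of the same idea without data.table. It assumes the column holds only 0, 1, and NA; the names `known`, `b`, and `rows` are illustrative and not part of the answer above.

```r
# Same vector as in the question
a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)

# Positions that hold a real (non-NA) value
known <- which(!is.na(a))

# Linearly interpolate at every position 1..length(a)
b <- approx(x = known, y = a[known], xout = seq_along(a))$y

# An NA flanked by two equal values interpolates to exactly that value (0 or 1);
# an NA flanked by a 0 and a 1 lands strictly between them
rows <- which(is.na(a) & b > 0 & b < 1)
print(rows)
```

With the default `rule = 1`, `approx` returns NA outside the range of known positions, so leading or trailing NAs are dropped by `which()` rather than matched.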
    

    【Comments】:

    • Thanks. I guess you could make it a bit simpler by omitting is.na(DT[,a]) &, so it is just which(DT[,b]>0 & DT[,b]
    【Solution 2】:

    The first row of each qualifying NA run can be computed like this:

    library("data.table")
    DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))
    
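    # Run-length encode the NA pattern: one entry per run of NA / non-NA rows
    r <- DT[, rle(is.na(a))]
    # V1 = is this an NA run?, V2 = run length, start = first row of the run
    R <- data.table(r$values, r$lengths, start=c(1, 1+head(cumsum(r$lengths), -1)))
    
    i <- R[(V1), start]        # first row of each NA run
    j <- R[(V1), start+V2-1]   # last row of each NA run
    # Keep runs whose two neighbours are a 0 and a 1 (their sum is 1)
    i[(DT[i-1, a] + DT[j+1, a])==1]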
    # result: [1]  5  8 10 12
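The result above lists only the first row of each run, while the question also asks for rows 13 and 14. The following is a hedged base R sketch of one way to expand each qualifying run into all of its rows. It assumes the column holds only 0, 1, and NA, and the names `start`, `end`, `keep`, and `rows` are illustrative rather than taken from the answer.

```r
# Same vector as in the question
a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)

r     <- rle(is.na(a))
end   <- cumsum(r$lengths)        # last row of every run
start <- end - r$lengths + 1L     # first row of every run

# NA runs that have a non-NA neighbour on both sides
keep <- r$values & start > 1 & end < length(a)
s <- start[keep]
e <- end[keep]

# A run qualifies when its neighbours are a 0 and a 1, i.e. they sum to 1
ok <- (a[s - 1] + a[e + 1]) == 1

# Expand each qualifying run into all of its row numbers
rows <- unlist(Map(seq, s[ok], e[ok]))
print(rows)
```

Filtering on `start > 1` and `end < length(a)` also avoids indexing row 0 or row `length(a) + 1` when the vector begins or ends with NA.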
    


      【Solution 3】:

      The zoo package and its na.locf() function can help here, as described by Dirk Eddelbuettel in: Replacing NAs with latest non-NA value

      library(data.table)
      library(zoo)
      
      DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))
      
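      non_nas <- DT[!is.na(a), a]        # the non-NA values, in row order
      successor <- c(non_nas[-1], 0)     # the next non-NA value (0 pads the end)
      diff <- abs(non_nas - successor)   # 1 where the next non-NA value differs
      DT[!is.na(a), diff:=diff]          # write the flags back onto the non-NA rows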
      

      This gives you a data.table like this:

           a diff
       1:  0    0
       2: NA   NA
       3: NA   NA
       4:  0    1
       5: NA   NA
       6:  1    0
       7:  1    1
       8: NA   NA
       9:  0    1
      10: NA   NA
      11:  1    1
      12: NA   NA
      13: NA   NA
      14: NA   NA
      15:  0    1
      16:  1    0
      17:  1    1
      18:  0    0
      19: NA   NA
      20:  0    0
      

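      The idea is that each "1" in the diff column tells you that the value in "a" changes after the NAs that follow.

      Now you want to get rid of the NAs in the "diff" column by carrying the last value forward. For clarity, we put the result into a new column "b". This is where the zoo package comes in: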

      DT[, b:=na.locf(diff)]
      

      This results in:

           a diff b
       1:  0    0 0
       2: NA   NA 0
       3: NA   NA 0
       4:  0    1 1
       5: NA   NA 1
       6:  1    0 0
       7:  1    1 1
       8: NA   NA 1
       9:  0    1 1
      10: NA   NA 1
      11:  1    1 1
      12: NA   NA 1
      13: NA   NA 1
      14: NA   NA 1
      15:  0    1 1
      16:  1    0 0
      17:  1    1 1
      18:  0    0 0
      19: NA   NA 0
      20:  0    0 0
      

      Finally,

      DT[is.na(a) & b == 1, which = TRUE]
      

      will give you:

      [1]  5  8 10 12 13 14
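As a variation on the carry-forward idea, here is a rough base R sketch that carries the previous and the next non-NA value onto each row and compares them directly. This is not the zoo-based method above, just a related technique that needs no extra package; the names `prev_idx`, `next_idx`, `prev_val`, `next_val`, and `rows` are made up for illustration.

```r
# Same vector as in the question
a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)
n <- length(a)
known <- !is.na(a)

# Index of the latest non-NA row at or before each row (0 if there is none)
prev_idx <- cummax(seq_len(n) * known)
# Index of the next non-NA row at or after each row (n + 1 if there is none)
next_idx <- rev(cummin(rev(ifelse(known, seq_len(n), n + 1L))))

# Pad with NA so that "no neighbour" maps to NA instead of failing
prev_val <- c(NA, a)[prev_idx + 1L]
next_val <- c(a, NA)[next_idx]

# An NA row qualifies when the values on either side differ
rows <- which(is.na(a) & prev_val != next_val)
print(rows)
```

Because `which()` ignores NA comparisons, NA rows at the very start or end of the vector (which have a neighbour on one side only) are never reported.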
      
