The zoo package and its na.locf() function can help you here, as Dirk Eddelbuettel describes in: Replacing NAs with latest non-NA value.
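Here is what na.locf() ("last observation carried forward") does on a plain vector: each NA is replaced by the most recent non-NA value before it. By default (na.rm = TRUE), leading NAs have nothing to carry forward and are dropped, so the result can be shorter than the input. Setting na.rm = FALSE keeps them. The vector x is an arbitrary sample chosen for illustration:

```r
library(zoo)

# Arbitrary sample vector with a leading NA, inner NAs and a trailing NA
x <- c(NA, 1, NA, NA, 2, NA)

# Default (na.rm = TRUE): NAs are filled forward, the leading NA is dropped
locf_default <- na.locf(x)
print(locf_default)  # -> 1 1 1 2 2

# na.rm = FALSE: same filling, but the leading NA is kept and the length is preserved
locf_keep <- na.locf(x, na.rm = FALSE)
print(locf_keep)     # -> NA 1 1 1 2 2
```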
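library(data.table)
library(zoo)
DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))
# Non-NA values of a, in order
non_nas <- DT[!is.na(a), a]
# Next non-NA value for each of them; the last one is compared with itself,
# so trailing NAs (with no value after them) are never flagged
successor <- c(non_nas[-1], non_nas[length(non_nas)])
# 1 where the next non-NA value differs from the current one, else 0
changes <- abs(non_nas - successor)
DT[!is.na(a), diff := changes]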
This gives you the following data table:
     a diff
 1:  0    0
 2: NA   NA
 3: NA   NA
 4:  0    1
 5: NA   NA
 6:  1    0
 7:  1    1
 8: NA   NA
 9:  0    1
10: NA   NA
11:  1    1
12: NA   NA
13: NA   NA
14: NA   NA
15:  0    1
16:  1    0
17:  1    1
18:  0    0
19: NA   NA
20:  0    0
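The idea is that every 1 in the diff column tells you that the next non-NA value of a differs from the current one, i.e. that a changes across the NAs that follow.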
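The successor step can also be written with data.table's shift(), which returns the next element when type = "lead". This sketch uses the same sample data. The fill value is set to the last non-NA value, so the final element is compared with itself. That choice assumes trailing NAs should never be flagged:

```r
library(data.table)

a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)
non_nas <- a[!is.na(a)]

# Next non-NA value for each non-NA value; the last one is paired with itself
successor <- shift(non_nas, n = 1L, type = "lead", fill = non_nas[length(non_nas)])

# 1 where the next non-NA value differs from the current one, 0 otherwise
changes <- abs(non_nas - successor)
print(changes)  # -> 0 1 0 1 1 1 1 0 1 0 0
```

These are exactly the diff values at the non-NA rows of the table.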
Now you want to get rid of the NAs in the diff column. For clarity, we put the result into a new column b. This is where the zoo package comes in:
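# na.rm = FALSE keeps any leading NAs, so the result has the same length as the column
DT[, b := na.locf(diff, na.rm = FALSE)]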
This results in:
     a diff b
 1:  0    0 0
 2: NA   NA 0
 3: NA   NA 0
 4:  0    1 1
 5: NA   NA 1
 6:  1    0 0
 7:  1    1 1
 8: NA   NA 1
 9:  0    1 1
10: NA   NA 1
11:  1    1 1
12: NA   NA 1
13: NA   NA 1
14: NA   NA 1
15:  0    1 1
16:  1    0 0
17:  1    1 1
18:  0    0 0
19: NA   NA 0
20:  0    0 0
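If you would rather avoid the zoo dependency, recent versions of data.table provide nafill(). With type = "locf" it fills forward in the same way and leaves leading NAs as NA. The following sketch shows the full pipeline under that assumption, using the same sample data:

```r
library(data.table)

DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))

# Mark non-NA values whose next non-NA value differs (last one compared with itself)
non_nas <- DT[!is.na(a), a]
successor <- c(non_nas[-1], non_nas[length(non_nas)])
DT[!is.na(a), diff := abs(non_nas - successor)]

# nafill(type = "locf") carries the last non-NA diff forward into the NA rows
DT[, b := nafill(diff, type = "locf")]

print(DT[is.na(a) & b == 1, which = TRUE])  # -> 5 8 10 12 13 14
```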
Finally,
DT[is.na(a) & b == 1, which = TRUE]
returns the row numbers of the NAs in a that are followed by a change in value:
[1] 5 8 10 12 13 14
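A different formulation is also possible; it is not the approach used above. At each NA, compare the last non-NA value before it (na.locf) with the first non-NA value after it (na.locf with fromLast = TRUE). The helper name na_between_changes() is hypothetical. NAs at the very start or end have only one neighbour, so their comparison is NA, and which() drops them:

```r
library(zoo)

# Hypothetical helper: indices of NAs whose nearest non-NA values before and
# after them differ. Leading/trailing NAs compare as NA and are dropped by which().
na_between_changes <- function(a) {
  prev_val <- na.locf(a, na.rm = FALSE)                   # last non-NA value at or before each position
  next_val <- na.locf(a, fromLast = TRUE, na.rm = FALSE)  # first non-NA value at or after each position
  which(is.na(a) & prev_val != next_val)
}

a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)
print(na_between_changes(a))  # -> 5 8 10 12 13 14
```

This avoids the intermediate diff column and the trailing-value special case, at the cost of two fill passes.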