[Posted]: 2015-11-02 01:09:57
[Question]:
I have a data frame like this
ID <- c("ID300","ID301","ID302","ID303","ID304","ID305","ID306","ID307","ID308","ID309")
Measurement <- c("Length","Length","Length","Length","Length","Length","Length","Length","Length","Length")
PASSFAIL <- c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
df1 <- data.frame(ID,Measurement,PASSFAIL)
Part 1: I am trying to create a fail rate column for each ID. I want to calculate it over a window of 5 IDs. For example
Fail Rate = (Number of Fails)/(Number of Fails + Number of Pass)
ID300 <- (Fails of Row1 to Row5)/(Total from Row1 to Row5) = (3/5) = 0.6
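Note: in df1, any value in the PASSFAIL column that contains FAIL is treated as a fail.
It should also return NA when fewer than 5 rows are available for the window, so my desired output looks like this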
ID Measurement PASSFAIL FR
1 ID300 Length FAIL 0.6
2 ID301 Length PASS 0.4
3 ID302 Length FAIL 0.4
4 ID303 Length FAIL#Pts 0.2
5 ID304 Length PASS 0.0
6 ID305 Length PASS 0.2
7 ID306 Length PASS NA
8 ID307 Length PASS NA
9 ID308 Length PASS NA
10 ID309 Length FAIL NA
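Part 2: Once this is done, I need to recalculate the fail rate for each new ID that gets added, using the same window of 5. For example, my desired output is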
ID Measurement PASSFAIL FR
1 ID296 Length PASS 0.4
2 ID297 Length FAIL 0.6
3 ID298 Length PASS 0.6
4 ID299 Length FAIL 0.6
5 ID300 Length FAIL 0.8
6 ID301 Length FAIL 0.6
7 ID302 Length PASS NA
8 ID303 Length FAIL NA
9 ID304 Length FAIL#Pts NA
10 ID305 Length PASS NA
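I am currently calculating the fail rate with something like the following, which computes a single fail rate for the whole data frame. I don't know how to compute it sequentially for each ID, using a loop, with a window size of 5.
library(data.table)
setDT(df1)
# aggregate
df1 <- df1[, .( FR = (sum(PASSFAIL != "PASS")/.N))]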
Please provide some input.
[Comments]:
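- I'd suggest looking at filter, or rollapply in the zoo package. E.g. filter(grepl("FAIL",df1$PASSFAIL), rep(1,5)/5, sides=1). Also note that you can pass a by= argument to data.table to run functions within groups defined by the by= variable.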
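The filter idea from the comment can be sketched in base R as follows. One caveat: with sides=1, stats::filter uses a trailing window (the current row and the 4 before it), whereas the desired output in Part 1 uses a forward-looking window (the current row and the 4 after it), so this sketch shifts the result to match. The helper name forward_fail_rate and its k argument are made up for illustration and are not part of any package.

```r
# Rebuild the example data from the question
ID <- c("ID300","ID301","ID302","ID303","ID304","ID305","ID306","ID307","ID308","ID309")
Measurement <- rep("Length", 10)
PASSFAIL <- c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
df1 <- data.frame(ID, Measurement, PASSFAIL)

# Hypothetical helper: fail rate over the current row and the next k - 1 rows.
# Rows without a full window of k rows get NA.
forward_fail_rate <- function(status, k = 5) {
  # Anything containing "FAIL" counts as a fail (so "FAIL#Pts" is a fail)
  is_fail <- as.numeric(grepl("FAIL", status, fixed = TRUE))
  n <- length(is_fail)
  fr <- rep(NA_real_, n)
  if (n >= k) {
    # Trailing sum over k rows; the first k - 1 entries are NA
    trailing <- as.numeric(stats::filter(is_fail, rep(1, k), sides = 1)) / k
    # Shift so that row i holds the rate for rows i .. i + k - 1
    fr[seq_len(n - k + 1)] <- trailing[k:n]
  }
  fr
}

df1$FR <- forward_fail_rate(df1$PASSFAIL, k = 5)
print(df1)
# FR column: 0.6 0.4 0.4 0.2 0.0 0.2 NA NA NA NA
```

For Part 2, one option under the same assumptions is to append the new rows (for example with rbind) and call forward_fail_rate again on the combined PASSFAIL column, since every window is recomputed from the current rows each time.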
Tags: r dataframe data.table dplyr reshape2