【Posted】:2018-08-17 11:12:03
【Question】:
I have a data frame with the following structure (a condensed example, not the actual data):
dput(df1)
structure(list(MedID = c(111, 111, 111, 111, 111, 111, 222, 222,
222, 222, 222), Service = structure(c(1L, 1L, 2L, 1L, 1L, 3L,
3L, 2L, 1L, 1L, 3L), .Label = c("Acute care", "Ext care",
"Outpt care"), class = "factor"), AdmitDate = structure(c(16832,
16861, 16892, 16922, 16953, 16983, 17181, 17212, 17240, 17271,
17301), class = "Date"), Flag = c(0, 0, 99, 0, 0, 0, 0, 99, 0, 0,
0)), .Names = c("MedID", "Service", "AdmitDate", "Flag"),
row.names = c(NA, -11L), class = "data.frame")
> df1
   MedID    Service  AdmitDate Flag
1    111 Acute care 2016-02-01    0
2    111 Acute care 2016-03-01    0
3    111   Ext care 2016-04-01   99
4    111 Acute care 2016-05-01    0
5    111 Acute care 2016-06-01    0
6    111 Outpt care 2016-07-01    0
7    222 Outpt care 2017-01-15    0
8    222   Ext care 2017-02-15   99
9    222 Acute care 2017-03-15    0
10   222 Acute care 2017-04-15    0
11   222 Outpt care 2017-05-15    0
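I want to use dplyr, with group_by(MedID) and mutate, to add a column to a new data frame (call it Flag2 in df2). Within each patient (MedID), df2$Flag2 should be 1 starting at the row where df1$Flag == 99 and for every subsequent row of that same MedID; all earlier rows should get 0. I can get this to work when df1$Flag == 99 falls on the first row of a MedID. Otherwise, my code either puts a 1 in df2$Flag2 only on the row where df1$Flag == 99, or it puts a 1 on every row of any MedID that contains df1$Flag == 99. The desired output is: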
dput(df2)
structure(list(MedID = c(111, 111, 111, 111, 111, 111, 222, 222,
222, 222, 222), Service = structure(c(1L, 1L, 2L, 1L, 1L, 3L,
3L, 2L, 1L, 1L, 3L), .Label = c("Acute care", "Ext care",
"Outpt care"), class = "factor"), AdmitDate = structure(c(16832,
16861, 16892, 16922, 16953, 16983, 17181, 17212, 17240, 17271,
17301), class = "Date"), Flag = c(0, 0, 99, 0, 0, 0, 0, 99, 0, 0,
0), Flag2 = c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1)), .Names = c("MedID",
"Service", "AdmitDate", "Flag", "Flag2"), row.names = c(NA, -11L),
class = "data.frame")
> df2
   MedID    Service  AdmitDate Flag Flag2
1    111 Acute care 2016-02-01    0     0
2    111 Acute care 2016-03-01    0     0
3    111   Ext care 2016-04-01   99     1
4    111 Acute care 2016-05-01    0     1
5    111 Acute care 2016-06-01    0     1
6    111 Outpt care 2016-07-01    0     1
7    222 Outpt care 2017-01-15    0     0
8    222   Ext care 2017-02-15   99     1
9    222 Acute care 2017-03-15    0     1
10   222 Acute care 2017-04-15    0     1
11   222 Outpt care 2017-05-15    0     1
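Here is a sample snippet of my code. It is incomplete because it does not work properly... Do I need to nest mutate inside a for loop? That seems like mixing R coding styles. :( Note: df1$Flag can equal 99 only once per MedID, which I thought would make this easier.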
`df2 <- df1 %>% `
`group_by(MedID) %>%`
`mutate(Flag2 = ifelse(df1$Flag == 99, 1, 0))`
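One possible approach, offered as a minimal sketch rather than a definitive answer: inside a grouped `mutate`, refer to the bare column `Flag` instead of `df1$Flag` (the latter pulls the full, ungrouped column), and use `cumsum()` so the indicator stays on once `Flag == 99` has been seen. This assumes rows are already in chronological order within each `MedID`, as they are in the example. The data frame below is rebuilt by hand from the values shown in the question.

```r
library(dplyr)

# Rebuild the example data from the question (same values as df1).
df1 <- data.frame(
  MedID = c(rep(111, 6), rep(222, 5)),
  Service = factor(c("Acute care", "Acute care", "Ext care", "Acute care",
                     "Acute care", "Outpt care", "Outpt care", "Ext care",
                     "Acute care", "Acute care", "Outpt care")),
  AdmitDate = as.Date(c("2016-02-01", "2016-03-01", "2016-04-01",
                        "2016-05-01", "2016-06-01", "2016-07-01",
                        "2017-01-15", "2017-02-15", "2017-03-15",
                        "2017-04-15", "2017-05-15")),
  Flag = c(0, 0, 99, 0, 0, 0, 0, 99, 0, 0, 0)
)

df2 <- df1 %>%
  group_by(MedID) %>%
  # cumsum() counts how many 99s have appeared so far within the group;
  # the count is 0 before the flagged row and at least 1 from that row onward.
  # Assumption: rows are already sorted by AdmitDate within each MedID.
  mutate(Flag2 = as.numeric(cumsum(Flag == 99) > 0)) %>%
  ungroup()

print(as.data.frame(df2))
```

Since `Flag == 99` occurs at most once per `MedID`, `cumsum(Flag == 99)` is already 0 or 1 on its own; the `> 0` comparison only guards against a flag that repeats. No for loop is needed.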
【Discussion】: