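【Posted】: 2017-03-29 16:26:20
【Question】:
I have a data.table, and I am trying to create a new column by checking whether a row contains specific values anywhere in a given set of columns.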
head(d1)
MEDREC_KEY pat_key drug1 drug2 drug3 drug4 drug5 drug6 drug7 drug8 drug9 drug10 drug11 drug12
1: -140665983 669723105 Anti-infectives Cephalosporins Ethambutol Isoniazid Macrolides Penicillins Quinolones Rifamycin NA NA NA NA
2: -606290573 85924804 Anti-infectives Beta-lactams Cephalosporins Penicillins Quinolones NA NA NA NA NA NA NA
3: -615873176 161009395 Cephalosporins Penicillins NA NA NA NA NA NA NA NA NA NA
4: -616819481 36280536 Anti-infectives Cephalosporins Macrolides Quinolones NA NA NA NA NA NA NA NA
5: -625709819 720290063 Anti-infectives Cephalosporins Ethambutol Isoniazid Pyrazinamide Quinolones Rifamycin NA NA NA NA NA
6: -637094857 720918635 Anti-infectives Penicillins Quinolones NA NA NA NA NA NA NA NA NA
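What I want is this: if any of the "drug" columns == "Macrolides" AND any of the same columns == "Cephalosporins", then my new column "correct" == 1, otherwise "correct" == 0 (a logical would also be fine), like so: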
head(d1)
MEDREC_KEY pat_key drug1 drug2 drug3 drug4 drug5 drug6 drug7 drug8 drug9 drug10 drug11 drug12 correct
1: -140665983 669723105 Anti-infectives Cephalosporins Ethambutol Isoniazid Macrolides Penicillins Quinolones Rifamycin NA NA NA NA 1
2: -606290573 85924804 Anti-infectives Beta-lactams Cephalosporins Penicillins Quinolones NA NA NA NA NA NA NA 0
3: -615873176 161009395 Cephalosporins Penicillins NA NA NA NA NA NA NA NA NA NA 0
4: -616819481 36280536 Anti-infectives Cephalosporins Macrolides Quinolones NA NA NA NA NA NA NA NA 1
5: -625709819 720290063 Anti-infectives Cephalosporins Ethambutol Isoniazid Pyrazinamide Quinolones Rifamycin NA NA NA NA NA 0
6: -637094857 720918635 Anti-infectives Penicillins Quinolones NA NA NA NA NA NA NA NA NA 0
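I have tried the two approaches below (I am still learning how to decipher warning messages, so these are not much help to me, especially since I am new to data.table):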
> d1$correct<-ifelse(d1[,c(3:14)]=="Macrolides" | d1[,c(3:14)]=="Cephalosporins", 1, 0)
Warning messages:
1: In `[<-.data.table`(x, j = name, value = value) :
12 column matrix RHS of := will be treated as one vector
2: In `[<-.data.table`(x, j = name, value = value) :
Supplied 56868 items to be assigned to 4739 items of column 'correct' (52129 unused)
>
>
> selected_cols<-c("drug1", "drug2", "drug3", "drug4", "drug5", "drug6", "drug7", "drug8", "drug9", "drug10", "drug11", "drug12")
> d1$correct<-ifelse(d1 %in% selected_cols=="Macrolides" | d1 %in% selected_cols=="Cephalosporins", 1, 0)
Warning message:
In `[<-.data.table`(x, j = name, value = value) :
Supplied 16 items to be assigned to 4739 items of column 'correct' (recycled leaving remainder of 3 items).
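A minimal sketch of what the first warning seems to be pointing at, using a small made-up table (`toy` and its three drug columns are hypothetical, not the real `d1`): comparing several columns at once produces a logical matrix with one cell per row and column, which cannot be assigned directly to a single column. One way to get one value per row is to collapse that matrix with `rowSums()`.

```r
library(data.table)

# Hypothetical toy data standing in for d1 (three drug columns instead of twelve).
toy <- data.table(
  id    = 1:3,
  drug1 = c("Cephalosporins", "Cephalosporins", "Penicillins"),
  drug2 = c("Macrolides", NA, "Macrolides"),
  drug3 = c(NA_character_, NA_character_, NA_character_)
)
drug_cols <- paste0("drug", 1:3)

# Comparing several columns at once gives a logical matrix:
# one cell per (row, column), not one value per row.
m <- toy[, drug_cols, with = FALSE] == "Macrolides"
dim(m)  # 3 rows by 3 columns

# Collapse to one value per row; na.rm = TRUE treats NA cells as "no match".
has_macrolide <- rowSums(m, na.rm = TRUE) > 0
```

With the real data, the 4739 rows times 12 columns would account for the 56868 items mentioned in the warning.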
The closest I have gotten is:
d1$correct<-apply(d1, 1, function(r) any(r %in% c("Macrolides", "Cephalosporins")))
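That gives TRUE if either one of the values appears somewhere across the columns, but I cannot figure out how to require that both appear across the columns. I would rather not write a very bulky ifelse statement, because I need to cover 12 columns and more combinations, and the NAs throw it off anyway.
I would love a dplyr or data.table solution, because they are so elegant, but at this point I am desperate.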
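One possible data.table approach, sketched on hypothetical toy data rather than the real `d1`: test each required value separately across the drug columns, then combine the two per-row results with `&`. The helper name `has_drug` is made up for illustration. It relies on `%in%`, which never returns NA, so missing entries simply count as "no match".

```r
library(data.table)

# Hypothetical toy data standing in for d1.
toy <- data.table(
  id    = 1:4,
  drug1 = c("Anti-infectives", "Anti-infectives", "Cephalosporins", "Anti-infectives"),
  drug2 = c("Cephalosporins", "Beta-lactams", "Penicillins", "Macrolides"),
  drug3 = c("Macrolides", "Cephalosporins", NA, NA)
)
drug_cols <- grep("^drug", names(toy), value = TRUE)

# TRUE for each row where `value` appears in at least one of the columns in `cols`.
has_drug <- function(dt, cols, value) {
  Reduce(`|`, lapply(cols, function(col) dt[[col]] %in% value))
}

# A row is "correct" only when both drugs are present somewhere in that row.
toy[, correct := as.integer(
  has_drug(toy, drug_cols, "Macrolides") &
    has_drug(toy, drug_cols, "Cephalosporins")
)]

toy$correct  # 1 0 0 0: only the first row has both drugs
```

Further combinations could be handled by calling the helper once per required value and chaining the results with `&`, which avoids writing out a nested ifelse over every column.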
【Comments】:
Tags: r if-statement data.table any