[Posted]: 2019-07-06 12:19:09
[Question]:
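I am trying to compute a new variable conditionally, based on the values of several existing variables. Specifically, the new variable is kidney function (eGFR), which is estimated from a person's sex, age, whether or not they are black, and the concentrations of two blood components, creatinine and cystatin C.
I tried to do this with R's if...else statements, but I got a warning message and then nothing happened. All variables are contained in the data frame "d".
Basically, what I want R to do is this: if the subject is male (== 1) and non-black (!= 1), with blood creatinine ≤ 0.9 and cystatin C ≤ 0.8, then that person's kidney function is estimated by the formula in the first branch of the code below, and so on for the other combinations.
To that end, I used the following code: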
if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race != 1){ ### Non-Negroid males
d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC > 0.8 & d$race != 1){
d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC <= 0.8 & d$race != 1){
d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC > 0.8 & d$race != 1){
d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC <= 0.8 & d$race != 1){ ### Non-Negroid females
d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC > 0.8 & d$race != 1){
d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC <= 0.8 & d$race != 1){
d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC > 0.8 & d$race != 1){
d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race == 1){ ### Negroid males
d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC > 0.8 & d$race == 1){
d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC <= 0.8 & d$race == 1){
d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC > 0.8 & d$race == 1){
d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC <= 0.8 & d$race == 1){ ### Negroid females
d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC > 0.8 & d$race == 1){
d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC <= 0.8 & d$race == 1){
d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC > 0.8 & d$race == 1){
d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
}
However, when I run this, R returns:
Warning message:
In if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & :
the condition has length > 1 and only the first element will be used
Can anyone help me?
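For readers who hit the same message, here is a minimal sketch of where it comes from. The toy data frame is made up (only the column names follow the question). The comparison yields one TRUE/FALSE per row, while if() expects a single value. Note that from R 4.2.0 onward a condition of length greater than one is an error rather than a warning.

```r
# Toy data; column names follow the question, the values are invented.
d <- data.frame(sex = c(1, 0, 1), creatinine = c(0.8, 0.6, 1.1))

# The condition is evaluated for every row at once.
cond <- d$sex == 1 & d$creatinine <= 0.9
print(cond)
print(length(cond))  # one element per row, not a single TRUE/FALSE

# Testing one row at a time gives if() a condition of length one.
for (i in seq_len(nrow(d))) {
  if (d$sex[i] == 1 && d$creatinine[i] <= 0.9) {
    message("row ", i, " satisfies the condition")
  }
}
```

A row-by-row loop like this works but is slow and verbose for many conditions; it is shown only to make the length mismatch visible.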
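Update: here is some sample data, including age, sex (0 = female, 1 = male), race (1 = black, != 1 is non-black), creatinine, cystatin C, and a manually calculated eGFR for validating the formula: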
reconstruct <- structure(list(sex = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), race = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("0", "1", "2",
"3", "4"), class = "factor"), age = c(71.9425051334702, 65.1964407939767,
46.2258726899384, 51.7152635181383, 54.8747433264887, 71.6714579055441,
36.0793976728268, 54.3764544832307, 57.9110198494182, 49.9438740588638
), creatinine = c(0.633484162895928, 0.984162895927602, 0.769230769230769,
0.8710407239819, 0.769230769230769, 0.690045248868778, 0.893665158371041,
1.02941176470588, 0.83710407239819, 0.701357466063348), cystatinC = c(0.73,
0.85, 0.64, 0.9, 0.83, 0.95, 1.04, 1, 0.95, 0.68), eGFR = c(96.1605293085191,
73.17567750685, 105.934761135043, 80.8974371814808, 103.186483803272,
88.1306212690947, 77.7383905116244, 66.9892381719287, 90.7223944432609,
107.443909414004)), row.names = c(NA, 10L), class = "data.frame")
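One thing worth noticing in this dput output is that sex and race are stored as factors, not numbers. The sketch below uses a small made-up factor to show the difference between a factor's internal integer codes and its labels, which matters if the columns are ever converted to numeric before computing.

```r
# A made-up factor with the same levels as the sex column above.
sex <- factor(c("1", "1", "0"), levels = c("0", "1"))

# as.numeric() on a factor returns the internal codes, not the labels.
codes <- as.numeric(sex)

# Going through character first recovers the labels as numbers.
labels_num <- as.numeric(as.character(sex))

# Equality comparisons on a factor are made against the labels,
# so sex == 1 still picks out the rows labelled "1".
is_male <- sex == 1

print(codes)
print(labels_num)
print(is_male)
```

So comparisons such as d$sex == 1 and d$race != 1 should behave as intended on these factor columns, but arithmetic on as.numeric(d$sex) would not.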
[Comments]:
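- You are comparing a whole vector with a single value. That returns a vector like [false true true false]. In the end you are asking if([false true true false]), which makes no sense, so R cuts it down to the first value. You need a different logical operation. Try %in% or something similar that returns only a single logical value.
- You could also wrap all the conditions in any(), but that only fixes your syntax. Consider a vectorized operation such as ifelse.
- You can use the ifelse() function instead of the if ... else construct, but with this many conditions it is neither efficient nor readable.
- Perhaps you could first create a new factor variable based on d$sex and d$race, develop a simplified formula for each of the resulting 4 factor levels (formulas that could be abstracted into their own function definitions), and then fill the new variable in a more readable way using that factor and the new functions.
- Could you post sample data, preferably in dput format and covering all cases? If so, please edit the question with the output of dput(head(d, 20)).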
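The vectorized approach suggested in the comments could look roughly like the sketch below. It is only an illustration: the function name egfr and the toy data frame are hypothetical, and the constants are copied from the branches in the question rather than checked against any reference. Instead of sixteen branches, ifelse() picks the leading constant, the creatinine threshold, and each exponent row by row.

```r
# Hypothetical helper; all constants are taken from the question's code.
egfr <- function(sex, race, age, creatinine, cystatinC) {
  male  <- sex == 1
  black <- race == 1

  # Leading constant depends on sex and race.
  base <- ifelse(male,
                 ifelse(black, 145.8, 135),
                 ifelse(black, 140.4, 130))

  # Creatinine threshold and low-range exponent depend on sex.
  kappa  <- ifelse(male, 0.9, 0.7)
  cr_exp <- ifelse(creatinine <= kappa,
                   ifelse(male, -0.207, -0.248),
                   -0.601)

  # Cystatin C exponent depends only on the 0.8 threshold.
  cy_exp <- ifelse(cystatinC <= 0.8, -0.375, -0.711)

  base * (creatinine / kappa)^cr_exp * (cystatinC / 0.8)^cy_exp * 0.995^age
}

# Toy data with made-up values; column names follow the question.
d <- data.frame(sex = c(1, 0), race = c(0, 1), age = c(50, 60),
                creatinine = c(0.8, 0.9), cystatinC = c(0.7, 1.0))
d$eGFR <- with(d, egfr(sex, race, age, creatinine, cystatinC))
print(d)
```

Because every step works on whole columns, no if() ever sees a condition longer than one element. The same call should also accept factor columns for sex and race, since the == comparisons are made against the factor labels.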
Tags: r dataframe variables encoding