[Question title]: How to compute a new variable in R based on values of multiple other variables?
[Posted]: 2019-07-06 12:19:09
[Question description]:

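I am trying to compute a new variable from the values of several existing variables using conditional logic. Specifically, the new variable is kidney function (eGFR), which is estimated from a person's sex, age, whether they are black or non-black, and the concentrations of two blood components (creatinine and cystatin C).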

I tried to do this with R's if...else statements, but got a warning message, after which nothing happened. All variables are contained in the data frame "d".

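Basically, what I want R to do is this: if the subject is male (== 1) and non-black (!= 1), with blood creatinine ≤ 0.9 and cystatin C ≤ 0.8, then the person's kidney function is estimated by: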

https://latex.codecogs.com/png.latex?\bg_white&space;eGFR=135\cdot\left&space;(&space;\frac{creatinine}{0.9}&space;\right&space;)^{-0.207}\cdot\left&space;(&space;\frac{cystatinC}{0.8}&space;\right&space;)^{-0.375}\cdot0.995^{age}

And so on for the other combinations. To do this, I used the following code:

if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race != 1){ ### Non-Negroid males
    d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC > 0.8 & d$race != 1){
    d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC <= 0.8 & d$race != 1){
    d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC > 0.8 & d$race != 1){
    d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC <= 0.8 & d$race != 1){ ### Non-Negroid females
    d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC > 0.8 & d$race != 1){
    d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC <= 0.8 & d$race != 1){
    d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC > 0.8 & d$race != 1){
    d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race == 1){ ### Negroid males
    d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC > 0.8 & d$race == 1){
    d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC <= 0.8 & d$race == 1){
    d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC > 0.8 & d$race == 1){
    d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC <= 0.8 & d$race == 1){ ### Negroid females
    d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC > 0.8 & d$race == 1){
    d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC <= 0.8 & d$race == 1){
    d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC > 0.8 & d$race == 1){
    d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  }

However, when I run this, R returns:

Warning message:
In if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 &  :
  the condition has length > 1 and only the first element will be used

Can anyone help me?

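Update: here is some sample data, including age, sex (0 = female, 1 = male), race (1 = black, != 1 is non-black), creatinine, cystatin C, and a manually calculated eGFR for validating the formula: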

reconstruct <- structure(list(sex = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 2L, 
2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), race = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("0", "1", "2", 
"3", "4"), class = "factor"), age = c(71.9425051334702, 65.1964407939767, 
46.2258726899384, 51.7152635181383, 54.8747433264887, 71.6714579055441, 
36.0793976728268, 54.3764544832307, 57.9110198494182, 49.9438740588638
), creatinine = c(0.633484162895928, 0.984162895927602, 0.769230769230769, 
0.8710407239819, 0.769230769230769, 0.690045248868778, 0.893665158371041, 
1.02941176470588, 0.83710407239819, 0.701357466063348), cystatinC = c(0.73, 
0.85, 0.64, 0.9, 0.83, 0.95, 1.04, 1, 0.95, 0.68), eGFR =     c(96.1605293085191, 
73.17567750685, 105.934761135043, 80.8974371814808, 103.186483803272, 
88.1306212690947, 77.7383905116244, 66.9892381719287, 90.7223944432609, 
107.443909414004)), row.names = c(NA, 10L), class = "data.frame")

[Question comments]:

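  • You are comparing a whole vector against a single value. That returns a vector like [FALSE TRUE TRUE FALSE]. In the end you are asking if ([FALSE TRUE TRUE FALSE]), which makes no sense, so R cuts it down to the first value. You need a different logical operator. Try %in% or something similar that returns only one logical value.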
  • You could also try wrapping all the conditions in any(), but that would only fix your syntax. Consider a vectorized operation such as ifelse.
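  • You could use the ifelse() function instead of the if ... else construct, but with this many conditions that is neither efficient nor readable.
  • Perhaps you could first create a new factor variable based on d$sex and d$race, develop simplified formulas for the resulting 4 factor levels (formulas that could be abstracted into their own function definitions), and then fill the new variable in a more readable way using this factor and the new functions.
  • Could you post sample data, preferably in dput format and covering all cases? If so, please edit the question with the output of dput(head(d, 20)).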
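
The points raised in these comments can be sketched in a few lines. The example below uses made-up values (the names `x`, `toy`, and `const_lookup` are hypothetical and not from the question) to contrast a length-3 condition with the element-wise `ifelse()`, and then builds the sex-by-race grouping factor suggested above with `interaction()` to look up the leading constant per group. The constants are the ones in the question's code. Note that since R 4.2.0 a condition of length greater than one in `if` is an error rather than a warning.

```r
# Toy vector (hypothetical, not taken from the question's data)
x <- c(0.5, 0.95, 1.2)

# `x <= 0.9` is a logical vector of length 3, which `if` cannot branch on.
# any()/all() collapse it to one value, but then every row takes the same branch.
cond <- x <= 0.9
length(cond)   # 3
any(cond)      # TRUE

# ifelse() chooses a value element by element instead
creat_pow <- ifelse(x <= 0.9, -0.207, -0.601)

# Grouping factor built from sex and race (toy data frame, one row per group)
toy <- data.frame(sex = c(1, 0, 1, 0), race = c(0, 0, 1, 1))
toy$group <- interaction(
  ifelse(toy$sex == 1, "male", "female"),
  ifelse(toy$race == 1, "black", "nonblack")
)

# Leading constants from the question's code, keyed by group label
const_lookup <- c(male.nonblack = 135, female.nonblack = 130,
                  male.black = 145.8, female.black = 140.4)
toy$const <- unname(const_lookup[as.character(toy$group)])
```

The lookup vector keeps the four constants in one place, so only the exponents still need per-row logic.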

Tags: r dataframe variables encoding


[Solution 1]:

I believe the function below follows what is defined in the question, but it is untested, since there was no data or expected output.

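eGFRfun <- function(DF){
  # Logical index vectors, one element per row
  i_sex <- DF[["sex"]] == 1                   # TRUE for males
  i_creat_0.9 <- DF[["creatinine"]] <= 0.9    # threshold used for males
  i_creat_0.7 <- DF[["creatinine"]] <= 0.7    # threshold used for females
  i_cyst <- DF[["cystatinC"]] <= 0.8
  i_race <- DF[["race"]] == 1                 # TRUE for black subjects

  # Leading constant: 135/130 (non-black male/female), 145.8/140.4 (black male/female)
  const_fac <- ifelse(i_race, ifelse(i_sex, 145.8, 140.4), ifelse(i_sex, 135, 130))
  creat_denom <- ifelse(i_sex, 0.9, 0.7)
  # Creatinine exponent depends on sex and on the sex-specific threshold
  creat_pow <- ifelse(i_sex,
                      ifelse(i_creat_0.9, -0.207, -0.601),
                      ifelse(i_creat_0.7, -0.248, -0.601))
  cystC_fac <- (DF[["cystatinC"]] / 0.8)^ifelse(i_cyst, -0.375, -0.711)
  age_fac <- 0.995^DF[["age"]]

  const_fac * (DF[["creatinine"]] / creat_denom)^creat_pow * cystC_fac * age_fac
}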

Example usage:

d$eGFR <- eGFRfun(d)
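
As an alternative sketch (not part of the answer above): each pair of branches in the question differs only in which exponent applies on either side of a threshold, so the 16 cases can be collapsed with `pmin()` and `pmax()`. When creatinine divided by its threshold is at most 1, the `pmax()` term equals 1 and only the first exponent acts; above 1 the roles swap. The function name `egfr_minmax` is hypothetical, the coefficients are copied from the question's code, and the single check row is made up for illustration.

```r
egfr_minmax <- function(DF) {
  male  <- DF[["sex"]] == 1
  black <- DF[["race"]] == 1

  # Leading constant for each sex/race combination (values from the question)
  const <- ifelse(black, ifelse(male, 145.8, 140.4), ifelse(male, 135, 130))
  k     <- ifelse(male, 0.9, 0.7)         # sex-specific creatinine threshold
  alpha <- ifelse(male, -0.207, -0.248)   # exponent at or below the threshold

  cr  <- DF[["creatinine"]] / k
  cys <- DF[["cystatinC"]] / 0.8

  # pmin()/pmax() select the exponent per row: the unused term is 1^power = 1
  const *
    pmin(cr, 1)^alpha * pmax(cr, 1)^-0.601 *
    pmin(cys, 1)^-0.375 * pmax(cys, 1)^-0.711 *
    0.995^DF[["age"]]
}

# One made-up row: non-black male, creatinine 0.8, cystatin C 1.0, age 50
one <- data.frame(sex = 1, race = 0, creatinine = 0.8, cystatinC = 1.0, age = 50)
by_hand <- 135 * (0.8 / 0.9)^-0.207 * (1.0 / 0.8)^-0.711 * 0.995^50
all.equal(egfr_minmax(one), by_hand)   # TRUE
```

This form avoids the nested threshold tests, at the cost of being less obviously tied to the case-by-case listing in the question, so it is worth checking against a few rows computed by hand before relying on it.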
