【Question title】: Using conditional elseif statement in R but not working
【Posted】: 2020-07-18 20:05:37
【Question】:

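This is my first time using else if. I want to create a new column, mobile$tenuredate (in months), and I'm trying to work out what is wrong with my code, which produces NA values.

As a result, rows where

mobile$status == 'active'

get NA values for mobile$tenuredate (they should not be NA), while rows where

mobile$status == 'stopped'

get valid values for mobile$tenuredate.

Here is the code: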

mobile$tenuredate = if (mobile$status=="stopped") {
  round(difftime(mobile$EFFECTIVEDATE, mobile$STARTDATE, units="weeks") / 4.348125)
} else if ((mobile$status == "active") && (mobile$difftemp >= 0)) {
  round(difftime(mobile$CONTRACTENDDATE, mobile$STARTDATE, units="weeks") / 4.348125)
} else {
  round(difftime(mobile$CUTOFFDATE, mobile$STARTDATE, units="weeks") / 4.348125)
}

Data file in CSV available here

Here is a sample data frame.

structure(list(STARTDATE = structure(c(11413, 11639, 11953, 12212, 
11335, 12050, 12142, 11225, 12176, 11386), class = "Date"), STOPDATE = structure(c(11436, 
12079, NA, 12225, 11345, 12124, 12226, 11999, 12176, 11758), class = "Date"), 
    EFFECTIVEDATE = structure(c(11436, 12079, NA, 12225, 11345, 
    12124, 12226, 11999, 12176, 11758), class = "Date"), CONTRACTENDDATE = structure(c(11778, 
    12004, 12318, 12578, 11700, 12415, 12508, 11977, 12542, 11751
    ), class = "Date"), CUTOFFDATE = structure(c(12273, 12273, 
    12273, 12273, 12273, 12273, 12273, 12273, 12273, 12273), class = "Date"), 
    status = c("stopped", "stopped", "active", "stopped", "stopped", 
    "stopped", "stopped", "stopped", "stopped", "stopped"), tenuredate = structure(c(1, 
    14, NA, 0, 0, 2, 3, 25, 0, 12), class = "difftime", units = "weeks")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

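Thanks in advance.

【Comments】:

  • Without looking at your data (I don't follow links at the moment): your code is overly verbose, but I don't see anything in it that would introduce NAs. This is more likely a problem with how you read the data in, or perhaps with your assumptions about the data itself. It would help if you provided a sample of the data here in the question, using dput(head(mobile)) or data.frame(...). (The reason for dput is that it gives the unambiguous data as R sees it, not as you assume R is using it.)
  • Thanks for the suggestion! Sample data has been provided above, using dput().
  • Okay. (1) if needs a single (length-1) comparison, and you're providing a vector. A vectorised approach is appropriate. (2) ifelse is certainly the logical next step, but it will fail you because it drops class, and you'll lose your Date class (though the math will still work). Stand by ...

Tags: r if-statement na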


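【Solution 1】:

if requires its condition to be length 1, and you're supplying a vector. The logical replacement is ifelse, but a well-known (among R veterans) problem with ifelse is that it drops class, so your Date and difftime columns become numeric and you'd have to recast them. (That's not the end of the world, but let's keep the classes intact for now.)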
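To make the first point concrete, here is a minimal sketch on a toy vector (the status values below are made up for illustration, not your data). It only shows how if reacts to a vector condition; the exact behaviour depends on your R version, so the sketch catches both possibilities.

```r
status <- c("stopped", "active", "stopped")

# `if` wants a single TRUE/FALSE. Given a longer vector, R >= 4.2.0 stops
# with an error; older versions warn and use only the first element.
outcome <- tryCatch(
  if (status == "stopped") "first branch" else "other branch",
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome

# The comparison itself is vectorised and returns one answer per row,
# which is what logical indexing needs.
is_stopped <- status == "stopped"
is_stopped
```

On a version that only warns, the first row decides for everyone. In your sample the first row is "stopped", so every row gets the EFFECTIVEDATE formula, and EFFECTIVEDATE is NA for the active row, which would explain the NAs. With that in mind, the approach I'd use instead fills the column with logical indexing: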

mobile$tenuredate <- NULL # just to clean up your previous attempt, otherwise not needed
mobile$usedate <- Sys.Date()[NA] # all NAs are not created equal ...
ind <- mobile$status == "stopped"
mobile$usedate[ind] <- mobile$EFFECTIVEDATE[ind]
ind <- (mobile$status == "active") & (mobile$difftemp >= 0) # element-wise &, since && only handles length-1 values
mobile$usedate[ind] <- mobile$CONTRACTENDDATE[ind]
ind <- is.na(mobile$usedate)
mobile$usedate[ind] <- mobile$CUTOFFDATE[ind]
mobile
# # A tibble: 10 x 7
#    STARTDATE  STOPDATE   EFFECTIVEDATE CONTRACTENDDATE CUTOFFDATE status  usedate   
#    <date>     <date>     <date>        <date>          <date>     <chr>   <date>    
#  1 2001-04-01 2001-04-24 2001-04-24    2002-04-01      2003-08-09 stopped 2001-04-24
#  2 2001-11-13 2003-01-27 2003-01-27    2002-11-13      2003-08-09 stopped 2003-01-27
#  3 2002-09-23 NA         NA            2003-09-23      2003-08-09 active  2003-08-09
#  4 2003-06-09 2003-06-22 2003-06-22    2004-06-09      2003-08-09 stopped 2003-06-22
#  5 2001-01-13 2001-01-23 2001-01-23    2002-01-13      2003-08-09 stopped 2001-01-23
#  6 2002-12-29 2003-03-13 2003-03-13    2003-12-29      2003-08-09 stopped 2003-03-13
#  7 2003-03-31 2003-06-23 2003-06-23    2004-03-31      2003-08-09 stopped 2003-06-23
#  8 2000-09-25 2002-11-08 2002-11-08    2002-10-17      2003-08-09 stopped 2002-11-08
#  9 2003-05-04 2003-05-04 2003-05-04    2004-05-04      2003-08-09 stopped 2003-05-04
# 10 2001-03-05 2002-03-12 2002-03-12    2002-03-05      2003-08-09 stopped 2002-03-12

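It may be useful to pause here and verify that every usedate value comes from the appropriate column.

I used usedate as an intermediate value for two reasons: (1) for verification; and (2) because you do the same math on the rest ... so why maintain the same math in three places when you can do it just once. There are, of course, other ways to do this.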
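One way to do that check is sketched below. It rebuilds three rows from the sample data by hand so it runs on its own, and it leaves out the difftemp branch because that column is not in the sample; the names stopped, stopped_ok and other_ok are mine, chosen only for illustration.

```r
# Three rows re-typed from the sample data (an assumption: enough to illustrate).
mobile <- data.frame(
  STARTDATE     = as.Date(c("2001-04-01", "2001-11-13", "2002-09-23")),
  EFFECTIVEDATE = as.Date(c("2001-04-24", "2003-01-27", NA)),
  CUTOFFDATE    = as.Date(rep("2003-08-09", 3)),
  status        = c("stopped", "stopped", "active")
)

# Same assignment steps as above, minus the difftemp branch.
mobile$usedate <- as.Date(NA)
ind <- mobile$status == "stopped"
mobile$usedate[ind] <- mobile$EFFECTIVEDATE[ind]
ind <- is.na(mobile$usedate)
mobile$usedate[ind] <- mobile$CUTOFFDATE[ind]

# Check each group against the column it should have been copied from.
stopped    <- mobile$status == "stopped"
stopped_ok <- all(mobile$usedate[stopped] == mobile$EFFECTIVEDATE[stopped])
other_ok   <- all(mobile$usedate[!stopped] == mobile$CUTOFFDATE[!stopped])
c(stopped_ok = stopped_ok, other_ok = other_ok)
```

If either check is not TRUE on your full data, look at the rows in that group before going further.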

mobile$tenuredate <- round(difftime(mobile$usedate, mobile$STARTDATE, units = "weeks") / 4.348125)
mobile
# # A tibble: 10 x 8
#    STARTDATE  STOPDATE   EFFECTIVEDATE CONTRACTENDDATE CUTOFFDATE status  usedate    tenuredate
#    <date>     <date>     <date>        <date>          <date>     <chr>   <date>     <drtn>    
#  1 2001-04-01 2001-04-24 2001-04-24    2002-04-01      2003-08-09 stopped 2001-04-24  1 weeks  
#  2 2001-11-13 2003-01-27 2003-01-27    2002-11-13      2003-08-09 stopped 2003-01-27 14 weeks  
#  3 2002-09-23 NA         NA            2003-09-23      2003-08-09 active  2003-08-09 11 weeks  
#  4 2003-06-09 2003-06-22 2003-06-22    2004-06-09      2003-08-09 stopped 2003-06-22  0 weeks  
#  5 2001-01-13 2001-01-23 2001-01-23    2002-01-13      2003-08-09 stopped 2001-01-23  0 weeks  
#  6 2002-12-29 2003-03-13 2003-03-13    2003-12-29      2003-08-09 stopped 2003-03-13  2 weeks  
#  7 2003-03-31 2003-06-23 2003-06-23    2004-03-31      2003-08-09 stopped 2003-06-23  3 weeks  
#  8 2000-09-25 2002-11-08 2002-11-08    2002-10-17      2003-08-09 stopped 2002-11-08 25 weeks  
#  9 2003-05-04 2003-05-04 2003-05-04    2004-05-04      2003-08-09 stopped 2003-05-04  0 weeks  
# 10 2001-03-05 2002-03-12 2002-03-12    2002-03-05      2003-08-09 stopped 2002-03-12 12 weeks  

(Once you know you don't need it, mobile$usedate <- NULL.)
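Note that the column prints as "weeks" even though dividing by 4.348125 turns the values into months: the difftime object keeps the unit it was created with. If the label bothers you, a small sketch of one option is to drop to a plain number. The two dates are row 3 of the sample; tenure_months is a name I made up.

```r
start <- as.Date("2002-09-23")
end   <- as.Date("2003-08-09")

# Same calculation as above: weeks divided by the average weeks per month.
tenure <- round(difftime(end, start, units = "weeks") / 4.348125)
units(tenure)  # still reports "weeks"

# Strip the difftime class so no misleading unit is printed.
tenure_months <- as.numeric(tenure)
tenure_months
```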


If you use any of the tidyverse packages, this can be done more succinctly with case_when:

library(dplyr)
as_tibble(mobile) %>%
  mutate(
    usedate = case_when(
      status == "stopped"                     ~ EFFECTIVEDATE,
      (status == "active") & (difftemp >= 0)  ~ CONTRACTENDDATE,
      TRUE                                    ~ CUTOFFDATE
    ),
    tenuredate = round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125)
  )

A data.table solution:

library(data.table)
as.data.table(mobile)[
  , usedate := Sys.Date()[NA] ][
    status == "stopped", usedate := EFFECTIVEDATE ][
      (status == "active") & (difftemp >= 0), usedate := CONTRACTENDDATE ][
        is.na(usedate), usedate := CUTOFFDATE ][
          , tenuredate := round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125) ]

If you combine data.table with magrittr's pipe, you might find this more readable:

library(data.table)
library(magrittr)
as.data.table(mobile) %>%
  .[ , usedate := Sys.Date()[NA] ] %>%
  .[ status == "stopped", usedate := EFFECTIVEDATE ] %>%
  .[ (status == "active") & (difftemp >= 0), usedate := CONTRACTENDDATE ] %>%
  .[ is.na(usedate), usedate := CUTOFFDATE ] %>%
  .[ , tenuredate := round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125) ]

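To back up my claim that ifelse drops class: its result takes its attributes from the test argument, not from yes or no, so Date inputs come back as plain numbers.

【Comments】:

  • Thank you very much! This really helps explain it, and it solves the problem =) Thanks for offering the various approaches, including the improvements to my code.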
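A minimal sketch of that behaviour, using two short date vectors typed in by hand (the names d1, d2 and keep_first are made up for illustration):

```r
d1 <- as.Date(c("2001-04-24", "2003-01-27"))
d2 <- as.Date(c("2003-08-09", "2003-08-09"))
keep_first <- c(TRUE, FALSE)

# ifelse() picks the right elements but returns bare day counts.
dropped <- ifelse(keep_first, d1, d2)
class(dropped)  # "numeric", not "Date"

# Recasting restores the class (days are counted from 1970-01-01).
restored <- as.Date(dropped, origin = "1970-01-01")
restored
```

The values are right, only the class is gone, which is why the recast works. The logical-indexing approach avoids the round trip altogether.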