【Question title】: if statement problem within dplyr summarise
【Posted】: 2023-02-15 23:26:10
【Question】:

I have the following data:

library(tidyverse)
df <- data.frame(result = c("no", "no", "no", "yes", "no", "yes"),
                 date = seq.Date(from = as.Date("01/01/1998", "%d/%m/%Y"), 
                                 to = as.Date("06/01/1998", "%d/%m/%Y"), by = "day"),
                 type = c("car", "truck", "bike", "wheel", "tyre", "lorry"))
df
#   result       date  type
# 1     no 1998-01-01   car
# 2     no 1998-01-02 truck
# 3     no 1998-01-03  bike
# 4    yes 1998-01-04 wheel
# 5     no 1998-01-05  tyre
# 6    yes 1998-01-06 lorry

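My real example is more complex than this, but let's say I want to extract the value of type for the first occurrence of result == "yes". The following works: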

df1 <- df %>% 
  summarise(
    type_yes = if (length(first(type[result == "yes"])) == 0)
      NA
    else first(type[result == "yes"])) 
df1
#   type_yes
# 1    wheel

If I want to create a variable that indicates whether any result == "yes" exists, and I specifically want to use another if statement, the following works:

df1 <- df %>% 
  summarise(result = if (any(result == "yes"))
    "yes"
    else "no")
df1
#   result
# 1    yes

However, when I combine them in a single call, I get the wrong result:

df1 <- df %>% 
  summarise(result = if (any(result == "yes"))
      "yes"
    else "no",
    
    type_yes = if (length(first(type[result == "yes"])) == 0)
      NA
    else first(type[result == "yes"])) 
df1
#   result type_yes
# 1    yes      car

# when I should be obtaining
#   result type_yes
# 1    yes    wheel

Can someone explain what is happening here?

Thanks

【Comments】:

    Tags: r if-statement dplyr


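    【Solution 1】:

    When you overwrite result in the first assignment of your summarise, the next assignment sees result as "yes" everywhere. summarise() evaluates its expressions in order, so later expressions see the columns created by earlier ones. You can confirm this by inserting browser() into the expression and inspecting the current data: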

    df %>% 
      summarise(
        result = if (any(result == "yes")) "yes" else "no",
        type_yes = {browser();if (length(first(type[result == "yes"])) == 0) NA else first(type[result == "yes"]);}
      )
    # Called from: mask$eval_all_summarise(quo)
    debug at #4: if (length(first(type[result == "yes"])) == 0) NA else first(type[result == 
    #     "yes"])
    cur_data()
    # # A tibble: 6 × 3
    #   result date       type 
    #   <chr>  <date>     <chr>
    # 1 yes    1998-01-01 car  
    # 2 yes    1998-01-02 truck
    # 3 yes    1998-01-03 bike 
    # 4 yes    1998-01-04 wheel
    # 5 yes    1998-01-05 tyre 
    # 6 yes    1998-01-06 lorry
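
The same effect can be reproduced without the debugger. The following is a minimal sketch (the data frame `toy` and its column `x` are made up for illustration) showing that an expression inside summarise() uses the already-summarised value of a column that an earlier expression overwrote:

```r
library(dplyr)

# Hypothetical toy data, not from the question
toy <- data.frame(x = c(1, 2, 3))

out <- toy %>%
  summarise(
    x = sum(x),   # overwrites `x` with the summary value
    y = x * 2     # `x` here is the new summary value, not the original column
  )

out
```

Here `y` is computed from the summed `x`, not from the original three values. In the question this means `result == "yes"` is evaluated against the overwritten "yes", so `type[result == "yes"]` keeps every element of `type` and `first()` returns "car".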
    

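    One workaround is to not replace result and to use a temporary variable instead. If you still need the original name, you can rename or overwrite it afterwards.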

    df %>% 
      summarise(
        res2 = if (any(result == "yes")) "yes" else "no",
        type_yes = if (length(first(type[result == "yes"])) == 0) NA else first(type[result == "yes"])
      )
    #   res2 type_yes
    # 1  yes    wheel
    

    Perhaps with a clean-up step:

    df %>% 
      summarise(
        res2 = if (any(result == "yes")) "yes" else "no",
        type_yes = if (length(first(type[result == "yes"])) == 0) NA else first(type[result == "yes"])
      ) %>%
      rename(result = res2)
    

    Or you can change the order of the assignments:

    df %>% 
      summarise(
        type_yes = if (length(first(type[result == "yes"])) == 0) NA else first(type[result == "yes"]),
        result = if (any(result == "yes")) "yes" else "no"
      )
    #   type_yes result
    # 1    wheel    yes
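
Since the question mentions that the real data is more complex, here is a hedged sketch of the reordering approach applied per group. The `grp` column and its values are assumptions added for illustration. The sketch also drops the `length(...) == 0` check, relying on dplyr's `first()` returning a missing value when the vector is empty:

```r
library(dplyr)

df_grouped <- data.frame(
  grp    = c("a", "a", "a", "b", "b", "b"),   # hypothetical grouping column
  result = c("no", "yes", "yes", "no", "no", "no"),
  type   = c("car", "truck", "bike", "wheel", "tyre", "lorry")
)

out <- df_grouped %>%
  group_by(grp) %>%
  summarise(
    # computed first, while `result` is still the original column
    type_yes = first(type[result == "yes"]),
    # overwriting `result` last, so nothing after it depends on the old values
    result   = if (any(result == "yes")) "yes" else "no"
  )

out
```

Group "a" has a "yes" row, so its `type_yes` is the first matching type; group "b" has none, so `type_yes` is NA and `result` is "no".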
    

    【Comments】:
