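【Question title】: R for loop with if else statement and reference to outcome of previous iteration
【Posted】: 2019-12-29 22:14:05
【Question description】:

I have a data frame in which the field x contains group names (labelled with letters in the example below) and group members (listed under the group name, labelled with numbers). I want to create a field that shows, for each member, the name of its group. In the data frame below, the desired output is shown in the "outcome" column.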

library(dplyr)  # provides %>% and mutate()

df <- data.frame("x"=c("A","1","2","B","C","1","2","C","D","1"),
                 "outcome"=c("A","A","A","B","C","C","C","C","D","D")
) %>%
  mutate(
    # flag the rows that hold a group name (a letter)
    Letter = ifelse(grepl("[A-Za-z]", x), "Letter", "No Letter")
  )

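My idea was to do this with a for loop. If x is a letter, it should return that letter; if not, it should return the result of the previous iteration (which is the previous letter found in x). The for loop below does not give the correct output: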

df$outcome_calc[1] <- "A" 
for (i in 2:10) {  
  df$outcome_calc[i] <- ifelse(df$Letter[i] == "No Letter",df$outcome_calc[i-1],df$x[i])    

}

Any ideas on how to get the correct output?

【Question discussion】:
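
One possible cause, assuming `x` was created as a factor (the `data.frame()` default before R 4.0; the question does not say which version was used): `ifelse()` returns the factor's integer code rather than its label. A minimal sketch of that behavior, using a made-up vector rather than the question's data frame:

```r
# Assumption: x is a factor, as data.frame() produced by default before R 4.0
x <- factor(c("A", "1", "2", "B"))

# ifelse() drops the factor attributes of the value it selects,
# so the result is the underlying integer code, not the label "B"
from_factor <- ifelse(TRUE, x[4], "fallback")

# Converting to character first keeps the label
from_character <- ifelse(TRUE, as.character(x[4]), "fallback")

print(from_factor)
print(from_character)
```

If that is the cause, wrapping `df$x[i]` in `as.character()` inside the loop, or building the data frame with `stringsAsFactors = FALSE`, should avoid it.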

    Tags: r for-loop


    【Solution 1】:

    Here are two very similar tidyverse approaches, both using the convenience function zoo::na.locf.

    The first:

    library(tidyverse)
    
    df %>%
      mutate(na = is.na(as.numeric(as.character(x))),
             outcome2 = ifelse(na, as.character(x), NA_character_),
             outcome2 = zoo::na.locf(outcome2)) %>%
      select(-na)
    

    The second:

    df %>%
      mutate(chr = !grepl("[[:digit:]]", x),
             outcome2 = ifelse(chr, as.character(x), NA_character_),
             outcome2 = zoo::na.locf(outcome2)) %>%
      select(-chr)
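
For readers without zoo installed, the fill-forward step can be sketched in base R. `locf_sketch` is a hypothetical helper, not part of zoo; unlike `zoo::na.locf` with its default `na.rm = TRUE`, it keeps leading NAs instead of dropping them.

```r
# Hypothetical helper: carry the last non-NA value forward
locf_sketch <- function(v) {
  not_na <- !is.na(v)
  # idx counts how many non-NA values have been seen so far (0 before the first)
  idx <- cumsum(not_na)
  # prepend NA so that idx == 0 maps to NA
  c(NA, v[not_na])[idx + 1]
}

x <- c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")

# Blank out the numbers, then fill each gap with the letter above it
filled <- locf_sketch(ifelse(grepl("[[:digit:]]", x), NA_character_, x))
print(filled)
```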
    

    【Discussion】:

      【Solution 2】:

      Here is an approach using a for loop:

      # keeps track of previous letter
      prev = ''
      
      # output
      op = c()
      
      for (i in df$x){
      
          # check the pattern
          check = grepl(pattern = '[a-zA-Z]', x = i, ignore.case = T)
      
          if(isTRUE(check)){
              op = c(op, i)
              prev = i
          } else {
              op = c(op, prev)
          }
      
      }
      
      print(op)
      [1] "A" "A" "A" "B" "C" "C" "C" "C" "D" "D"
      

      【Discussion】:

      • Thanks, YOLO. It may not be the most efficient on large datasets, but it is the most readable to me.
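
On the efficiency point: growing `op` with `c()` copies the vector on every iteration. A minimal sketch of the same loop with a preallocated result, assuming x is a plain character vector (the variable names follow the answer above, but this is not the answerer's code):

```r
x <- c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")

# Preallocate the output instead of growing it with c()
op <- character(length(x))

# Most recent letter seen; NA until the first letter appears
prev <- NA_character_

for (i in seq_along(x)) {
  # Update the remembered letter whenever the current value contains one
  if (grepl("[A-Za-z]", x[i])) {
    prev <- x[i]
  }
  op[i] <- prev
}

print(op)
```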
      【Solution 3】:

      Alternatively, you can use the sapply function to avoid the for loop.

      First, find the positions of the letters:

      pos_letter <- grep("[A-Za-z]", df$x)
      

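      Then use sapply to (1) find, for each row, the position of the nearest letter at or above it, and (2) replace each position with the corresponding letter: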

      df$out <- sapply(1:nrow(df),function(x) max(pos_letter[pos_letter <= x]))
      df$out2 <- sapply(df$out, function(x) x = as.character(df[x,"x"]))
      
         x outcome out out2
      1  A       A   1    A
      2  1       A   1    A
      3  2       A   1    A
      4  B       B   4    B
      5  C       C   5    C
      6  1       C   5    C
      7  2       C   5    C
      8  C       C   8    C
      9  D       D   9    D
      10 1       D   9    D
      

      You can combine the two sapply calls into a single line:

      sapply(1:nrow(df), function(n) as.character(df[max(pos_letter[pos_letter <= n]),"x"]))
      
      [1] "A" "A" "A" "B" "C" "C" "C" "C" "D" "D"
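
The same lookup can also be written without sapply by using base R's `findInterval()`. This is a sketch under two assumptions: x is a character vector, and its first element is a letter (otherwise the first rows have no letter above them and the indexing below would return a shorter vector).

```r
x <- c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")

# Row numbers that hold a letter (increasing, as findInterval() requires)
pos_letter <- grep("[A-Za-z]", x)

# For each row number, the index of the last letter position at or before it
nearest <- findInterval(seq_along(x), pos_letter)

# Map each row to the letter found at that position
out <- x[pos_letter[nearest]]
print(out)
```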
      

      【Discussion】:

        【Solution 4】:

        Using tidyr::fill, which requires NA in place of the numbers:

        df = data.frame(x = c("A","1","2","B","C","1","2","C","D","1"),
                        stringsAsFactors = FALSE)
        
        df$x[grepl("[0-9]+", df$x)] = NA
        
        tidyr::fill(df, x)
           x
        1  A
        2  A
        3  A
        4  B
        5  C
        6  C
        7  C
        8  C
        9  D
        10 D
        

        【Discussion】:

          【Solution 5】:

          dplyr

          This is a simplified version of Rui's 2nd approach that does not need a temporary helper column. It uses stringr::str_detect(), if_else(), and zoo::na.locf().

          library(dplyr)
          df %>% 
            mutate(outcome2 = if_else(stringr::str_detect(x, "\\D"), x, factor(NA)) %>% zoo::na.locf())
          
             x outcome    Letter outcome2
          1  A       A    Letter        A
          2  1       A No Letter        A
          3  2       A No Letter        A
          4  B       B    Letter        B
          5  C       C    Letter        C
          6  1       C No Letter        C
          7  2       C No Letter        C
          8  C       C    Letter        C
          9  D       D    Letter        D
          10 1       D No Letter        D
          

          data.table

          For completeness, here is also the data.table approach, which I use often. It uses assignment by reference to update df.

          library(data.table)
          setDT(df)[x %like% "\\D", outcome2 := x][, outcome2 := zoo::na.locf(outcome2)][]
          
              x outcome    Letter outcome2
           1: A       A    Letter        A
           2: 1       A No Letter        A
           3: 2       A No Letter        A
           4: B       B    Letter        B
           5: C       C    Letter        C
           6: 1       C No Letter        C
           7: 2       C No Letter        C
           8: C       C    Letter        C
           9: D       D    Letter        D
          10: 1       D No Letter        D
          

          【Discussion】:
