【Question title】: Creating a new column conditionally based on previous n rows
【Posted】: 2020-03-16 18:01:30
【Question description】:

I have a data frame set up as follows:

 df <- data.frame("id" = c(111,111,111,222,222,222,222,333,333,333,333), 
                  "Location" = c("A","B","A","A","C","B","A","B","A","A","A"), 
                  "Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))

      id Location Encounter
1  111        A         1
2  111        B         2
3  111        A         3
4  222        A         1
5  222        C         2
6  222        B         3
7  222        A         4
8  333        B         1
9  333        A         2
10 333        A         3
11 333        A         4

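I am essentially trying to create a binary flag, within each id group, that marks whether the Location already appeared in an earlier encounter. So it would look like: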

    id Location Encounter Flag
1  111        A         1    0
2  111        B         2    0
3  111        A         3    1
4  222        A         1    0
5  222        C         2    0
6  222        B         3    0
7  222        A         4    1
8  333        B         1    0
9  333        A         2    0
10 333        A         3    1
11 333        A         4    1

I tried to work out how to do this with an if-style statement, such as:

library(dplyr)

df$Flag <- case_when((df$id - lag(df$id)) == 0 ~ 
                case_when(df$Location == lag(df$Location, 1) | 
                          df$Location == lag(df$Location, 2) | 
                          df$Location == lag(df$Location, 3) ~ 1, T ~ 0), T ~ 0)

    id Location Flag
1  111        A    0
2  111        B    0
3  111        A    1
4  222        A    0
5  222        C    0
6  222        B    0
7  222        A    1
8  333        B    0
9  333        A    1
10 333        A    1
11 333        A    1

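But this has the problem that row 9 is wrongly assigned 1, and the real data has ids with 15 or more encounters, so this approach becomes very cumbersome. I was hoping to find a way to do something like: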

lag(df$Location, 1:df$Encounter)

but I know lag() needs a single integer for k, so that particular command doesn't work.
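To make the row 9 problem concrete, here is a minimal sketch (the names `ungrouped`, `grouped`, and `prev2` are only illustrative, and `stringsAsFactors = FALSE` is added so the result is the same on older R versions). It suggests that an ungrouped lag() reaches back into the previous id, while a grouped lag() does not:

```r
library(dplyr)

df <- data.frame(id = c(111,111,111,222,222,222,222,333,333,333,333),
                 Location = c("A","B","A","A","C","B","A","B","A","A","A"),
                 Encounter = c(1,2,3,1,2,3,4,1,2,3,4),
                 stringsAsFactors = FALSE)

# Ungrouped: row 9 (id 333) is compared with row 7, which belongs to id 222
ungrouped <- lag(df$Location, 2)

# Grouped: lag() restarts within each id, so row 9 has nothing two rows back
grouped <- df %>%
  group_by(id) %>%
  mutate(prev2 = lag(Location, 2)) %>%
  pull(prev2)

ungrouped[9]  # "A", taken from id 222
grouped[9]    # NA
```

The id check in the attempt above only compares against the immediately preceding row, so lags of 2 or 3 can still cross into another id.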

【Question discussion】:

    Tags: r dataframe dplyr duplicates


    【Solution 1】:

    You can also do it this way:

    library(data.table)
    setDT(df)[,flag:=ifelse(1:.N>1,1,0),by=.(id,Location)] 
    

    【Discussion】:

      【Solution 2】:

      A more general data.table solution is to use rowid or .N:

      library(data.table)
      
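      # Flag every occurrence after the first within each (id, Location) pair
      setDT(df)[, Flag := +(rowid(id, Location) > 1)][]
      
      # Or number the rows of each (id, Location) group using .N
      setDT(df)[, Flag := +(seq_len(.N) > 1), by = .(id, Location)][]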
      
      #>      id Location  Encounter Flag
      #> 1:  111        A         1    0
      #> 2:  111        B         2    0
      #> 3:  111        A         3    1
      #> 4:  222        A         1    0
      #> 5:  222        C         2    0
      #> 6:  222        B         3    0
      #> 7:  222        A         4    1
      #> 8:  333        B         1    0
      #> 9:  333        A         2    0
      #> 10: 333        A         3    1
      #> 11: 333        A         4    1
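As a small sketch of why the rowid() version works (the vectors below are just the first seven rows of the example data, and `occurrence` is an illustrative name): rowid() numbers each repeat of an (id, Location) combination, so anything above 1 is a repeat.

```r
library(data.table)

id       <- c(111, 111, 111, 222, 222, 222, 222)
Location <- c("A", "B", "A", "A", "C", "B", "A")

# Running count of each (id, Location) combination
occurrence <- rowid(id, Location)
occurrence          # 1 1 2 1 1 1 2

# Unary plus turns the logical vector into 0/1 integers
+(occurrence > 1)   # 0 0 1 0 0 0 1
```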
      

      【Discussion】:

        【Solution 3】:

        Using data.table:

        library(data.table)
        
        dt[, flag:=1]
        dt[, flag:=cumsum(flag), by=.(id,Location)]
        dt[, flag:=ifelse(flag>1,1,0)]
        

        Data:

        dt <- data.table("id" = c(111,111,111,222,222,222,222,333,333,333,333), 
                         "Location" = c("A","B","A","A","C","B","A","B","A","A","A"),
                         "Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))
        

        【Discussion】:

          【Solution 4】:

          In base R, we can use ave, grouping by id and Location, and set every value from the second row of each group onward to 1.

          df$Flag <- as.integer(with(df, ave(Encounter, id, Location, FUN = seq_along) > 1))
          df
          
          #    id Location Encounter Flag
          #1  111        A         1    0
          #2  111        B         2    0
          #3  111        A         3    1
          #4  222        A         1    0
          #5  222        C         2    0
          #6  222        B         3    0
          #7  222        A         4    1
          #8  333        B         1    0
          #9  333        A         2    0
          #10 333        A         3    1
          #11 333        A         4    1
          

          With dplyr, that would be:

          library(dplyr)
          
          df %>%  group_by(id, Location) %>%  mutate(Flag = as.integer(row_number() > 1))
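A self-contained sketch of the dplyr version follows. The trailing ungroup() and the name `result` are additions here, on the assumption that you want a plain, ungrouped tibble for later steps:

```r
library(dplyr)

df <- data.frame(id = c(111,111,111,222,222,222,222,333,333,333,333),
                 Location = c("A","B","A","A","C","B","A","B","A","A","A"),
                 Encounter = c(1,2,3,1,2,3,4,1,2,3,4))

result <- df %>%
  group_by(id, Location) %>%
  # row_number() counts rows within each (id, Location) group
  mutate(Flag = as.integer(row_number() > 1)) %>%
  ungroup()

result$Flag  # 0 0 1 0 0 0 1 0 0 1 1
```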
          

          【Discussion】:

            【Solution 5】:

            An option with duplicated:

            library(dplyr)
            df %>% 
              group_by(id) %>% 
              mutate(Flag = +(duplicated(Location)))
            # A tibble: 11 x 4
            # Groups:   id [3]
            #      id Location Encounter  Flag
            #   <dbl> <fct>        <dbl> <int>
            # 1   111 A                1     0
            # 2   111 B                2     0
            # 3   111 A                3     1
            # 4   222 A                1     0
            # 5   222 C                2     0
            # 6   222 B                3     0
            # 7   222 A                4     1
            # 8   333 B                1     0
            # 9   333 A                2     0
            #10   333 A                3     1
            #11   333 A                4     1
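The same idea can probably be written without dplyr. This is a sketch added here rather than part of the answer above: duplicated() also accepts a data frame, in which case it marks rows whose (id, Location) combination has already appeared.

```r
df <- data.frame(id = c(111,111,111,222,222,222,222,333,333,333,333),
                 Location = c("A","B","A","A","C","B","A","B","A","A","A"),
                 Encounter = c(1,2,3,1,2,3,4,1,2,3,4))

# TRUE for every row that repeats an earlier (id, Location) pair
df$Flag <- as.integer(duplicated(df[c("id", "Location")]))

df$Flag  # 0 0 1 0 0 0 1 0 0 1 1
```

Because it compares whole rows of the two selected columns, no explicit grouping step is needed.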
            

            【Discussion】:
