【Question title】: Flagging an id when similar columns have different values in R
【Posted】: 2022-01-11 21:10:11
【Question】:

I need to flag an id when it has different values across the grade columns. Here is what my sample dataset looks like:

df <- data.frame(id = c(11,22,33,44,55),
                 grade.1 = c(3,4,5,6,7),
                 grade.2 = c(3,4,5,NA,7),
                 grade.3 = c(4,4,6,5,7),
                 grade.4 = c(NA,NA,NA, 5, 7 ))

df$Grade <- paste0(df$grade.1, df$grade.2, df$grade.3, df$grade.4)

> df
  id grade.1 grade.2 grade.3 grade.4 Grade
1 11       3       3       4      NA 334NA
2 22       4       4       4      NA 444NA
3 33       5       5       6      NA 556NA
4 44       6      NA       5       5 6NA55
5 55       7       7       7       7  7777

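When an id has different grade values across grade.1, grade.2, grade.3, and grade.4, that row needs to be flagged. An NA in those columns should not affect the flag.

In other words, if the Grade column at the end contains any differing digits, the id needs to be flagged.

My desired output should look like this: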

> df
  id grade.1 grade.2 grade.3 grade.4        flag
1 11       3       3       4      NA     flagged
2 22       4       4       4      NA Not_flagged
3 33       5       5       6      NA     flagged
4 44       6      NA       5       5     flagged
5 55       7       7       7       7 Not_flagged

Any ideas? Thanks!

【Comments on the question】:

    Tags: r filter flags


    【Solution 1】:

    A base R solution that uses rle after dropping the NA values.

    df$flag <- apply(df[,2:5], 1, function(x) 
      ifelse(length(rle(x[!is.na(x)])$lengths)==1, "not_flagged", "flagged"))
    
    df
      id grade.1 grade.2 grade.3 grade.4        flag
    1 11       3       3       4      NA     flagged
    2 22       4       4       4      NA not_flagged
    3 33       5       5       6      NA     flagged
    4 44       6      NA       5       5     flagged
    5 55       7       7       7       7 not_flagged
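
To see why counting runs works, here is a minimal sketch on two standalone vectors (the names `mixed` and `uniform` are illustrative and not part of the original answer). After the NA values are dropped, `rle` collapses consecutive equal values into runs, so a single run means every remaining grade is identical.

```r
# Illustrative rows; NA is dropped before the runs are counted.
mixed   <- c(6, NA, 5, 5)
uniform <- c(4, 4, 4, NA)

runs_mixed   <- rle(mixed[!is.na(mixed)])      # runs: one 6, then two 5s
runs_uniform <- rle(uniform[!is.na(uniform)])  # runs: three 4s

length(runs_mixed$lengths)    # 2 runs, so the row would be flagged
length(runs_uniform$lengths)  # 1 run, so the row would not be flagged
```

One caveat under this logic: a row that is entirely NA produces zero runs, so the `== 1` test fails and the row comes out as "flagged".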
    

    Data

    df <- structure(list(id = c(11, 22, 33, 44, 55), grade.1 = c(3, 4, 
    5, 6, 7), grade.2 = c(3, 4, 5, NA, 7), grade.3 = c(4, 4, 6, 5, 
    7), grade.4 = c(NA, NA, NA, 5, 7)), class = "data.frame", row.names = c(NA, 
    -5L))
    

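    【Comments】:

    • Thanks for your solution!
    【Solution 2】:

    Here is a base R approach. It assumes df holds only id and the grade.* columns: df[-1L] drops id alone, so if the Grade column from the question is still present, remove it first.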

    df$flag <- c("not_flagged", "flagged")[
      apply(df[-1L], 1L, \(x) length( (ux <- unique(x))[!is.na(ux)] ) > 1L) + 1L
    ]
    

    Output

    > df
      id grade.1 grade.2 grade.3 grade.4        flag
    1 11       3       3       4      NA     flagged
    2 22       4       4       4      NA not_flagged
    3 33       5       5       6      NA     flagged
    4 44       6      NA       5       5     flagged
    5 55       7       7       7       7 not_flagged
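
The indexing trick in this answer can be unpacked as follows. This sketch is a variation rather than the answer's exact code: it selects the grade columns by name so that it still works when the `Grade` column from the question is present. The names `grade_cols` and `has_mixed_grades` are made up for illustration.

```r
df <- data.frame(id = c(11, 22, 33, 44, 55),
                 grade.1 = c(3, 4, 5, 6, 7),
                 grade.2 = c(3, 4, 5, NA, 7),
                 grade.3 = c(4, 4, 6, 5, 7),
                 grade.4 = c(NA, NA, NA, 5, 7))
df$Grade <- paste0(df$grade.1, df$grade.2, df$grade.3, df$grade.4)

# Pick grade.1 ... grade.4 by name; the match is case-sensitive, so "Grade" is skipped.
grade_cols <- grep("^grade\\.", names(df), value = TRUE)

# TRUE when a row holds more than one distinct non-NA grade.
has_mixed_grades <- apply(df[grade_cols], 1L, function(x) {
  length(unique(x[!is.na(x)])) > 1L
})

# FALSE + 1L is 1 and TRUE + 1L is 2, so the logical picks one of the two labels.
df$flag <- c("not_flagged", "flagged")[has_mixed_grades + 1L]
```

Selecting by name instead of by position is a design choice here: it avoids silently including unrelated columns if the data frame gains or loses columns later.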
    

    【Comments】:

    • Thanks for your solution!
    【Solution 3】:

    One possible solution:

    library(tidyverse)
    
    df <- data.frame(id = c(11,22,33,44,55),
                     grade.1 = c(3,4,5,6,7),
                     grade.2 = c(3,4,5,NA,7),
                     grade.3 = c(4,4,6,5,7),
                     grade.4 = c(NA,NA,NA, 5, 7 ))
    
    df %>% 
      rowwise %>% 
      mutate(flag = if_else(length(unique(na.omit(c_across(2:5))))  == 1,
                            "not-flagged", "flagged")) %>% ungroup
    
    #> # A tibble: 5 × 6
    #>      id grade.1 grade.2 grade.3 grade.4 flag       
    #>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>      
    #> 1    11       3       3       4      NA flagged    
    #> 2    22       4       4       4      NA not-flagged
    #> 3    33       5       5       6      NA flagged    
    #> 4    44       6      NA       5       5 flagged    
    #> 5    55       7       7       7       7 not-flagged
    

    Alternatively, use data.table::uniqueN, which counts the number of unique elements in a vector (and can drop NA values):

    library(data.table)
    library(dplyr)
    
    df %>% 
      rowwise %>% 
      mutate(flag = if_else(uniqueN(c_across(2:5), na.rm = TRUE)  == 1,
                            "not-flagged", "flagged")) %>% ungroup
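
As a quick illustration of what the `na.rm` argument changes, here is a small sketch on a single vector. The vector `x` and the variable names are invented for this example, and it assumes the data.table package is installed.

```r
library(data.table)

x <- c(3, 3, 4, NA)

n_with_na    <- uniqueN(x)                  # NA counts as its own distinct value
n_without_na <- uniqueN(x, na.rm = TRUE)    # NA is ignored
n_base       <- length(unique(na.omit(x)))  # base R count used in the first version
```

With `na.rm = TRUE` the count agrees with the `length(unique(na.omit(...)))` expression, which is why the two pipelines give the same flags.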
    

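    【Comments】:

    • Thanks for your solution!
    • You're welcome, @amisos55!
    【Solution 4】:

    n_distinct from dplyr is very helpful here. This version combines pivot_longer and pivot_wider, and it assumes df still contains the Grade column from the question: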

    library(dplyr)
    library(tidyr)
    
    df %>% 
      pivot_longer(
        -c(id, Grade),
        names_to = "name",
        values_to = "value"
      ) %>% 
      group_by(id) %>% 
      mutate(flag = ifelse(n_distinct(value, na.rm = TRUE)==1, "Not flagged", "Flagged")) %>% 
      pivot_wider(
        names_from = name,
        values_from = value
      )
    
         id Grade flag        grade.1 grade.2 grade.3 grade.4
      <dbl> <chr> <chr>         <dbl>   <dbl>   <dbl>   <dbl>
    1    11 334NA Flagged           3       3       4      NA
    2    22 444NA Not flagged       4       4       4      NA
    3    33 556NA Flagged           5       5       6      NA
    4    44 6NA55 Flagged           6      NA       5       5
    5    55 7777  Not flagged       7       7       7       7
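
The same long-format idea can be sketched in base R, without tidyr or dplyr. This is a swapped-in technique for illustration, not the answer's method: it stacks the grade columns, counts distinct non-NA values per id with `tapply`, and maps the counts back to the rows. It assumes each id appears only once, and the names `grade_cols`, `long`, `n_by_id`, and `n_grades` are hypothetical.

```r
df <- data.frame(id = c(11, 22, 33, 44, 55),
                 grade.1 = c(3, 4, 5, 6, 7),
                 grade.2 = c(3, 4, 5, NA, 7),
                 grade.3 = c(4, 4, 6, 5, 7),
                 grade.4 = c(NA, NA, NA, 5, 7))

grade_cols <- grep("^grade\\.", names(df), value = TRUE)

# Long format: one row per (id, grade) pair, with the columns stacked one after another.
long <- data.frame(id    = rep(df$id, times = length(grade_cols)),
                   value = unlist(df[grade_cols], use.names = FALSE))

# Count the distinct non-NA grades for each id.
n_by_id <- tapply(long$value, long$id, function(v) length(unique(v[!is.na(v)])))

# Map the counts back to the original row order, then label each row.
n_grades <- as.vector(n_by_id[as.character(df$id)])
df$flag  <- ifelse(n_grades == 1, "Not flagged", "Flagged")
```

Because the flag is written straight back onto `df`, no step corresponding to `pivot_wider` is needed in this variant.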
    

    【Comments】:
