【Question title】: How can I get values in a new column based on another column, grouped by columns?
【Posted】: 2017-11-23 10:28:10
【Question】:

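I am trying to build the column desired_output, which is derived from the values in the value column within each group defined by grp_1 and grp_2.

That is, if every value in the value column is unique within a group, the output for that group should be NA.

If one value is repeated more often than any other, the whole group gets that repeated value.

If the values are repeated the same number of times, the whole group gets the maximum value.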
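
The three rules above can be sketched in base R as follows. This is only a minimal sketch, not code from the original post: the helper name group_value is hypothetical, and it assumes that a tie between several most-frequent values is resolved by taking the largest of them.

```r
# Hypothetical helper: applies the three rules to the values of one group.
group_value <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))       # how often each distinct value occurs
  if (all(counts == 1)) {
    return(NA_real_)                     # rule 1: every value is unique
  }
  # rules 2 and 3: most frequent value; ties resolved by the maximum (assumption)
  max(ux[counts == max(counts)])
}

# Values and groups taken from the first three groups of the question
value <- c(1, 2, 3, 3, 4, 1, 2, 3, 4, 1, 1, 2, 2)
grp   <- c("a", "a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c")

# ave() applies the helper per group and recycles the single result over the group
out <- ave(value, grp, FUN = group_value)
print(out)
```

Here ave() keeps the result aligned with the original rows, so out can be assigned directly as a new column.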

grp_1 = rep("A", 21)
grp_2 = rep(c("a", "b", "c", "d", "e"), times = c(5, 4, 4, 4, 4))
value = c(1,2,3,3,4,1,2,3,4,1,1,2,2,1,2,4,4,1,3,3,3)
desired_output = c(3,3,3,3,3,NA,NA,NA,NA,2,2,2,2,4,4,4,4,3,3,3,3)

df = data.frame(grp_1, grp_2, value, desired_output)

I am stuck after getting the count of repeated values:

library(dplyr)

# Running counter within each run of consecutive equal values
func <- function(x) {
  unlist(lapply(rle(x)$lengths, seq_len))
}

df <- group_by(df, grp_1, grp_2)
df_1 <- mutate(df, common = as.numeric(func(value)))

【Comments】:

    Tags: r


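    【Solution 1】:

    In case someone prefers data.table:

    data.table::setDT(df)

    # Per group: largest value that occurs more than once. For an all-unique group
    # max() of an empty vector gives -Inf (with a warning), which is then set to NA.
    df[, desired_outcome := max(value[duplicated(value)]), by = c("grp_1", "grp_2")
      ][is.infinite(desired_outcome), desired_outcome := NA]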
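
One caveat worth checking: the expression above picks the largest value that occurs more than once, which matches the desired output for the sample data but is not necessarily the most frequent value. A small base-R check, using a made-up vector that is not part of the original data:

```r
# Made-up group: 1 is the most frequent value, but 2 is also repeated
value <- c(1, 1, 1, 2, 2)

# Expression used in the data.table call: largest value among the duplicates
max_of_duplicates <- max(value[duplicated(value)])

# Most frequent value, computed from a frequency table
counts <- table(value)
most_frequent <- as.numeric(names(counts)[which.max(counts)])

print(max_of_duplicates)  # 2
print(most_frequent)      # 1
```

If groups like this can occur in the real data, the second rule of the question would need a frequency-based calculation rather than max() over the duplicates.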
    

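    【Comments】:

      【Solution 2】:
      library(dplyr)
      library(modeest)
      # Note: [['M']] assumes mlv() returns an object with an M element; newer
      # modeest releases may return the mode directly, so drop [['M']] if this fails.
      final_df <- df %>%
        group_by(grp_1,grp_2) %>%
        # all values unique -> NA; all values equally frequent -> max; otherwise the mode
        mutate(desired_output = ifelse(n()==length(unique(value)),
                                       NA,
                                       ifelse(length(unique(table(value)))==1,
                                              max(value),
                                              mlv(value, method='mfv')[['M']]))) %>%
        data.frame()
      final_df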
      

      The output is:

         grp_1 grp_2 value desired_output
      1      A     a     1              3
      2      A     a     2              3
      3      A     a     3              3
      4      A     a     3              3
      5      A     a     4              3
      6      A     b     1             NA
      7      A     b     2             NA
      8      A     b     3             NA
      9      A     b     4             NA
      10     A     c     1              2
      11     A     c     1              2
      12     A     c     2              2
      13     A     c     2              2
      14     A     d     1              4
      15     A     d     2              4
      16     A     d     4              4
      17     A     d     4              4
      18     A     e     1              3
      19     A     e     3              3
      20     A     e     3              3
      21     A     e     3              3
      

      #sample data
      structure(list(grp_1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor"), 
          grp_2 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
          3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("a", 
          "b", "c", "d", "e"), class = "factor"), value = c(1, 2, 3, 
          3, 4, 1, 2, 3, 4, 1, 1, 2, 2, 1, 2, 4, 4, 1, 3, 3, 3)), .Names = c("grp_1", 
      "grp_2", "value"), row.names = c(NA, -21L), class = "data.frame")
      

      【Comments】:
