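【Question title】:Grouping the data in a data frame based on conditions from more than one column
【Posted】:2017-12-23 20:52:22
【Question description】:

I am trying to compute recency: for each salesman, the most recent value in the Year column for which the target-achieved indicator equals 1. If the only indicator value available for a salesman (over the Salesman_ID + Year key) is 0, the minimum year should be picked instead.

Data: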

   Salesman_ID  Year         Yearly_Targets_Achieved_Indicator

 1     AA-5468  2012                                 1
 2     AA-5468  2013                                 0
 3     AA-5468  2014                                 0
 4     AA-5468  2015                                 0
 5     AA-5468  2016                                 1
 6     AL-3791  2012                                 1
 7     AL-3791  2013                                 1
 8     AL-3791  2014                                 0
 9     AL-3893  2015                                 0
10     AL-3893  2016                                 0

Expected output:

  Salesman_ID  Year Yearly_Targets_Achieved_Indicator
         <chr> <dbl>                             <dbl>
 1     AA-5468  2016                                 1
 2     AL-3791  2013                                 1
 9     AL-3893  2015                                 0

【Question discussion】:

    Tags: r group-by dplyr


    【Solution 1】:

    Using the tidyverse package, I suggest the following code:

    library(tidyverse)
    
    # Sample data from the question
    Prashant_df <- data.frame(
        Salesman_ID = c("AA-5468","AA-5468","AA-5468","AA-5468","AA-5468","AL-3791","AL-3791","AL-3791","AL-3893","AL-3893"),
        Year = c(2012,2013,2014,2015,2016,2012,2013,2014,2015,2016),
        Yearly_Targets_Achieved_Indicator = c(1,0,0,0,1,1,1,0,0,0)
    )
    
    # Per salesman: latest year with indicator 1, or the earliest year if the target was never achieved
    Prashant_df <- Prashant_df %>% 
        group_by(Salesman_ID) %>% 
        mutate(Year_target = if (any(Yearly_Targets_Achieved_Indicator == 1)) {
            max(Year[Yearly_Targets_Achieved_Indicator == 1])
        } else {
            min(Year)
        })
    
    # Collapse to one row per salesman
    Prashant_df_collapsed <- Prashant_df %>% 
        group_by(Salesman_ID) %>% 
        summarise(Year = max(Year_target),
                  Yearly_Targets_Achieved_Indicator = max(Yearly_Targets_Achieved_Indicator))
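The rule in the question needs the latest year among the rows where the indicator is 1, which is not necessarily the salesman's latest year overall (AL-3791 has data for 2014 but last hit the target in 2013). As a cross-check, here is a minimal single-pipeline sketch, assuming dplyr is installed. The names `sales` and `recency` are made up for illustration and do not come from the answer.

```r
library(dplyr)

# Sample data copied from the question; `sales` is an illustrative name.
sales <- data.frame(
  Salesman_ID = c(rep("AA-5468", 5), rep("AL-3791", 3), rep("AL-3893", 2)),
  Year = c(2012, 2013, 2014, 2015, 2016, 2012, 2013, 2014, 2015, 2016),
  Yearly_Targets_Achieved_Indicator = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
)

recency <- sales %>%
  group_by(Salesman_ID) %>%
  # Keep the indicator-1 rows when the group has any; otherwise all rows stay.
  filter(Yearly_Targets_Achieved_Indicator == max(Yearly_Targets_Achieved_Indicator)) %>%
  summarise(
    # Latest year if the target was ever achieved, earliest year otherwise.
    Year = if (first(Yearly_Targets_Achieved_Indicator) == 1) max(Year) else min(Year),
    Yearly_Targets_Achieved_Indicator = first(Yearly_Targets_Achieved_Indicator)
  )

print(recency)
```

On the sample data this should give one row per salesman, with years 2016, 2013 and 2015 in that order.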
    

    【Discussion】:

      【Solution 2】:

      You can store each salesman's maximum and minimum year, together with the maximum of the binary indicator.

      library(dplyr)

      # df is the data frame from the question
      newdf <- df %>% group_by(Salesman_ID) %>% summarise(
        maximum = max(Year),
        minimum = min(Year),
        maxInd = max(Yearly_Targets_Achieved_Indicator))
      

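      From here you can almost construct your result variable. Note that maximum is each salesman's latest year overall, so when maxInd is 1 you still need the latest year in which the indicator is 1 (for AL-3791 that is 2013, not 2014).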
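One way to finish the construction is sketched below, assuming dplyr is installed. The columns `maxHit` and `Recency_Year` are hypothetical additions for illustration and are not part of the answer. `maxHit` holds the latest year in which the indicator is 1, or NA when a salesman never hit the target.

```r
library(dplyr)

# Sample data copied from the question, stored under the name this answer uses.
df <- data.frame(
  Salesman_ID = c(rep("AA-5468", 5), rep("AL-3791", 3), rep("AL-3893", 2)),
  Year = c(2012, 2013, 2014, 2015, 2016, 2012, 2013, 2014, 2015, 2016),
  Yearly_Targets_Achieved_Indicator = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
)

newdf <- df %>%
  group_by(Salesman_ID) %>%
  summarise(
    # Hypothetical column: latest year with indicator 1, NA if there is none.
    maxHit = if (any(Yearly_Targets_Achieved_Indicator == 1)) {
      max(Year[Yearly_Targets_Achieved_Indicator == 1])
    } else {
      NA_real_
    },
    minimum = min(Year),
    maxInd = max(Yearly_Targets_Achieved_Indicator)
  ) %>%
  # Hypothetical result column: pick maxHit or minimum depending on maxInd.
  mutate(Recency_Year = if_else(maxInd == 1, maxHit, minimum))

print(newdf)
```

The `if (any(...))` guard avoids calling `max()` on an empty vector, which would otherwise return -Inf with a warning.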

      【Discussion】:

      • library(plyr); newdf = df %>% group_by(Salesman_ID) %>% summarise(maximum = max(Year), minimum = min(Year), maxInd = max(Yearly_Targets_Achieved_Indicator)); detach("package:plyr", unload=TRUE)
      【Solution 3】:

      Using base R:

        c(by(dat, dat[1], function(x) if (all(x[,3] == 0)) x[1,2] else max(x[which(x[,3] == 1), 2])))
      
         AA-5468 AL-3791 AL-3893 
            2016    2013    2015 
      

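      This code is a bit messy, but it produces the desired output. The explanation is as follows:

      First group by Salesman_ID, then check within each group whether all indicators are zero. If they are, return the first year. Otherwise, return the most recent (maximum) year among the years in which the indicator is 1.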
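The same steps can be spelled out more readably. This is only a sketch in base R: the helper name `recency_year` is made up for illustration, and it uses `min(Year)` instead of the first row, so it does not rely on the rows being sorted by year.

```r
# Sample data copied from the question, stored under the name this answer uses.
dat <- data.frame(
  Salesman_ID = c(rep("AA-5468", 5), rep("AL-3791", 3), rep("AL-3893", 2)),
  Year = c(2012, 2013, 2014, 2015, 2016, 2012, 2013, 2014, 2015, 2016),
  Yearly_Targets_Achieved_Indicator = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
)

# Hypothetical helper: recency year for the rows of a single salesman.
recency_year <- function(rows) {
  hit <- rows$Yearly_Targets_Achieved_Indicator == 1
  # Latest year with indicator 1, or the earliest year if there is no such row.
  if (any(hit)) max(rows$Year[hit]) else min(rows$Year)
}

# split() groups the rows by salesman; vapply() applies the helper to each group
# and returns a named numeric vector.
result <- vapply(split(dat, dat$Salesman_ID), recency_year, numeric(1))
print(result)
```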

      【Discussion】:
