【Question title】: Creating Weighted Average and 3 Year Average values based on Year Inputs
【Posted】: 2018-12-12 20:47:18
【Question description】:

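I'm hoping someone can help me come up with a solution to my problem. I have a data frame, and I want to add 2 new sets of values based on the existing data: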

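  1. Weighted average: a composite weighted average for each stat (HR, R, RBI, SB), where each season's value is multiplied by its weight (2014 = .4, 2015 = .4 and 2016 = .2).
  2. 3-year average: the same idea as above, but a plain (unweighted) average over the 3 most recent consecutive years.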
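For a single player and stat, both calculations reduce to simple arithmetic. The sketch below uses abreujo02's HR values from the data frame in this question (36, 30 and 25 for 2014, 2015 and 2016). The names hr and w are illustrative only and are not part of the question. Because the weights sum to 1, the weighted sum is already a weighted average.

```r
# abreujo02's HR values, copied from full_table_raw (illustrative subset)
hr <- c(`2014` = 36, `2015` = 30, `2016` = 25)

# Season weights from the question; they sum to 1
w <- c(`2014` = 0.4, `2015` = 0.4, `2016` = 0.2)

# Weighted average: 0.4*36 + 0.4*30 + 0.2*25 = 31.4
weighted_hr <- sum(hr[names(w)] * w)

# Plain 3-year average: (36 + 30 + 25) / 3
avg3_hr <- mean(hr)
```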

I would like the new values to be identified in the season column.

Here is the data.frame:

full_table_raw <- structure(list(playerID = c("abreujo02", "abreujo02", 
"abreujo02", "abreujo02", "abreujo02", "abreujo02", "abreujo02", 
"abreujo02", "abreujo02", "abreujo02", "abreujo02", "abreujo02", 
"arenano01", "arenano01", "arenano01", "arenano01", "arenano01", 
"arenano01", "arenano01", "arenano01", "arenano01", "arenano01", 
"arenano01", "arenano01", "blackch02", "blackch02", "blackch02", 
"blackch02", "blackch02", "blackch02", "blackch02", "blackch02", 
"blackch02", "blackch02", "blackch02", "blackch02"), season = c(2014L, 
2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 
2016L, 2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 
2016L, 2016L, 2016L, 2016L, 2014L, 2014L, 2014L, 2014L, 2015L, 
2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2016L), stat = c("HR", 
"R", "RBI", "SB", "HR", "R", "RBI", "SB", "HR", "R", "RBI", "SB", 
"HR", "R", "RBI", "SB", "HR", "R", "RBI", "SB", "HR", "R", "RBI", 
"SB", "HR", "R", "RBI", "SB", "HR", "R", "RBI", "SB", "HR", "R", 
"RBI", "SB"), points = c(3, 2, 3, 2, 2, 1, 2, 1, 1, 1, 2, 1, 
1, 1, 1, 1, 3, 3, 3, 2, 3, 3, 3, 2, 2, 3, 2, 3, 1, 2, 1, 3, 2, 
2, 1, 3), ranks = c(1, 2, 1, 2, 2, 3, 2, 3, 3, 3, 2, 3, 3, 3, 
3, 3, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 3, 2, 3, 1, 2, 2, 3, 
1), value = c(36, 80, 107, 3, 30, 88, 101, 0, 25, 67, 100, 0, 
18, 58, 61, 2, 42, 97, 130, 2, 41, 116, 133, 2, 19, 82, 72, 28, 
17, 93, 58, 43, 29, 111, 82, 17)), class = "data.frame", row.names = c(NA, 
-36L))

【Discussion】:

    Tags: r dplyr


    【Solution 1】:

    You can do it like this:

    library(dplyr)

    full_table_raw %>% 
      # add a new column with the weights to apply
      mutate(weight = ifelse(season == 2016, .2, .4)) %>%
      # group_by, and then compute your averages
      group_by(stat) %>% 
      summarize(
        average = sum(value) / 3,
        weighted_average = sum(value * weight))
    

    This gives:

    # A tibble: 4 x 3
      stat  weighted_average average
      <chr>            <dbl>   <dbl>
    1 HR                83.8    85.7
    2 R                258     264  
    3 RBI              275     281  
    4 SB                35.0    32.3
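The summary above groups only by stat, so each figure pools all three players. For example, the HR average of 85.7 is the sum of nine values divided by 3, which equals the sum of the three per-player means rather than a mean. The question does not say this explicitly, but if one value per player is wanted, the grouping also needs playerID. In dplyr that would be group_by(playerID, stat).

Below is a base R sketch of that per-player reading. It rebuilds the data from full_table_raw, omitting points and ranks for brevity. The names full_table, wavg, avg3 and per_player are hypothetical.

```r
# Long-format data rebuilt from full_table_raw (points and ranks omitted)
full_table <- data.frame(
  playerID = rep(c("abreujo02", "arenano01", "blackch02"), each = 12),
  season   = rep(rep(2014:2016, each = 4), times = 3),
  stat     = rep(c("HR", "R", "RBI", "SB"), times = 9),
  value    = c(36, 80, 107, 3, 30, 88, 101, 0, 25, 67, 100, 0,
               18, 58, 61, 2, 42, 97, 130, 2, 41, 116, 133, 2,
               19, 82, 72, 28, 17, 93, 58, 43, 29, 111, 82, 17),
  stringsAsFactors = FALSE
)

# Same weighting rule as the answer: 0.2 for 2016, 0.4 otherwise
full_table$weighted_value <-
  full_table$value * ifelse(full_table$season == 2016, 0.2, 0.4)

# Group by player AND stat, so every player gets his own values
wavg <- aggregate(weighted_value ~ playerID + stat, data = full_table, FUN = sum)
avg3 <- aggregate(value ~ playerID + stat, data = full_table, FUN = mean)

# One row per player and stat
per_player <- merge(wavg, avg3, by = c("playerID", "stat"))
names(per_player) <- c("playerID", "stat", "weighted_average", "average")
```

For HR, the three per-player weighted averages are 31.4, 32.2 and 20.2. These sum to the pooled 83.8 in the table above.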
    

    If you want to add weighted_average and average as columns to the initial data frame instead of computing a summary, just replace summarize with mutate:

    # A tibble: 36 x 9
    # Groups:   stat [4]
       playerID  season stat  points ranks  value weight average weighted_average
       <chr>      <int> <chr>  <dbl> <dbl>  <dbl>  <dbl>   <dbl>            <dbl>
     1 abreujo02   2014 HR      3.00  1.00  36.0   0.400    85.7             83.8
     2 abreujo02   2014 R       2.00  2.00  80.0   0.400   264              258  
     3 abreujo02   2014 RBI     3.00  1.00 107     0.400   281              275  
     4 abreujo02   2014 SB      2.00  2.00   3.00  0.400    32.3             35.0
     5 abreujo02   2015 HR      2.00  2.00  30.0   0.400    85.7             83.8
     6 abreujo02   2015 R       1.00  3.00  88.0   0.400   264              258  
     7 abreujo02   2015 RBI     2.00  2.00 101     0.400   281              275  
    ...
    

    Note that a final %>% select(-weight) can be added to drop the weight column we introduced.

    【Comments】:

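    • Two things: 1. How would you modify the weights if they were 2016 = .5, 2017 = .3 and 2018 = .2? Is there a way to store the weights in a separate list and multiply accordingly? 2. How would you do this so that the results are bound to the existing data frame, with the results labeled in the season column as "weighted" and "3_yr_avg"?
    • 1. Use multiple nested ifelse calls: ifelse(season == 2018, .2, ifelse(season == 2017, .3, ...)). 2. See my updated answer: you can use mutate instead of summarize, and you can choose the names of the new columns.
    • I don't want to add new columns, I want to add new rows. I want to bind the weighted results and the 3-year averages under the existing column names. The new rows would have "weighted" or "3_yr_avg" in the season column.
    • Why do you want to add new rows? That doesn't seem advisable. What would season be for your new rows? What would stat be? Summarizing the results in a new data frame is most likely a better approach.
    • In full_table_raw, grouping by season gives each stat category exactly one value per season. For example, I want one row with season = 2014 holding the HR value, and another row with season = "weighted" holding the composite weighted HR value. With your approach, many duplicate values end up repeated in each column.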
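The comments ask two things. The first is to keep the weights in a separate structure instead of nesting ifelse(). The second is to append the results as new rows labeled "weighted" and "3_yr_avg" in the season column.

Below is a base R sketch of both, assuming one value per player and stat is wanted. The weights sit in a named vector looked up by season. All object names are hypothetical. Because the season column then mixes years and labels, it has to become character. That is one reason the answerer suggested a separate summary data frame instead.

```r
# Long-format data rebuilt from full_table_raw (points and ranks omitted)
full_table <- data.frame(
  playerID = rep(c("abreujo02", "arenano01", "blackch02"), each = 12),
  season   = rep(rep(2014:2016, each = 4), times = 3),
  stat     = rep(c("HR", "R", "RBI", "SB"), times = 9),
  value    = c(36, 80, 107, 3, 30, 88, 101, 0, 25, 67, 100, 0,
               18, 58, 61, 2, 42, 97, 130, 2, 41, 116, 133, 2,
               19, 82, 72, 28, 17, 93, 58, 43, 29, 111, 82, 17),
  stringsAsFactors = FALSE
)

# Weights stored separately and looked up by season name. For the
# 2016-2018 example this would be c(`2016` = 0.5, `2017` = 0.3, `2018` = 0.2)
season_weights <- c(`2014` = 0.4, `2015` = 0.4, `2016` = 0.2)

# Keep only the 3 most recent seasons present in the data
recent <- sort(unique(full_table$season), decreasing = TRUE)[1:3]
recent_rows <- full_table[full_table$season %in% recent, ]

# A season missing from season_weights would give NA, and the formula
# interface of aggregate() drops NA rows by default
recent_rows$weighted_value <-
  recent_rows$value * unname(season_weights[as.character(recent_rows$season)])

# One weighted value and one plain 3-year mean per player and stat
wavg <- aggregate(weighted_value ~ playerID + stat, data = recent_rows, FUN = sum)
avg3 <- aggregate(value ~ playerID + stat, data = recent_rows, FUN = mean)

# New rows labeled in the season column
new_rows <- rbind(
  data.frame(playerID = wavg$playerID, season = "weighted", stat = wavg$stat,
             value = wavg$weighted_value, stringsAsFactors = FALSE),
  data.frame(playerID = avg3$playerID, season = "3_yr_avg", stat = avg3$stat,
             value = avg3$value, stringsAsFactors = FALSE)
)

# season must be character to hold both years and labels
full_table$season <- as.character(full_table$season)
result <- rbind(full_table, new_rows)
result <- result[order(result$playerID, result$stat), ]
```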