【Question title】:Creating a new column in my data frame based on a function
【Posted】:2026-01-17 20:35:01
【Question】:

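I have a data frame containing NFL teams and some data about them. For each team and week, I want to add that team's points per game up to that week. I can't simply summarize the data by team, because I need the individual games to stay as they are currently laid out.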

    CurrYrfun <- function(Yr, Tm, Wk) {
      PPG <- Schedule_Results %>%
        filter(Year == Yr & Team == Tm & Week < Wk) %>%
        group_by(Team) %>%
        summarize(APG = mean(Pts))
      return(PPG[['APG']])
    }

This function gives the correct result for a single record, but when I try to mutate a new column in the data frame, like this:

    Schedule_Results <- Schedule_Results %>%
      mutate(PPG = CurrYrfun(Year, Team, Week))

I get an error message saying that PPG has length 0. I have tried to attach a picture of the data frame so that you have an idea of the data I am working with.

Edited to include data and an example:

Schedule_Results <- structure(list(Year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 
 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 
 2019L, 2019L, 2019L, 2019L, 2019L, 2019L), Week = c(17, 17, 17, 
 16, 16, 16, 15, 15, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 11, 
 11, 11), Team = c("Washington Redskins", "Cincinnati Bengals", 
 "Jacksonville Jaguars", "Jacksonville Jaguars", "Washington Redskins", 
 "Cincinnati Bengals", "Cincinnati Bengals", "Washington Redskins", 
 "Jacksonville Jaguars", "Washington Redskins", "Cincinnati Bengals", 
 "Jacksonville Jaguars", "Jacksonville Jaguars", "Washington Redskins", 
 "Cincinnati Bengals", "Cincinnati Bengals", "Jacksonville Jaguars", 
 "Washington Redskins", "Washington Redskins", "Jacksonville Jaguars", 
 "Cincinnati Bengals"), Opp = c("Dallas Cowboys", "Cleveland Browns", 
 "Indianapolis Colts", "Atlanta Falcons", "New York Giants", "Miami Dolphins", 
 "New England Patriots", "Philadelphia Eagles", "Oakland Raiders", 
 "Green Bay Packers", "Cleveland Browns", "Los Angeles Chargers", 
 "Tampa Bay Buccaneers", "Carolina Panthers", "New York Jets", 
 "Pittsburgh Steelers", "Tennessee Titans", "Detroit Lions", "New York Jets", 
 "Indianapolis Colts", "Oakland Raiders"), Pts = c(16, 33, 38, 
 12, 35, 35, 13, 27, 20, 15, 19, 10, 11, 29, 22, 10, 20, 19, 17, 
 13, 10), Opp_Pts = c(47, 23, 20, 24, 41, 38, 34, 37, 16, 20, 
 27, 45, 28, 21, 6, 16, 42, 16, 34, 33, 17), Yds = c(271, 361, 
 353, 288, 361, 430, 315, 352, 262, 262, 451, 252, 242, 362, 277, 
 244, 369, 230, 225, 308, 246), Opp_Yds = c(517, 313, 275, 518, 
 552, 502, 291, 415, 364, 341, 333, 525, 315, 278, 271, 338, 471, 
 364, 400, 389, 386), TO = c(2, 1, 1, 1, 0, 1, 5, 1, 0, 1, 1, 
 0, 4, 0, 0, 2, 1, 2, 1, 1, 2), Opp_TO = c(1, 3, 2, 2, 0, 1, 0, 
 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 4, 2, 2, 2), Home = c("1", "1", 
 "1", "1", "0", "1", "0", "0", "0", "1", "1", "0", "0", "0", "1", 
 "0", "1", "1", "0", "1", "1"), Playoffs = c(0, 0, 0, 0, 0, 0, 
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), win = c("0", "1", 
 "1", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "1", "1", 
 "0", "0", "1", "0", "0", "0")), row.names = c(NA, -21L), class = "data.frame")
CurrYrfun <- function(Yr,Tm,Wk){
  PPG <- Schedule_Results %>% 
    filter(Year == Yr & Team == Tm & Week < Wk) %>% 
    group_by(Team) %>% 
    summarize(APG = mean(Pts))
  return(PPG[['APG']])
}

CurrYrfun(2019,'Washington Redskins',13)
CurrYrfun(2019,'Jacksonville Jaguars',14)
CurrYrfun(2019,'Washington Redskins',16)
CurrYrfun(2019,'Cincinnati Bengals',15)

Schedule_Results <- Schedule_Results %>% 
  mutate(PPG = CurrYrfun(Year, Team, Week))

My goal is to have the function's output for each row returned as a new column in the data frame.

【Comments】:

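  • Can't you just use mutate instead of summarize?
  • If you share sample input and the desired output, we may be able to help you debug. Please share sample input with dput(), e.g. dput(Schedule_Results[1:10, ]), or some other suitable subset if the first 10 rows are not a good choice. Working with pictures of data is very difficult.
  • You should learn how the functions in dplyr work: try reading the whole of "Programming with dplyr".
  • @Onyambu Yes, I believe my problem is that when my function is used in the second code block, it does not take my columns as input the way I need it to. How do I fix that?
  • @GregorThomas I have updated my question to include these items. Sorry for not making it clear at first, this is my first post.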
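The point raised in the comments can be illustrated with a small sketch. Inside mutate(), Year, Team and Week reach the function as whole columns rather than one value at a time, so a condition like Week < Wk compares the column with itself and no row can pass the filter, which is consistent with the length-0 error. The data frame `games` and the helper `ppg_before` below are made up for illustration and are not the asker's code; the helper is a base-R stand-in that returns NA when a team has no earlier games.

```r
library(dplyr)

# Small made-up data set (NOT the asker's data): one team, three weeks.
games <- data.frame(
  Year = c(2019L, 2019L, 2019L),
  Week = c(1, 2, 3),
  Team = c("A", "A", "A"),
  Pts  = c(10, 20, 30)
)

# Hypothetical scalar helper: mean points of one team before a given week.
# Returns NA instead of a zero-length vector when there are no earlier games.
ppg_before <- function(yr, tm, wk, data = games) {
  prior <- data$Pts[data$Year == yr & data$Team == tm & data$Week < wk]
  if (length(prior) == 0) NA_real_ else mean(prior)
}

# Passing whole columns compares Week with itself,
# so no row can satisfy Week < Wk.
any(games$Week < games$Week)  # FALSE

# mapply() calls the helper once per row, each time with a single
# Year/Team/Week triple, which is what the function was written for.
games_ppg <- games %>%
  mutate(PPG = mapply(ppg_before, Year, Team, Week))
```

Calling the helper once per row is simple but rescans the data for every row, so on larger tables a grouped, vectorised approach is likely the better choice.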

Tags: r function dplyr


【Solution 1】:

I'm pretty sure this is what you want. I spot-checked the first few examples you provided and they look correct.

    Schedule_Results %>%
      group_by(Team, Year) %>%
      arrange(Week) %>%
      mutate(PPG = lag(cummean(Pts), 1))
    # # A tibble: 21 x 14
    # # Groups:   Team, Year [3]
    #     Year  Week Team             Opp                Pts Opp_Pts   Yds Opp_Yds    TO Opp_TO Home  Playoffs win     PPG
    #    <int> <dbl> <chr>            <chr>            <dbl>   <dbl> <dbl>   <dbl> <dbl>  <dbl> <chr>    <dbl> <chr> <dbl>
    #  1  2019    11 Washington Reds~ New York Jets       17      34   225     400     1      2 0            0 0      NA  
    #  2  2019    11 Jacksonville Ja~ Indianapolis Co~    13      33   308     389     1      2 1            0 0      NA  
    #  3  2019    11 Cincinnati Beng~ Oakland Raiders     10      17   246     386     2      2 1            0 0      NA  
    #  4  2019    12 Cincinnati Beng~ Pittsburgh Stee~    10      16   244     338     2      1 0            0 0      10  
    #  5  2019    12 Jacksonville Ja~ Tennessee Titans    20      42   369     471     1      2 1            0 0      13  
    #  6  2019    12 Washington Reds~ Detroit Lions       19      16   230     364     2      4 1            0 1      17  
    #  7  2019    13 Jacksonville Ja~ Tampa Bay Bucca~    11      28   242     315     4      1 0            0 0      16.5
    #  8  2019    13 Washington Reds~ Carolina Panthe~    29      21   362     278     0      2 0            0 1      18  
    #  9  2019    13 Cincinnati Beng~ New York Jets       22       6   277     271     0      0 1            0 1      10  
    # 10  2019    14 Washington Reds~ Green Bay Packe~    15      20   262     341     1      1 1            0 0      21.7
    ...
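To see why a lagged cumulative mean gives the points per game before each week, here is a minimal base-R sketch on the Washington Redskins points from the question's data, in week order (weeks 11 to 17). It rebuilds the two steps by hand instead of calling dplyr, on the assumption that cumsum(x) / seq_along(x) matches cummean() and that prepending NA matches lag(x, 1), and then cross-checks the result against the direct definition.

```r
# Points for one team in week order (Washington Redskins, weeks 11 to 17,
# copied from the question's data).
pts <- c(17, 19, 29, 15, 27, 35, 16)

# Cumulative mean: position i holds the mean of games 1..i.
cum_mean <- cumsum(pts) / seq_along(pts)

# Shift down by one so each game only sees the games before it;
# the first game has no history, hence NA.
ppg <- c(NA, head(cum_mean, -1))

# Cross-check against the direct definition: mean of all earlier games.
direct <- sapply(seq_along(pts), function(i) {
  if (i == 1) NA_real_ else mean(pts[seq_len(i - 1)])
})

stopifnot(isTRUE(all.equal(ppg, direct)))
round(ppg, 1)
```

The first four values (NA, 17, 18, 21.7) agree with the Washington rows for weeks 11 to 14 in the output above.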

【Comments】:

  • The data.table version is setDT(Schedule_Results)[order(Week), PPG := shift(cummean(Pts)), .(Team, Year)] (note that cummean() comes from dplyr, so dplyr needs to be loaded as well).