【Question title】: R average of last 3 rows (values in different columns) grouping by two columns
【Posted】: 2018-09-08 16:33:36
【Question description】:

DT:

HomeTeam       AwayTeam        Season      Htpoints  Atpoints
Mattersburg    Salzburg        2015/2016        3         0
Salzburg       Rapid Vienna    2015/2016        0         3
Admira         Mattersburg     2015/2016        3         0
Admira         Salzburg        2015/2016        1         1
Mattersburg    Ried            2015/2016        3         0
Ried           Salzburg        2015/2016        0         3
Altach         Mattersburg     2015/2016        3         0
Austria Vie    Mattersburg     2015/2016        3         0
Salzburg       Altach          2015/2016        3         0
Mattersburg    AC Wolfsberger  2015/2016        3         0
Salzburg       Austria Vienna  2015/2016        1         1
Rapid Vienna   Mattersburg     2015/2016        0         3
Sturm Graz     Salzburg        2015/2016        0         3
Salzburg       Grodig          2015/2016        3         0

Calculating the average points of a team's last 3 home matches:

library(zoo)

roll <- function(x, n) { 
if (length(x) <= n) NaN 
else rollapply(x, list(-seq(n)), mean, fill = NaN)
}

transform(DT, last3.HT.av.points = ave(Htpoints,Season,HomeTeam, FUN = function(x) roll(x, 3)))
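
To make the window explicit: passing `list(-seq(n))` as the width tells `rollapply` to use the offsets -1, -2, ..., -n, so each value is the mean of the n preceding rows and the current row is excluded. A minimal sketch on a made-up points vector (`pts` is illustrative and not taken from the data above):

```r
library(zoo)

# Same helper as in the question: mean of the n values before each position
roll <- function(x, n) {
  if (length(x) <= n) NaN
  else rollapply(x, list(-seq(n)), mean, fill = NaN)
}

pts <- c(3, 0, 1, 3, 3)   # hypothetical points of one team, in match order
res <- roll(pts, 3)
res                        # first three positions have no full history, so NaN

short <- roll(c(3, 0), 3)  # too few matches: a single NaN, which ave() recycles
short
```

Positions 4 and 5 average `pts[1:3]` and `pts[2:4]` respectively, which both come to 4/3 here.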

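None of the above is a problem. On the other hand...

Is it possible to calculate the average points of the last 3 matches regardless of whether the team played at home or away?

Desired output (showing only the values for Salzburg):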

HomeTeam       AwayTeam        Season      Htpoints  Atpoints   HT.av.last3  AT.av.last3
Mattersburg    Salzburg        2015/2016        3         0                        NA
Salzburg       Rapid Vienna    2015/2016        0         3           NA
Admira         Mattersburg     2015/2016        3         0
Admira         Salzburg        2015/2016        1         1                        NA
Mattersburg    Ried            2015/2016        3         0
Ried           Salzburg        2015/2016        0         3                        0.33
Altach         Mattersburg     2015/2016        3         0
Austria Vie    Mattersburg     2015/2016        3         0
Salzburg       Altach          2015/2016        3         0          1.33
Mattersburg    AC Wolfsberger  2015/2016        3         0
Salzburg       Austria Vienna  2015/2016        1         1          2.33
Rapid Vienna   Mattersburg     2015/2016        0         3
Sturm Graz     Salzburg        2015/2016        0         3                        2.33
Salzburg       Grodig          2015/2016        3         0          2.33
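
The Salzburg figures in this table can be checked by hand. The sketch below lists Salzburg's points in match order, read off the DT table above (rows 1, 2, 4, 6, 9, 11, 13 and 14, taking Htpoints when Salzburg is at home and Atpoints when away), and averages the three preceding matches with base R only. The names `salzburg` and `prev3` are illustrative.

```r
# Salzburg's points per match, home or away, in the order the matches appear
salzburg <- c(0, 0, 1, 3, 3, 1, 3, 3)
n <- 3

# Mean of the n matches before match i; NA while fewer than n exist
prev3 <- sapply(seq_along(salzburg), function(i) {
  if (i <= n) NA_real_ else mean(salzburg[(i - n):(i - 1)])
})

round(prev3, 2)
# NA NA NA 0.33 1.33 2.33 2.33 2.33
```

The five non-missing values line up with the 0.33, 1.33, 2.33, 2.33 and 2.33 shown for Salzburg's fourth to eighth matches.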

Preference: data.table

Reproducible dataset (different from the one above):

 library(data.table)
 DT <- fread("HomeTeam,AwayTeam,Season,Htpoints,Atpoints
        Grodig,Salzburg,2015/2016,0,3
        Rapid Vienna,Altach,2015/2016,1,1
        Ried,Austria Vienna,2015/2016,3,0
        Sturm Graz,Mattersburg,2015/2016,3,0
        Admira,Rapid Vienna,2015/2016,1,1
        Altach,Ried,2015/2016,0,3
        Austria Vienna,Sturm Graz,2015/2016,1,1
        Mattersburg,Grodig,2015/2016,3,0
        Salzburg,AC Wolfsberger,2015/2016,3,0")

 numTeams <- DT[,uniqueN(c(HomeTeam, AwayTeam))]

 firstHalf <- lapply(seq_len(DT[,.N]),
                function(n) data.table(
                  Matchday=n*2L-1L,
                  HomeTeam=DT[["HomeTeam"]],
                  AwayTeam=c(DT[["AwayTeam"]][-seq_len(n)], DT[["AwayTeam"]][seq_len(n)]),
                  Season=DT[["Season"]],
                  Htpoints=DT[["Htpoints"]],
                  Atpoints=DT[["Atpoints"]]
                ))

 secondHalf <- lapply(seq_len(DT[,.N]),
                 function(n) data.table(
                   Matchday=n*2L,
                   HomeTeam=DT[["AwayTeam"]],
                   AwayTeam=c(DT[["HomeTeam"]][-seq_len(n)], DT[["HomeTeam"]][seq_len(n)]),
                   Season=DT[["Season"]],
                   Htpoints=DT[["Htpoints"]],
                   Atpoints=DT[["Atpoints"]]
                 ))


DT <- rbindlist(c(firstHalf, secondHalf))[
HomeTeam!=AwayTeam][,
            .SD[1L], by=.(HomeTeam, AwayTeam)]
setorder(DT, Matchday, HomeTeam)
DT <- DT[,-c("Matchday")]

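【Question comments】:

  • Can you add a reproducible dataset?
  • @Salman Added. It differs from the data behind the desired output, but it is fine for testing.
  • Thanks, but the matches are all in the same season, so "3 recent matches" doesn't make sense. Do you agree?
  • @Salman Why not? I want this information to know each team's form. The example above has only one season. Later I will have to group by season on the real dataset.

Tags: r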


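【Solution 1】:

Using the DT shown reproducibly in the Note at the end, add a row number column i and create a data.table both that has two rows for each row of DT, one for the home team and one for the away team. Then use rollapply and insert the results back into DT. Note that no special code is needed to handle the case where a team has fewer than 3 prior rows, because rollapply handles that automatically.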
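
The reshaping step may be easier to see on a tiny table first. The following is only a sketch on made-up data: `toy`, `long`, `Side` and the teams A, B, C are hypothetical names, not part of the answer. It stacks a home view and an away view of the same matches so that each team gets one row per match, whichever side it played on.

```r
library(data.table)

# Three hypothetical matches between teams A, B and C
toy <- data.table(
  HomeTeam = c("A", "C", "A"),
  AwayTeam = c("B", "A", "C"),
  Season   = "2015/2016",
  Htpoints = c(3L, 1L, 0L),
  Atpoints = c(0L, 1L, 3L)
)

# One row per team per match; i remembers which match the row came from
long <- rbind(
  toy[, .(Side = "Home", Team = HomeTeam, Season, Points = Htpoints, i = .I)],
  toy[, .(Side = "Away", Team = AwayTeam, Season, Points = Atpoints, i = .I)]
)
setorder(long, Team, i)

long[Team == "A", Points]   # A's points in match order: 3 (home), 1 (away), 0 (home)
```

Once the data is in this shape, any per-team rolling statistic can be computed in one grouped call, and `i` together with `Side` is enough to map the results back to the original home and away columns.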

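library(data.table)
library(zoo)

# Long format: one row per team per match; i is the row number of the match in DT
both <- rbind(
  DT[, list(HomeAway = "Home", Team = HomeTeam, Season, Points = Htpoints, i = .I)],
  DT[, list(HomeAway = "Away", Team = AwayTeam, Season, Points = Atpoints, i = .I)]
)

# Within each Season/Team, average the points of the 3 preceding matches
setkeyv(both, c("Season", "Team", "i"))
both[, Last3 := rollapply(Points, list(-seq(3)), mean, fill = NA_real_, na.rm = TRUE),
  by = "Season,Team"]

# Restore match order and copy the results back into DT
setkeyv(both, "i")
DT[, HtLast3 := both[HomeAway == "Home", Last3]][
   , AtLast3 := both[HomeAway == "Away", Last3]]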

giving:

> DT
        HomeTeam       AwayTeam    Season Htpoints Atpoints  HtLast3   AtLast3
 1:  Mattersburg       Salzburg 2015/2016        3        0       NA        NA
 2:     Salzburg   Rapid Vienna 2015/2016        0        3       NA        NA
 3:       Admira    Mattersburg 2015/2016        3        0       NA        NA
 4:       Admira       Salzburg 2015/2016        1        1       NA        NA
 5:  Mattersburg           Ried 2015/2016        3        0       NA        NA
 6:         Ried       Salzburg 2015/2016        0        3       NA 0.3333333
 7:       Altach    Mattersburg 2015/2016        3        0       NA 2.0000000
 8:  Austria Vie    Mattersburg 2015/2016        3        0       NA 1.0000000
 9:     Salzburg         Altach 2015/2016        3        0 1.333333        NA
10:  Mattersburg AC Wolfsberger 2015/2016        3        0 1.000000        NA
11:     Salzburg Austria Vienna 2015/2016        1        1 2.333333        NA
12: Rapid Vienna    Mattersburg 2015/2016        0        3       NA 1.0000000
13:   Sturm Graz       Salzburg 2015/2016        0        3       NA 2.3333333
14:     Salzburg         Grodig 2015/2016        3        0 2.333333        NA

Note

DF <-
structure(list(HomeTeam = c("Mattersburg", "Salzburg", "Admira", 
"Admira", "Mattersburg", "Ried", "Altach", "Austria Vie", "Salzburg", 
"Mattersburg", "Salzburg", "Rapid Vienna", "Sturm Graz", "Salzburg"
), AwayTeam = c("Salzburg", "Rapid Vienna", "Mattersburg", "Salzburg", 
"Ried", "Salzburg", "Mattersburg", "Mattersburg", "Altach", "AC Wolfsberger", 
"Austria Vienna", "Mattersburg", "Salzburg", "Grodig"), Season = c("2015/2016", 
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016", 
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016", 
"2015/2016", "2015/2016", "2015/2016"), Htpoints = c(3L, 0L, 
3L, 1L, 3L, 0L, 3L, 3L, 3L, 3L, 1L, 0L, 0L, 3L), Atpoints = c(0L, 
3L, 0L, 1L, 0L, 3L, 0L, 0L, 0L, 0L, 1L, 3L, 3L, 0L)), 
class = "data.frame", row.names = c(NA, -14L))

DT <- as.data.table(DF)

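【Comments】:

  • Thanks. It works perfectly. However, I had to write special code (or better, debug my database) to handle the case where a team has fewer than 3 prior rows; rollapply did not handle it automatically: Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument. This happens because, when grouping by team and season, my database has special cases with only 2 occurrences, for example promotion play-offs between a second-division team and a first-division team.
  • It does handle it automatically. For example, try this: rollapply(1:2, list(-(1:3)), mean, fill = NA). If there is a problem, its cause must be something else.
  • I get the same error with rollapply(1:2, list(-(1:3)), mean, fill = NA): Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument. It works with rollapply(1:4, list(-(1:3)), mean, fill = NA), which gives [1] NA NA NA 2. I am using version.string R version 3.4.2. @G. Grothendieck
  • Not for me. I get no error with either of these. Perhaps you are using an old version of zoo. I suggest updating R and zoo to the latest versions.
  • I will. Thanks to your answer I was able to find another possibility. It works and gives me no problems (which were probably due to the old version): both <- transform(both, Last3 = ave(Points,Season,Team, FUN = function(x) roll(x, 3))) instead of both[, Last3 := rollapply(Points, list(-seq(3)), mean, fill = NA_real_, na.rm = TRUE), by = "Season,Team"]. I used the roll function defined in my original question. @G. Grothendieck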
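
For anyone who hits the same error and cannot update zoo, one possible workaround is to compute the lagged mean without rollapply. The sketch below is base R only and is not part of the answer above: `lag_roll_mean` and `toy` are hypothetical names, and the function assumes the points contain no missing values. Groups with n or fewer rows simply come back as all NA.

```r
# Mean of the n values before each position, via cumulative sums
lag_roll_mean <- function(x, n = 3) {
  len <- length(x)
  out <- rep(NA_real_, len)
  if (len > n) {
    cs  <- cumsum(c(0, x))               # cs[k] is the sum of x[1:(k - 1)]
    idx <- (n + 1):len
    out[idx] <- (cs[idx] - cs[idx - n]) / n
  }
  out
}

lag_roll_mean(1:4)   # NA NA NA 2, the same as the rollapply call in the comments
lag_roll_mean(1:2)   # NA NA, no error for a short group

# Grouped use with ave(), as in the question's own approach
toy <- data.frame(
  Team   = c("A", "A", "A", "A", "B", "B"),
  Points = c(3, 0, 3, 1, 1, 1)
)
toy$Last3 <- ave(toy$Points, toy$Team, FUN = lag_roll_mean)
```

Because the function always returns a vector of the same length as its input, it can be dropped into `ave()` or a data.table `by` group without a separate length check.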
【Solution 2】:
library(tidyverse)
library(zoo)

DT_prep <- DT %>% 
  as_tibble() %>% 
  mutate(row = row_number()) 

DT_rollmeans <- DT_prep %>% 
  gather(teamside, teamname, -Season, -Htpoints, -Atpoints, -row) %>% 
  arrange(row) %>% 
  group_by(teamname) %>% 
  mutate(points = case_when(teamside == 'HomeTeam' ~ Htpoints,
                            teamside == 'AwayTeam' ~ Atpoints),
         roll_mean = zoo::rollapply(points, 3, mean, align = 'right', fill = NA)) %>% 
  ungroup() %>% 
  select(row, teamside, roll_mean) %>%
  spread(teamside, roll_mean) %>% 
  select(row, HT.av.last3 = HomeTeam, AT.av.last3 = AwayTeam)



DT_prep %>% left_join(DT_rollmeans, by = "row") %>% select(-row)

This yields a tibble that looks like this:

# A tibble: 90 x 7
   HomeTeam       AwayTeam       Season    Htpoints Atpoints HT.av.last3 AT.av.last3
   <chr>          <chr>          <chr>        <int>    <int>       <dbl>       <dbl>
 1 Admira         Ried           2015/2016        1        1          NA      NA    
 2 Altach         Sturm Graz     2015/2016        0        3          NA      NA    
 3 Austria Vienna Grodig         2015/2016        1        1          NA      NA    
 4 Grodig         Altach         2015/2016        0        3          NA      NA    
 5 Mattersburg    AC Wolfsberger 2015/2016        3        0          NA      NA    
 6 Rapid Vienna   Austria Vienna 2015/2016        1        1          NA      NA    
 7 Ried           Mattersburg    2015/2016        3        0          NA      NA    
 8 Sturm Graz     Rapid Vienna   2015/2016        3        0          NA      NA    
 9 AC Wolfsberger Grodig         2015/2016        3        0          NA       0.333
10 Mattersburg    Admira         2015/2016        3        0           2      NA    
# ... with 80 more rows

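For every team, the average is NA for its first 2 matches and after that is the rolling mean of its last 3 matches. The first team to have at least three matches in the data is Grodig, whose rolling mean is 0.333 for scoring 1, 0, and 0 points in its first 3 matches.

I'm not happy with my solution, but it does work, and I'm sure someone can make it more compact.

【Comments】: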
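
One thing to be aware of when comparing this with the question: a width of 3 with `align = 'right'` averages the current match together with the two before it, whereas the desired output in the question uses the three matches before the current one. The code also groups by team name only, so with several seasons the window would run across season boundaries unless Season is added to the grouping. A small sketch of the window difference, using Grodig's first three results from the text above plus one made-up fourth value (`pts`, `incl` and `excl` are illustrative names):

```r
library(zoo)

pts <- c(1, 0, 0, 3)   # 1, 0, 0 as described above; the 3 is hypothetical

# Window ends at the current match (as in this answer)
incl <- rollapply(pts, 3, mean, align = "right", fill = NA)

# Window covers the three matches before the current one (as in the question)
excl <- rollapply(pts, list(-seq(3)), mean, fill = NA)

rbind(incl, excl)
# incl: NA NA 0.333 1.000
# excl: NA NA NA    0.333
```

Which of the two is appropriate depends on whether the current match should count toward a team's form.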
