How to match column data to row data? Answers

【Question title】: How to match column data to row data?
【Posted】: 2022-01-09 14:23:28
【Question description】:

I have a sports dataset that looks like this:

season  team   tm   shk   dgs   brs   cts   cws  avg_pt_marg
2015    sharks shk  0-0   1-3   2-0   4-1   3-2    1.2
2015    dogs   dgs  3-1   0-0   2-1   1-1   2-0    3.4
2015    bears  brs  0-2   1-2   0-0   1-3   2-1    -0.2
2015    cats   cts  1-4   1-1   3-1   0-0   2-2    2.0
2015    cows   cws  2-3   0-2   1-2   2-2   0-0    -2.1
2014    sharks shk  0-0   1-3   2-0   4-1   3-2    0.7
2014    dogs   dgs  3-1   0-0   2-1   1-1   2-0    1.8
2014    bears  brs  0-2   1-2   0-0   1-3   2-1    -1.9
2014    cats   cts  1-4   1-1   3-1   0-0   2-2    2.3
2014    cows   cws  2-3   0-2   1-2   2-2   0-0    -3.0

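I want to add a column to each row (one team's season) containing the average point margin of that team's opponents.

It is calculated by multiplying the number of games played against each opponent (that season) by that opponent's average point margin (that season), summing these products, and dividing by the total number of games played (that season).

For example, for the 2015 sharks, the opponents' average point margin would be ((4 x 3.4)+(2 x -0.2)+(5 x 2.0)+(5 x -2.1)) / 16.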
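
The arithmetic in this example can be checked directly. The snippet below is only a sketch of the formula as described; the vector names `games` and `margin` are made up for illustration, and the numbers are read off the 2015 rows of the table above.

```r
# 2015 sharks: games played against each opponent (wins + losses per cell)
games  <- c(dgs = 4, brs = 2, cts = 5, cws = 5)
# Each opponent's own avg_pt_marg for 2015
margin <- c(dgs = 3.4, brs = -0.2, cts = 2.0, cws = -2.1)

# Games-weighted mean of the opponents' margins
opponent_marg <- sum(games * margin) / sum(games)
opponent_marg  # 0.79375
```

The same value comes out of `weighted.mean(margin, games)`, which may read more clearly.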

How do I compute this column and then add it to the data frame?

Like this:


season  team   tm   shk   dgs   brs   cts   cws  avg_pt_marg opponent_marg
2015    sharks shk  0-0   1-3   2-0   4-1   3-2    1.2
2015    dogs   dgs  3-1   0-0   2-1   1-1   2-0    3.4
2015    bears  brs  0-2   1-2   0-0   1-3   2-1    -0.2
2015    cats   cts  1-4   1-1   3-1   0-0   2-2    2.0
2015    cows   cws  2-3   0-2   1-2   2-2   0-0    -2.1
2014    sharks shk  0-0   1-3   2-0   4-1   3-2    0.7
2014    dogs   dgs  3-1   0-0   2-1   1-1   2-0    1.8
2014    bears  brs  0-2   1-2   0-0   1-3   2-1    -1.9
2014    cats   cts  1-4   1-1   3-1   0-0   2-2    2.3
2014    cows   cws  2-3   0-2   1-2   2-2   0-0    -3.0

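【Question discussion】:

I don't follow your logic - why is 2015 team 1 ((4* 3.4)+(2 * -0.2)+(5 * 2.0)+(5 * -2.1)) / 16 - where do the 4x, 2x ... etc. values come from?
@rg255 So, the first term: in 2015 team1 played team2 4 times, which is why I multiplied team 2's margin (3.4) by 4. Likewise, for the second term, team1 played Team 3 twice in 2015, which is why I multiplied Team 3's margin (-0.2) by 2. Hope this makes more sense!
OK, so each cell represents games - wins or losses for the team - e.g. because tm2 is 1-3 where tm==tm1, tm1 won once and tm2 won 3 times?
There may be an error in your data where team=="team4", tm=="tm3"
Oh yes, that's a mistake. But yes, right idea.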

Tags: r dataframe

【Solution 1】:

Well, it's not pretty, but

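do.call(
  rbind,
  # Process each season separately
  by(df, list(df$season), function(x) {
    # tmp[j, i] = games the team in row i played against the team in column j
    tmp <- sapply(
      1:nrow(x),
      function(i) {
        unlist(
          lapply(
            strsplit(
              # Opponent columns are the ones named tm1, tm2, ...
              as.character(x[i, grepl("tm[0-9]+", colnames(x))]),
              "-"
            ),
            # Wins + losses = games played
            function(y) {
              sum(as.numeric(y))
            }
          )
        )
      }
    )
    cbind(
      x,
      # Games-weighted mean of the opponents' margins
      "opponent_marg" = colSums(tmp * x[, "avg_pt_marg"]) / colSums(tmp)
    )
  })
)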

which results in

        season  team  tm tm1 tm2 tm3 tm4 tm5 avg_pt_marg opponent_marg
2014.6    2014 team1 tm1 0-0 1-3 2-0 4-1 3-2         0.7    -0.0062500
2014.7    2014 team2 tm2 3-1 0-0 2-1 1-1 2-0         1.8    -0.3909091
2014.8    2014 team3 tm3 0-2 1-2 0-0 1-3 2-1        -1.9     0.5833333
2014.9    2014 team4 tm3 1-4 1-1 3-1 0-0 2-2         2.3    -0.8333333
2014.10   2014 team5 tm5 2-3 0-2 1-2 2-2 0-0        -3.0     0.7571429
2015.1    2015 team1 tm1 0-0 1-3 2-0 4-1 3-2         1.2     0.7937500
2015.2    2015 team2 tm2 3-1 0-0 2-1 1-1 2-0         3.4     0.3636364
2015.3    2015 team3 tm3 0-2 1-2 0-0 1-3 2-1        -0.2     1.1916667
2015.4    2015 team4 tm3 1-4 1-1 3-1 0-0 2-2         2.0     0.2400000
2015.5    2015 team5 tm5 2-3 0-2 1-2 2-2 0-0        -2.1     1.4428571

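【Discussion】:

Thanks for your help!
I have a question. What would change if my tm1-5 columns weren't actually named that way, but had names like "sharks", "dogs", etc.? Because I see you wrote tm[0-9]
@yoyoman32 Yes, that part is where I identify tm1 through tm5. If you have different names, you'll have to find the indices of those columns and replace that part of the code.
If you look, I updated the question.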
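
For the renamed columns in the updated question (shk, dgs, ...), one possible way to find the opponent columns is to take the column names that also appear as values in `tm`. The following is a minimal base R sketch for a single season, not the answerer's code; `df` is rebuilt by hand from the 2015 rows of the question, and the names `opp_cols`, `games` and `marg` are illustrative only.

```r
# 2015 rows from the question, rebuilt by hand for this sketch
df <- data.frame(
  season = 2015,
  team = c("sharks", "dogs", "bears", "cats", "cows"),
  tm   = c("shk", "dgs", "brs", "cts", "cws"),
  shk  = c("0-0", "3-1", "0-2", "1-4", "2-3"),
  dgs  = c("1-3", "0-0", "1-2", "1-1", "0-2"),
  brs  = c("2-0", "2-1", "0-0", "3-1", "1-2"),
  cts  = c("4-1", "1-1", "1-3", "0-0", "2-2"),
  cws  = c("3-2", "2-0", "2-1", "2-2", "0-0"),
  avg_pt_marg = c(1.2, 3.4, -0.2, 2.0, -2.1),
  stringsAsFactors = FALSE
)

# Opponent columns: column names that also occur as a team code in `tm`
opp_cols <- intersect(colnames(df), df$tm)

# games[i, j] = games the team in row i played against opponent column j
games <- sapply(df[opp_cols], function(col) {
  vapply(strsplit(col, "-", fixed = TRUE),
         function(p) sum(as.numeric(p)), numeric(1))
})

# Each opponent's margin, in the same order as the opponent columns
marg <- df$avg_pt_marg[match(opp_cols, df$tm)]

# Games-weighted mean of opponents' margins, one value per row
df$opponent_marg <- as.vector(games %*% marg) / rowSums(games)
df[, c("team", "opponent_marg")]
```

With more than one season, the same steps could be applied to each piece of `split(df, df$season)` and the pieces bound back together with `rbind`.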

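【Solution 2】:

Another not-so-pretty solution, but this is a fairly complex little task with a lot of components. I'm using data.table here; if you're not familiar with it, a data.table is just an enhanced version of a data.frame that provides some extra functionality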

library(data.table)
setDT(dt1)

First, reshape the data to a longer format

# Reshape the data
dt2 <- dt1[, melt(.SD, id.vars=c("tm", "team", "season", "avg_pt_marg"))]

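I also filter out the cases where the team and the opponent are the same. This step also creates a variable for the number of games played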

# Filter out cases where team and opponent match
dt2 <- dt2[tm != variable,][,
  # Get number of games played
  `:=`("games_played" = as.numeric(tstrsplit(value, "-")[[1]])+
                        as.numeric(tstrsplit(value, "-")[[2]]))]
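
To see what the games-played step computes, it can help to run the same idea on a few cells in isolation. This sketch uses base R's `strsplit` rather than data.table's `tstrsplit`, purely so it runs without extra packages; the `records` vector is copied from the sharks' 2015 row and its name is made up for the example.

```r
# Win-loss records from the sharks' 2015 row (own-team cell left out)
records <- c(dgs = "1-3", brs = "2-0", cts = "4-1", cws = "3-2")

# Split each "W-L" string on the hyphen, then add wins and losses
parts <- strsplit(records, "-", fixed = TRUE)
games_played <- vapply(parts, function(p) sum(as.numeric(p)), numeric(1))

games_played
sum(games_played)  # 16, the denominator in the question's example
```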

Then the final step, which gives you the values you want:

# Get the team/season averages
dt3 <- dt2[, sum(avg_pt_marg*games_played)/sum(games_played), keyby=.(season, "tm" = variable)]

You can merge this back in with a data.table join

dt1 <- dt1[dt3, on=c("tm", "season")]

giving:

    season   team  tm shk dgs brs cts cws avg_pt_marg         V1
 1:   2014 sharks shk 0-0 1-3 2-0 4-1 3-2         0.7 -0.0062500
 2:   2014   dogs dgs 3-1 0-0 2-1 1-1 2-0         1.8 -0.3909091
 3:   2014  bears brs 0-2 1-2 0-0 1-3 2-1        -1.9  0.5833333
 4:   2014   cats cts 1-4 1-1 3-1 0-0 2-2         2.3 -0.8333333
 5:   2014   cows cws 2-3 0-2 1-2 2-2 0-0        -3.0  0.7571429
 6:   2015 sharks shk 0-0 1-3 2-0 4-1 3-2         1.2  0.7937500
 7:   2015   dogs dgs 3-1 0-0 2-1 1-1 2-0         3.4  0.3636364
 8:   2015  bears brs 0-2 1-2 0-0 1-3 2-1        -0.2  1.1916667
 9:   2015   cats cts 1-4 1-1 3-1 0-0 2-2         2.0  0.2400000
10:   2015   cows cws 2-3 0-2 1-2 2-2 0-0        -2.1  1.4428571

【Discussion】:

Do I call setDT(my data frame here)?
Yes - it just adds the data.table class to the object
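
A quick way to confirm this on a small example, assuming the data.table package is installed; `my_df` is a hypothetical stand-in for your own data frame.

```r
library(data.table)

# Hypothetical stand-in for your own data frame
my_df <- data.frame(season = c(2015, 2014), tm = c("shk", "dgs"))
class(my_df)   # "data.frame"

# setDT converts the object in place, so no assignment is needed
setDT(my_df)
class(my_df)   # "data.table" "data.frame"
```

Because the object keeps the data.frame class as well, code written for plain data frames generally still accepts it.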