【Question title】: Add Value From One Cell to Another Based on Criteria From Different Column
【Posted】: 2021-09-01 18:02:35
【Question description】:

I have this data frame:

smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", 
"Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", 
"P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", 
"Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", 
"Intercepted Pass", "Bad Pass", "Good Pass"), Value = c(2, 2, 
2, -3, 2, 2, 2, 2, -2, 1, -2, 2), `Score Sum` = c(2, 4, 6, 3, 
2, 4, 6, 8, 6, 1, -1, 2)), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

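It has 4 columns and 12 rows. The third column is a value assigned according to the event. In the fourth column I am trying to add these values up to get a rolling sum: whenever the Opponent has an event, its current value is added to the Opponent's previous score sum, and likewise for P1/P2/P3. I have been able to roll the sum to my expected result up to row 10.

Here is the code I wrote: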

for (i in 1:nrow(smallerDF)) {
  #print(i)
  if (smallerDF$Event[i] == "Good Pass") {
    smallerDF$Value[i] <- 2
  }
  
  if (smallerDF$Event[i] == "Bad Pass") {
    smallerDF$Value[i] <- -2
  }
  
  if (smallerDF$Event[i] == "Intercepted Pass") {
    smallerDF$Value[i] <- 1
  }
  
  if (smallerDF$Event[i] == "Turnover") {
    smallerDF$Value[i] <- -3
  }
  
  if (smallerDF$category[i] == "Opponent") {
    #print(i)
    if (i != 1 && smallerDF$category[i-1] == "Opponent") {
      smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
    }
  }
  else if (smallerDF$category[i] %in% dfList) {
    if (i != 1 && smallerDF$category[i-1] %in% dfList) {
      smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
    }
  }
}

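Because I use [i-1], this works up to row 10, but I can't figure out how to make row 10 refer back to row 4 (the last time Opponent was used) so that cell [10,3] is added to cell [4,4].

The final result should look like this: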

   category Event            Value `Score Sum`
   <chr>    <chr>            <dbl>       <dbl>
 1 Opponent Good Pass            2           2
 2 Opponent Good Pass            2           4
 3 Opponent Good Pass            2           6
 4 Opponent Turnover            -3           3
 5 P1       Good Pass            2           2
 6 P2       Good Pass            2           4
 7 P3       Good Pass            2           6
 8 P2       Good Pass            2           8
 9 P2       Bad Pass            -2           6
10 Opponent Intercepted Pass     1           4
11 Opponent Bad Pass            -2           2
12 P1       Good Pass            2           8

I also tried this code:

dt <- data.table(smallerDF)
newDT <- dt[ , .SD[.N] ,  by = c("category") ]

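But this only returns the last row for each distinct value of category, not the most recent (previous) occurrence of that category.

Any help would be greatly appreciated. Thank you.

【Question discussion】: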

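The score sum in row 12 is 8; shouldn't it be 4?
The 1 in cell [10,4] of the dput is just my code copying the 1 from [10,3] into [10,4]. Cell [10,4] should add the 1 in cell [10,3] to the 3 in cell [4,4] to give the 4 in the expected output. Adding a reference back to [4,4], the last time Opponent was used, is the part that is challenging for me. Does that make more sense? @akrun
@NadPat Ultimately yes, it should be 4, but I was trying to simplify the problem by adding P1/P2/P3 together, so the P1 value in [12,3] is added to P2's score sum in [9,4].
@akrun I don't exactly want to combine the rows, because the larger DF will change over time and I want to see the trend of the score sum (column 4) over time, to compare the P1/P2/P3 score with the Opponent score.

Tags: r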

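【Solution 1】:

I think the basic premise here is a grouped calculation (which is not easy in a for loop), and the grouping should be on whether category is "Opponent" or not (combining "P1", "P2", etc.).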
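To illustrate why the `[i-1]` approach breaks and what a loop would have to do instead, here is a minimal sketch that carries one running total per group across rows. The vectors and the group label "Team" are made up for illustration (the values mirror the question's data); this is not the asker's code.

```r
# Toy data mirroring the question (category and Value columns only).
category <- c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2",
              "P3", "P2", "P2", "Opponent", "Opponent", "P1")
value <- c(2, 2, 2, -3, 2, 2, 2, 2, -2, 1, -2, 2)

# One running total per group, carried across rows instead of reading row i-1.
# "Team" is a hypothetical label for everything that is not "Opponent".
running <- c(Opponent = 0, Team = 0)
score_sum <- numeric(length(value))

for (i in seq_along(value)) {
  g <- if (category[i] == "Opponent") "Opponent" else "Team"
  running[[g]] <- running[[g]] + value[i]
  score_sum[i] <- running[[g]]
}

score_sum
```

Because the total lives in `running` rather than in the previous row, row 10 picks up from row 4 without any look-back. The grouped functions below do the same bookkeeping in one call.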

Data prep: starting with only the first two columns of the dataset above:

smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", "P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", "Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", "Intercepted Pass", "Bad Pass", "Good Pass")), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))

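I'll add a "time" column: some utilities (e.g., base::merge) do not respect row order, despite best efforts. I think it is generally safer to have a "time" component anyway, so that any unintended reordering can be undone. Neither the data.table nor the dplyr solution below reorders inadvertently, but it is still not a terrible idea.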

smallerDF$time <- seq_len(nrow(smallerDF))

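base R

This is perhaps the least intuitive of the three, since the grouping functions in R can look daunting. They include ave, aggregate, by, tapply, and others. I'll stick with ave for now, since it is the simplest and perhaps the easiest to read.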
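As a quick illustration of what `ave` does with `cumsum`, here is a toy sketch on made-up vectors (`x` and `g` are not from the question). The result has the same length and order as the input, with the cumulative sum computed separately within each group.

```r
x <- c(1, 2, 3, 4, 5)
g <- c("a", "b", "a", "b", "a")

# FUN must be passed by name; unnamed arguments after x are grouping variables.
res <- ave(x, g, FUN = cumsum)
res
# group "a" (positions 1, 3, 5): 1, 4, 9
# group "b" (positions 2, 4):    2, 6
# [1] 1 2 4 6 9
```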

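First, we'll create a "merge/join" table for Value (other ways of bringing in these values exist, see https://stackoverflow.com/a/68999591/3358272; @ViníciusFélix's answer is a good example of using case_when for this). Second, we'll aggregate by "Opponent vs NotOpponent".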

values <- data.frame(
  Event = c("Good Pass", "Bad Pass", "Intercepted Pass", "Turnover"),
  Value = c(2, -2, 1, -3)
)
smallerDF2 <- merge(smallerDF, values, by = "Event", all.x = TRUE, sort = FALSE)
## feel free to verify that `smallerDF2` is no longer in the original order,
## despite `sort=FALSE`. Order is not guaranteed with `base::merge`, period.
smallerDF2 <- smallerDF2[order(smallerDF2$time),]
smallerDF2
#               Event category time Value
# 1         Good Pass Opponent    1     2
# 2         Good Pass Opponent    2     2
# 3         Good Pass Opponent    3     2
# 9          Turnover Opponent    4    -3
# 5         Good Pass       P1    5     2
# 6         Good Pass       P2    6     2
# 7         Good Pass       P3    7     2
# 4         Good Pass       P2    8     2
# 10         Bad Pass       P2    9    -2
# 12 Intercepted Pass Opponent   10     1
# 11         Bad Pass Opponent   11    -2
# 8         Good Pass       P1   12     2
smallerDF2$`Score Sum2` <- ave(smallerDF2$Value, smallerDF2$category == "Opponent", FUN = cumsum)
smallerDF2
#               Event category time Value Score Sum2
# 1         Good Pass Opponent    1     2          2
# 2         Good Pass Opponent    2     2          4
# 3         Good Pass Opponent    3     2          6
# 9          Turnover Opponent    4    -3          3
# 5         Good Pass       P1    5     2          2
# 6         Good Pass       P2    6     2          4
# 7         Good Pass       P3    7     2          6
# 4         Good Pass       P2    8     2          8
# 10         Bad Pass       P2    9    -2          6
# 12 Intercepted Pass Opponent   10     1          4
# 11         Bad Pass Opponent   11    -2          2
# 8         Good Pass       P1   12     2          8
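
If the reordering caused by `merge` is a concern, one alternative (a different technique from the merge above, sketched here on a made-up `event` vector) is to index a named vector by the event names. Indexing preserves the order of `event`, and an event missing from the lookup yields `NA`.

```r
event <- c("Good Pass", "Turnover", "Bad Pass", "Intercepted Pass", "Good Pass")

# Named vector acting as a lookup table: names are events, elements are values.
lookup <- c("Good Pass" = 2, "Bad Pass" = -2, "Intercepted Pass" = 1, "Turnover" = -3)

# Index by name; unname() drops the event names from the result.
value <- unname(lookup[event])
value
# [1]  2 -3 -2  1  2
```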

data.table

library(data.table)
smallerDT <- as.data.table(smallerDF)
## update join; `i.Value` refers explicitly to the Value column of `values`
smallerDT[values, Value := i.Value, on = .(Event)]
smallerDT[, `Score Sum2` := cumsum(Value), by = .(category == "Opponent")]

dplyr

library(dplyr)
left_join(smallerDF, values, by = "Event") %>%
  group_by(g = (category == "Opponent")) %>%
  mutate(`Score Sum` = cumsum(Value)) %>%
  ungroup() %>%
  select(-g)

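【Discussion】:

Thank you. Your base R version seems to make the most sense; dplyr kept giving me an error. With the base R version, though, is it possible to aggregate differently? In addition to Opponent vs Non-Opponent, I also plan to sum by Opponent vs P1 vs P2 vs P3. Is that a possible edit?
"In addition to" suggests running it more than once, since you cannot aggregate two ways in one step. My guess, though, is to change ave(..., smallerDF2$category == "Opponent", ...) to ave(..., smallerDF2$category, ...) and see if that is what you want. If not, please edit your question with your updated expectations.
Thank you! Using ave(..., smallerDF2$category, ...) worked and showed what I wanted.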
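
The two groupings discussed in these comments can be computed side by side with two separate `ave` calls. This is a minimal sketch on standalone vectors that mirror the question's data; the names `by_side` and `by_player` are made up for illustration.

```r
category <- c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2",
              "P3", "P2", "P2", "Opponent", "Opponent", "P1")
value <- c(2, 2, 2, -3, 2, 2, 2, 2, -2, 1, -2, 2)

# Running total for Opponent vs everyone else (P1/P2/P3 pooled).
by_side <- ave(value, category == "Opponent", FUN = cumsum)

# Running total for each category separately (Opponent, P1, P2, P3).
by_player <- ave(value, category, FUN = cumsum)

data.frame(category, value, by_side, by_player)
```

With per-category grouping, row 12 (P1) comes out as 4 rather than 8, which matches the first comment under the question.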

【Solution 2】:

Here is a tidyverse solution:

library(dplyr)

smallerDF %>% 
  # Remove the original values from the data
  select(-Value, -`Score Sum`) %>% 
  # Create the Value variable with case_when
  mutate(
    Value = case_when(
      Event == "Good Pass" ~ 2,
      Event == "Bad Pass" ~ -2,
      Event == "Intercepted Pass" ~ 1,
      Event == "Turnover" ~ -3
    ),
    # Auxiliary logical variable (opponent or not opponent)
    Opponent = category == "Opponent"
  ) %>% 
  # Cumulative sum within each group (opponent or not opponent)
  group_by(Opponent) %>% 
  mutate(`Score Sum` = cumsum(Value)) %>% 
  # Drop the grouping and the auxiliary variable
  ungroup() %>% 
  select(-Opponent)

Output:

# A tibble: 12 x 4
   category Event            Value `Score Sum`
   <chr>    <chr>            <dbl>       <dbl>
 1 Opponent Good Pass            2           2
 2 Opponent Good Pass            2           4
 3 Opponent Good Pass            2           6
 4 Opponent Turnover            -3           3
 5 P1       Good Pass            2           2
 6 P2       Good Pass            2           4
 7 P3       Good Pass            2           6
 8 P2       Good Pass            2           8
 9 P2       Bad Pass            -2           6
10 Opponent Intercepted Pass     1           4
11 Opponent Bad Pass            -2           2
12 P1       Good Pass            2           8

【Discussion】: