【Question title】: Add Value From One Cell to Another Based on Criteria From Different Column
【Posted】: 2021-09-01 18:02:35
【Question】:

I have this data frame:

smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", 
"Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", 
"P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", 
"Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", 
"Intercepted Pass", "Bad Pass", "Good Pass"), Value = c(2, 2, 
2, -3, 2, 2, 2, 2, -2, 1, -2, 2), `Score Sum` = c(2, 4, 6, 3, 
2, 4, 6, 8, 6, 1, -1, 2)), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

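It has 4 columns and 12 rows. The third column is a value assigned according to the event. In the fourth column I am trying to add these values up to get a running sum. So whenever the Opponent has an event, its current value is added to its previous score sum, and likewise for P1/P2/P3. I have been able to get the running sum to match what I expect up to row 10.

I wrote the following code: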

for (i in 1:nrow(smallerDF)) {
  #print(i)
  if (smallerDF$Event[i] == "Good Pass") {
    smallerDF$Value[i] <- 2
  }
  
  if (smallerDF$Event[i] == "Bad Pass") {
    smallerDF$Value[i] <- -2
  }
  
  if (smallerDF$Event[i] == "Intercepted Pass") {
    smallerDF$Value[i] <- 1
  }
  
  if (smallerDF$Event[i] == "Turnover") {
    smallerDF$Value[i] <- -3
  }
  
  if (smallerDF$category[i] == "Opponent") {
    #print(i)
    if (i != 1 && smallerDF$category[i-1] == "Opponent") {
      smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
    }
  }
  else if (smallerDF$category[i] %in% dfList) {
    if (i != 1 && smallerDF$category[i-1] %in% dfList) {
      smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
    }
  }
}

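Because I use [i-1], this works up to row 10, but I don't know how to make row 10 refer back to row 4 (the last time Opponent appeared) so that cell [10,3] is added to cell [4,4].

The final result should look like this: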

category Event            Value `Score Sum`
   <chr>    <chr>            <dbl>       <dbl>
 1 Opponent Good Pass            2           2
 2 Opponent Good Pass            2           4
 3 Opponent Good Pass            2           6
 4 Opponent Turnover            -3           3
 5 P1       Good Pass            2           2
 6 P2       Good Pass            2           4
 7 P3       Good Pass            2           6
 8 P2       Good Pass            2           8
 9 P2       Bad Pass            -2           6
10 Opponent Intercepted Pass     1           4
11 Opponent Bad Pass            -2           2
12 P1       Good Pass            2           8

I tried combining the rows with this code:

library(data.table)
dt <- data.table(smallerDF)
newDT <- dt[ , .SD[.N] ,  by = c("category") ]

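But this only returns the last row for each distinct value of category, not the latest/previous occurrence of the category.

Any help would be greatly appreciated. Thanks.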

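【Comments】:

  • The score sum in row 12 is 8; shouldn't it be 4?
  • The 1 in cell [10,4] of the dput output is just my code adding the 1 from [10,3] into [10,4]. Cell [10,4] should add the 1 from cell [10,3] to the 3 in cell [4,4] to get the 4 in the expected output. Adding that reference back to [4,4], the last time Opponent appeared, is the part I am struggling with. Does that make more sense? @akrun
  • @NadPat Ultimately yes, it should be 4, but I was trying to simplify the problem by adding P1/P2/P3 together, so the P1 value in [12,3] gets added to P2's score sum in [9,4].
  • @akrun I don't really want to collapse the rows, because the larger DF will change over time and I want to see the trend of the score sum (column 4) over time, to compare P1/2/3 against the Opponent score.

Tags: r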


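【Solution 1】:

I think the basic premise here is that this is a grouped calculation (which is not easy in a for loop), and that the grouping should be based on whether category is "Opponent" (merging "P1", "P2", and so on).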
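To make that premise concrete, here is a minimal sketch. The vectors `category` and `value` are copied from the question; the names `grp`, `running`, `score_loop`, and `score_ave` are illustrative and not from the original post. It keeps one running total per group inside a loop (which is what the `[i-1]` approach lacks) and then checks that `ave()` with `cumsum` produces the same numbers.

```r
category <- c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2",
              "P3", "P2", "P2", "Opponent", "Opponent", "P1")
value <- c(2, 2, 2, -3, 2, 2, 2, 2, -2, 1, -2, 2)

# Two groups: "Opponent" vs. everyone else (P1/P2/P3 merged into "Team").
grp <- ifelse(category == "Opponent", "Opponent", "Team")

# Loop version: remember the latest total of each group instead of row i-1.
running <- c(Opponent = 0, Team = 0)
score_loop <- numeric(length(value))
for (i in seq_along(value)) {
  running[[grp[i]]] <- running[[grp[i]]] + value[i]
  score_loop[i] <- running[[grp[i]]]
}

# Grouped version: cumulative sum within each group, kept in row order.
score_ave <- ave(value, grp, FUN = cumsum)

score_loop
identical(score_loop, score_ave)
```

Row 10 picks up the Opponent total left at row 4 because the total is stored per group rather than read from the previous row.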

Data preparation: starting with only the first two columns of the dataset above:

smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", "P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", "Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", "Intercepted Pass", "Bad Pass", "Good Pass")), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))

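I'll add a "time" column: some utilities (e.g., base::merge) do not preserve row order, however hard you try. I think it is generally safer to have a "time" component anyway, to guard against accidental reordering. Neither the data.table nor the dplyr solution below reorders rows inadvertently, but it is still not a bad idea.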

smallerDF$time <- seq_len(nrow(smallerDF))

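Base R

This is probably the least intuitive of the three, since the grouping functions in base R can look daunting. They include ave, aggregate, by, tapply, and so on. I'll stick with ave for now, since it is the simplest and perhaps the easiest to read.

First, we create a "merge/join" table for Value (other ways of bringing in these values exist, see https://stackoverflow.com/a/68999591/3358272; @ViníciusFélix's answer is a good example of doing this with case_when). Second, we aggregate by "Opponent vs NotOpponent".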

values <- data.frame(
  Event = c("Good Pass", "Bad Pass", "Intercepted Pass", "Turnover"),
  Value = c(2, -2, 1, -3)
)
smallerDF2 <- merge(smallerDF, values, by = "Event", all.x = TRUE, sort = FALSE)
## feel free to verify that `smallerDF2` is no longer in the original order,
## despite `sort=FALSE`. Order is not guaranteed with `base::merge`, period.
smallerDF2 <- smallerDF2[order(smallerDF2$time),]
smallerDF2
#               Event category time Value
# 1         Good Pass Opponent    1     2
# 2         Good Pass Opponent    2     2
# 3         Good Pass Opponent    3     2
# 9          Turnover Opponent    4    -3
# 5         Good Pass       P1    5     2
# 6         Good Pass       P2    6     2
# 7         Good Pass       P3    7     2
# 4         Good Pass       P2    8     2
# 10         Bad Pass       P2    9    -2
# 12 Intercepted Pass Opponent   10     1
# 11         Bad Pass Opponent   11    -2
# 8         Good Pass       P1   12     2
smallerDF2$`Score Sum2` <- ave(smallerDF2$Value, smallerDF2$category == "Opponent", FUN = cumsum)
smallerDF2
#               Event category time Value Score Sum2
# 1         Good Pass Opponent    1     2          2
# 2         Good Pass Opponent    2     2          4
# 3         Good Pass Opponent    3     2          6
# 9          Turnover Opponent    4    -3          3
# 5         Good Pass       P1    5     2          2
# 6         Good Pass       P2    6     2          4
# 7         Good Pass       P3    7     2          6
# 4         Good Pass       P2    8     2          8
# 10         Bad Pass       P2    9    -2          6
# 12 Intercepted Pass Opponent   10     1          4
# 11         Bad Pass Opponent   11    -2          2
# 8         Good Pass       P1   12     2          8
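One of the "other ways" of bringing in the values, sketched here as an alternative and not as part of the original answer, is a named vector used as a lookup table. It avoids the reordering issue because indexing by name returns results in the order of the input. The names `event`, `event_scores`, and `value` are illustrative; the scores are the same as in `values` above.

```r
event <- c("Good Pass", "Turnover", "Bad Pass", "Intercepted Pass", "Good Pass")

# Named vector acting as a lookup table: names are events, elements are scores.
event_scores <- c("Good Pass" = 2, "Bad Pass" = -2,
                  "Intercepted Pass" = 1, "Turnover" = -3)

# Indexing by name returns one score per event, in the input order.
value <- unname(event_scores[event])
value
```

An event that is missing from the lookup vector comes back as NA, so it is worth checking `anyNA(value)` on real data.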

data.table

library(data.table)
smallerDT <- as.data.table(smallerDF)
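## `i.Value` refers explicitly to the column from `values`, which also works
## if `smallerDT` already has a `Value` column of its own
smallerDT[values, Value := i.Value, on = .(Event)]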
smallerDT[, `Score Sum2` := cumsum(Value), by = .(category == "Opponent")]

dplyr

library(dplyr)
left_join(smallerDF, values, by = "Event") %>%
  group_by(g = (category == "Opponent")) %>%
  mutate(`Score Sum` = cumsum(Value)) %>%
  ungroup() %>%
  select(-g)

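【Comments】:

  • Thanks - your base R version seems to make the most sense; dplyr kept giving me an error. With the base R version, though, is it possible to aggregate differently? Besides Opponent vs Non-Opponent, I also plan to sum by Opponent vs P1 vs P2 vs P3. Is that a possible edit?
  • "Besides" suggests running it more than once, since you cannot aggregate two ways in a single step. My guess, though, is to change ave(..., smallerDF2$category == "Opponent", ...) to ave(..., smallerDF2$category, ...) and see if that is what you want. If not, please edit your question with your updated expectations.
  • Thanks! Using ave(..., smallerDF2$category, ...) worked and showed what I wanted.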
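As a small illustration of the two variants discussed in these comments, the sketch below runs `ave()` twice on the question's data, once per grouping. The data frame `df` and the column names `sum_by_side` and `sum_by_category` are made up for this example.

```r
df <- data.frame(
  category = c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2",
               "P3", "P2", "P2", "Opponent", "Opponent", "P1"),
  Value = c(2, 2, 2, -3, 2, 2, 2, 2, -2, 1, -2, 2)
)

# Pass 1: Opponent vs. everyone else (P1/P2/P3 share one running total).
df$sum_by_side <- ave(df$Value, df$category == "Opponent", FUN = cumsum)

# Pass 2: one running total per distinct category.
df$sum_by_category <- ave(df$Value, df$category, FUN = cumsum)

df
```

With the per-category grouping, row 12 (P1) comes out as 4 instead of 8, which matches the first comment under the question.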
【Solution 2】:

Here is a tidyverse solution:

library(dplyr)

smallerDF %>% 
  # Remove the original values from the data
  select(-Value, -`Score Sum`) %>% 
  # Create the Value variable with case_when
  mutate(
    Value = case_when(
      Event == "Good Pass" ~ 2,
      Event == "Bad Pass" ~ -2,
      Event == "Intercepted Pass" ~ 1,
      Event == "Turnover" ~ -3
    ),
    # Auxiliary logical variable (opponent or not opponent)
    Opponent = category == "Opponent"
  ) %>% 
  # Cumulative sum within either opponent or not opponent
  group_by(Opponent) %>% 
  mutate(`Score Sum` = cumsum(Value)) %>% 
  # Drop the grouping and the auxiliary column
  ungroup() %>% 
  select(-Opponent)

Output

# A tibble: 12 x 4
   category Event            Value `Score Sum`
   <chr>    <chr>            <dbl>       <dbl>
 1 Opponent Good Pass            2           2
 2 Opponent Good Pass            2           4
 3 Opponent Good Pass            2           6
 4 Opponent Turnover            -3           3
 5 P1       Good Pass            2           2
 6 P2       Good Pass            2           4
 7 P3       Good Pass            2           6
 8 P2       Good Pass            2           8
 9 P2       Bad Pass            -2           6
10 Opponent Intercepted Pass     1           4
11 Opponent Bad Pass            -2           2
12 P1       Good Pass            2           8
