Conditional rolling sum loop in R: answers

【Question title】: Conditional rolling sum loop in R
【Posted】: 2020-08-24 09:45:46
【Question】:

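I am looking for some kind of conditional rolling sum. I thought a while loop would do what I need, but I am having trouble implementing it. The sum should look like PCAR[1]*time[1] + PCAR[2]*time[2] + PCAR[3]*time[3] and so on, where [] refers to the row of the column, and it should keep accumulating until the cumulative time value reaches 100 (the `cumulative.time <= 100` condition in my attempt below).

I hope this makes sense. In the example data below, the PCAR_BIN column is the result I am aiming for.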

df <- tibble(cumulative.time = c(20,40,60,80,100, 120,140,160,180,200), 
             PCAR =1:10,
             time = 1:10,
             depth.along.core = 1:10, 
             Age.cal.BP = 1:10, 
             AFBD = 1:10, 
             assumed.C = rep(0.5, 10),
             PCAR_BIN = c(55,330,NA,NA,NA,NA,NA,NA,NA,NA))

The function looks like this:

MBA <- function(data) {
  require(dplyr)
  data %>% mutate(PCAR=((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
                  PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP), 
                  PCA_NCP[is.na(PCA_NCP)] <- 0, 
                  CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
                  CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up,
                  PCAR_BIN = ifelse(cumulative.time <= 100, sum(PCAR*time+lead(PCAR)*lead(time),NA)
                    )}

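Obviously I have had no luck with the ifelse statement: it only works for one iteration, and the sum is wrong. I have tried similar approaches with while and for loops, also without success. Part of the problem is that I am not sure how to express the sum I need. I have also tried binning the data with case_when and working from that, but again no luck.

Thanks everyone :)

Edit

Following Martin's approach, my function now creates the ROLLSUM column. I now need a column that gives the maximum value for each century group. Running the code from slice_max onwards gives me the error: Error in eval(lhs, parent, parent) : object 'tmp' not found

I have also added the real data.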

dput(head(EMC))
structure(list(depth.along.core = c(0.5, 1.5, 2.5, 3.5, 4.5, 
5.5), Age.cal.BP = c(-56.016347625, -55.075825875, -54.201453125, 
-53.365755375, -52.541258625, -51.700488875), time = c(0.94052175, 
0.87437275, 0.83569775, 0.82449675, 0.84076975, 0.88451675), 
    cumulative.time = c(0.94052175, 1.8148945, 2.65059225, 3.475089, 
    4.31585875, 5.2003755), AFBD = c(0.0711, 0.057, 0.0568, 0.0512, 
    0.0559, 0.0353), assumed.C = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5
    )), row.names = c(NA, 6L), class = "data.frame")

MBA <- function(data) {
  require(dplyr)
  data %>% mutate(PCAR=((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
                  PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP), 
                  PCA_NCP[is.na(PCA_NCP)] <- 0, 
                  CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
                  CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up)%>%
                  slice(1:(n()-1))%>%
                  group_by(Century = cut(cumulative.time, breaks = seq(0, max(cumulative.time), 100)))%>%
                  mutate(ROLLSUM = rev(cumsum(PCAR*time)))%>%
                  slice_max(order_by = ROLLSUM, n = 1) %>%
                  pull(ROLLSUM)%>%
                  df$ROLLSUM <- c(groupMaxima, rep(NA, nrow(df) - length(groupMaxima)))}

【Question comments】:

Tags: r

【Solution 1】:

You could try this:

# Get cumulative sums by group (assuming per century groups)
df <- df %>% 
  group_by(Century = cut(cumulative.time, 
                         breaks = seq(0, max(cumulative.time), 100))) %>%
  mutate(ROLLSUM = rev(cumsum(PCAR * time)))

# Get maximum of each group
groupMaxima <- df %>%
  slice_max(order_by = ROLLSUM, n = 1) %>%
  pull(ROLLSUM)

# Fill column as desired
df$ROLLSUM <- c(groupMaxima, rep(NA, nrow(df) - length(groupMaxima)))

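We simply create a factor column that groups the cumulative time column by century, then use that factor to sum the values within each group. Finally, we edit the rolling sum column so that it holds only the group maxima and fill the remaining rows with NA.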

# A tibble: 10 x 10
# Groups:   Group [2]
   cumulative.time  PCAR  time depth.along.core Age.cal.BP  AFBD assumed.C PCAR_BIN Group     ROLLSUM
             <dbl> <int> <int>            <int>      <int> <int>     <dbl>    <dbl> <fct>       <int>
 1              20     1     1                1          1     1       0.5       55 (0,100]        55
 2              40     2     2                2          2     2       0.5      330 (0,100]       330
 3              60     3     3                3          3     3       0.5       NA (0,100]        NA
 4              80     4     4                4          4     4       0.5       NA (0,100]        NA
 5             100     5     5                5          5     5       0.5       NA (0,100]        NA
 6             120     6     6                6          6     6       0.5       NA (100,200]      NA
 7             140     7     7                7          7     7       0.5       NA (100,200]      NA
 8             160     8     8                8          8     8       0.5       NA (100,200]      NA
 9             180     9     9                9          9     9       0.5       NA (100,200]      NA
10             200    10    10               10         10    10       0.5       NA (100,200]      NA
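The last step relies on the largest running total in each group being equal to the group total. A minimal base-R sketch of that reasoning follows, using the first-century values of the example (PCAR = time = 1:5) and a made-up vector `y` that is not from the question. It suggests the shortcut only holds when every term is non-negative, so summing per group directly may be the safer choice.

```r
# First century of the example data: PCAR * time for rows 1 to 5
x <- (1:5) * (1:5)
running <- cumsum(x)               # running totals within the group
group_total <- max(rev(running))   # what slice_max() picks out of ROLLSUM

# Hypothetical values containing a negative term (not from the question)
y <- c(5, 4, -3)
max_running_y <- max(cumsum(y))    # largest running total
total_y <- sum(y)                  # actual total of the group
```

With the example data both routes give the same number, because PCAR and time are all positive; with `y` the largest running total and the plain sum differ.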

Edit:

For this particular case:

MBA <- function(data) {
  require(dplyr)
  data <- data %>% mutate(PCAR = ((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
                  PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP), 
                  PCA_NCP = replace(PCA_NCP, is.na(PCA_NCP), 0),  # set NA to 0 so cumsum() below stays defined
                  CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
                  CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up) 
  
  data <- data %>%
    group_by(CTIME = cut(cumsum(cumulative.time), 
                         breaks = seq(0, max(cumsum(cumulative.time), na.rm = TRUE), 100))) %>%
    mutate(ROLLSUM = rev(cumsum(PCAR*time))) 
  
  groupMaxima <- data %>% slice_max(order_by = ROLLSUM, n = 1) %>%
    pull(ROLLSUM)
  
  data$ROLLSUM <- c(groupMaxima, rep(NA, nrow(data) - length(groupMaxima)))
  data
}
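The error quoted in the comments ('to' must be a finite number) appears consistent with `seq()` receiving an NA maximum, and `cut()` also returns NA for any value above the last break. One possible way around both, sketched here under assumptions, is to derive the bin from `ceiling(cumulative.time / 100)` and sum per bin directly. The `toy` data and the names `Century` and `century_sums` are illustrative only and not part of the original answer.

```r
library(dplyr)

# Hypothetical data with irregular cumulative times (not from the question)
toy <- tibble(
  cumulative.time = c(12.5, 47.1, 99.9, 130.2, 185.7, 240.3),
  PCAR = c(2, 4, 6, 8, 10, 12),
  time = c(1, 2, 3, 4, 5, 6)
)

century_sums <- toy %>%
  filter(!is.na(cumulative.time)) %>%                     # drop rows without a time
  mutate(Century = ceiling(cumulative.time / 100)) %>%    # 1 = (0,100], 2 = (100,200], ...
  group_by(Century) %>%
  summarise(PCAR_BIN = sum(PCAR * time), .groups = "drop")
```

This returns one row per century rather than padding the original column with NA, which may or may not suit the downstream use.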

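【Comments】:

Hi, thanks for your answer. I should have said in my question that my actual cumulative time is not at set intervals. It comes from a Bayesian model of radiocarbon dates, so it is fairly random, to 6 decimal places; I probably oversimplified it in my example data. I tried adding Martin's answer to my function, but it returns the error: Problem with mutate() input Century. x 'to' must be a finite number. I think this may be because my cumulative time data does not fall on 100
OK, try using dput(head(yourData)) to give us a snapshot of your real data.
Hi Martin, thanks so much for your help, it is up now. I did not know I could use dput like that, very useful! Please see the edit.
So you want to sum these values until the cumulative time reaches 100, and then start again from zero until the next total duration of 100 is reached?
Hi Martin, yes, just as you first showed me, with the maximum for 0-100, then 100-200, and so on. It works perfectly with the example data. The first error occurred because the final value in my real data is NA, so I used slice to remove that row, as it is only needed for the earlier calculations. Now I am just trying to incorporate the last bit of your code into my function.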

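【Solution 2】:

There are many ways to do this. If your steps really are 100-year steps and the values advance at a constant interval (0, 20, 40, ...), you can do it locally: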

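steps <- 100                  # bin width in years
intervals <- 20               # constant spacing of cumulative.time
ratio <- steps / intervals    # number of rows per bin
columns <- df[, c("PCAR", "time")]
# Last row index of each complete bin: 5, 10, ...
indices <- seq(ratio, nrow(df), by = ratio)

binSums <- sapply(indices, function(x) {
  localRange <- (x - ratio + 1):x   # the rows that belong to this bin
  sum(columns[["PCAR"]][localRange] * columns[["time"]][localRange])
})
# Pad with NA so the vector has one entry per row of df
PCAR_BIN <- c(binSums, rep(NA, nrow(df) - length(binSums)))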

We can now bind PCAR_BIN:

df <- cbind(df, PCAR_BIN)
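For constant spacing, the per-bin sums can also be cross-checked without an explicit loop. The following is a base-R sketch under the same assumption of 5 rows per 100-year bin, using the example values PCAR = time = 1:10; the names `bin` and `bin_sums` are illustrative and not from the answer.

```r
# Example values from the question: PCAR = time = 1:10
PCAR <- 1:10
time <- 1:10

ratio <- 100 / 20                         # rows per bin, assuming constant spacing
bin <- ceiling(seq_along(PCAR) / ratio)   # bin index for each row: 1 1 1 1 1 2 2 2 2 2
bin_sums <- as.vector(tapply(PCAR * time, bin, sum))
```

Here `bin_sums` holds one total per 100-year bin, which can be compared against the first entries of the target PCAR_BIN column.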

【Comments】: