【Posted】: 2018-01-02 12:16:22
【Question】:
I have a data frame that looks like this:
> df
# A tibble: 5,427 x 3
cond desired inc
<chr> <dbl> <dbl>
1 <NA> 0 0
2 <NA> 5 5
3 X 10 5
4 X 7 7
5 <NA> 16 16
6 <NA> 21 5
7 <NA> 26 5
8 <NA> 31 5
9 X 37 6
10 <NA> 5 5
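This already contains my desired output. What I want to do is sum up the values of inc, but reset the sum whenever the previous row has an X in the cond column. For example, in row 9 I take the desired value from the previous row (31) and add the inc value of row 9 (6), which gives 37. In row 10 I take only the inc value, because the cond column of the previous row is X. I solved this with a loop, but I would like a vectorized solution. So far I have this: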
df$test <- 0
df <- df %>% mutate(test = ifelse(is.na(lag(df$cond)), lag(test) + inc, inc))
If I run the second line, I get this:
> df
# A tibble: 5,427 x 4
cond desired inc test
<chr> <dbl> <dbl> <dbl>
1 <NA> 0 0 NA
2 <NA> 5 5 5
3 X 10 5 5
4 X 7 7 7
5 <NA> 16 16 16
6 <NA> 21 5 5
7 <NA> 26 5 5
8 <NA> 31 5 5
9 X 37 6 6
10 <NA> 5 5 5
After running it a second time, it looks like this:
> df
# A tibble: 5,427 x 4
cond desired inc test
<chr> <dbl> <dbl> <dbl>
1 <NA> 0 0 NA
2 <NA> 5 5 NA
3 X 10 5 10
4 X 7 7 7
5 <NA> 16 16 16
6 <NA> 21 5 21
7 <NA> 26 5 10
8 <NA> 31 5 10
9 X 37 6 11
10 <NA> 5 5 5
# ... with 5,417 more rows
After the third run:
> df
# A tibble: 5,427 x 4
cond desired inc test
<chr> <dbl> <dbl> <dbl>
1 <NA> 0 0 NA
2 <NA> 5 5 NA
3 X 10 5 NA
4 X 7 7 7
5 <NA> 16 16 16
6 <NA> 21 5 21
7 <NA> 26 5 26
8 <NA> 31 5 15
9 X 37 6 16
10 <NA> 5 5 5
And after the fifth run:
> df
# A tibble: 5,427 x 4
cond desired inc test
<chr> <dbl> <dbl> <dbl>
1 <NA> 0 0 NA
2 <NA> 5 5 NA
3 X 10 5 NA
4 X 7 7 7
5 <NA> 16 16 16
6 <NA> 21 5 21
7 <NA> 26 5 26
8 <NA> 31 5 31
9 X 37 6 37
10 <NA> 5 5 5
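Inside the mutate command I use the very column that this mutate call creates, and I suspect that is what causes this behavior. Is there any way to get the result I want? Thanks in advance!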
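A likely explanation: inside a single mutate() call, lag(test) reads the test column as it existed before the call, not the values being computed in that same call, so each run carries the running sum only one row further. The base R sketch below simulates repeated runs under that assumption. one_pass is a hypothetical helper, and c(NA, head(x, -1)) stands in for dplyr::lag(x). Starting from test = 0, its first, second, third and fifth printed vectors should match the first ten rows of the test column in the outputs above, including the NA values, which come from lag() returning NA for row 1 and then propagating down until the next reset.

```r
# Hypothetical helper: simulates ONE run of
#   mutate(test = ifelse(is.na(lag(cond)), lag(test) + inc, inc))
# using base R shifts instead of dplyr::lag().
one_pass <- function(test, inc, cond) {
  prev_test <- c(NA, head(test, -1))  # lag(test): the column's values from BEFORE this run
  prev_cond <- c(NA, head(cond, -1))  # lag(cond)
  ifelse(is.na(prev_cond), prev_test + inc, inc)
}

# First ten rows of the example data
cond <- c(NA, NA, "X", "X", NA, NA, NA, NA, "X", NA)
inc  <- c(0, 5, 5, 7, 16, 5, 5, 5, 6, 5)

test <- rep(0, length(inc))  # corresponds to df$test <- 0
for (run in 1:5) {
  test <- one_pass(test, inc, cond)  # each run only sees the previous run's values
  print(test)
}
```

Because every run only propagates the sum by one more row, the result stops changing once the longest stretch without a reset has been filled in, which is why five runs happen to be enough for these rows.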
Data frame:
structure(list(cond = c(NA, NA, "X", "X", NA, NA, NA, NA, "X",
NA, NA, NA, NA, NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, "X",
NA, NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, "X", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, "X", NA, NA, NA, NA, NA, NA, NA, "X", NA, NA, "X",
NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, NA, NA, NA, "X", NA,
NA, NA, NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, NA, NA, NA,
NA, "X", NA, NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, NA, NA,
NA, NA, NA, "X", NA, NA, NA, "X", NA, NA, NA, NA, "X", NA, NA,
NA, NA, NA, NA, NA, NA, "X", NA, NA, "X", NA, NA, NA, NA, "X",
NA, NA, NA, NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, NA, NA,
NA, "X", NA, "X", NA, NA, NA, NA, NA, NA, NA, NA, "X", NA, NA,
NA, NA, NA, NA, NA, "X", NA, NA, NA, "X", "X", NA, NA, NA, NA,
NA, NA, NA, NA, "X", "X", NA, "X", NA, NA, NA, NA, NA, NA, NA,
NA, "X", NA, NA, NA, "X", NA, NA, NA, NA, NA, NA, NA, NA, "X",
NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, "X", NA, NA, NA, NA,
"X", NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, NA, NA, NA, NA,
"X", NA, NA, NA, NA, NA, NA, "X", NA, NA, NA, NA, "X", NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "X", NA, "X",
NA, "X", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, "X", NA, NA, NA), desired = c(0, 5, 10, 7, 16, 21, 26,
31, 37, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 5, 10, 15, 20,
30, 7, 15, 21, 25, 40, 45, 55, 12, 20, 25, 30, 35, 40, 45, 50,
55, 60, 65, 70, 75, 5, 10, 15, 20, 22, 30, 35, 45, 50, 55, 60,
65, 70, 75, 9, 14, 19, 24, 29, 34, 39, 44, 5, 7, 10, 2, 7, 12,
17, 22, 27, 5, 10, 15, 20, 25, 30, 35, 38, 4, 7, 12, 17, 22,
27, 32, 37, 39, 13, 18, 23, 28, 33, 38, 43, 48, 53, 5, 10, 15,
20, 25, 30, 35, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 5, 10,
15, 20, 2, 10, 15, 20, 25, 5, 10, 15, 20, 25, 30, 35, 40, 45,
5, 8, 12, 5, 10, 14, 19, 24, 5, 10, 15, 20, 25, 30, 35, 40, 45,
5, 10, 15, 20, 25, 28, 33, 38, 5, 11, 5, 10, 15, 20, 25, 30,
35, 40, 45, 12, 17, 22, 27, 32, 37, 42, 47, 5, 10, 15, 20, 5,
5, 10, 15, 20, 25, 30, 35, 40, 45, 5, 5, 10, 5, 10, 15, 20, 25,
30, 35, 40, 45, 5, 10, 15, 20, 5, 10, 15, 20, 25, 30, 34, 39,
44, 5, 10, 15, 20, 25, 30, 5, 10, 15, 20, 25, 5, 10, 15, 20,
25, 5, 10, 15, 20, 25, 29, 5, 10, 15, 20, 23, 25, 30, 35, 40,
5, 15, 20, 25, 30, 35, 40, 5, 10, 15, 20, 25, 5, 10, 15, 20,
25, 28, 33, 38, 43, 48, 53, 58, 71, 76, 81, 5, 10, 5, 10, 5,
10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 5,
10, 15), inc = c(0, 5, 5, 7, 16, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 10, 7, 8, 6, 4, 15, 5, 10, 12, 8, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 8, 5, 10, 5, 5,
5, 5, 5, 5, 9, 5, 5, 5, 5, 5, 5, 5, 5, 2, 3, 2, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 3, 4, 3, 5, 5, 5, 5, 5, 5, 2, 13, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 2, 8, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
3, 4, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
3, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 12, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4,
5, 5, 5, 5, 3, 2, 5, 5, 5, 5, 10, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 3, 5, 5, 5, 5, 5, 5, 13, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5)), .Names = c("cond",
"desired", "inc"), row.names = c(NA, -300L), class = c("tbl_df",
"tbl", "data.frame"))
【Discussion】:
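- The cond column is X in row 9 as well, so by the rule you stated the sum should be reset there too. Why is row 9 different from rows 3 or 4?
- The X affects the next row: the X in row 9 resets the sum, so in row 10 the inc value becomes the sum. The same holds for rows 3 and 4: in rows 4 and 5 the desired value equals that row's inc.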
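Under the rule clarified in the last comment (an X in cond resets the sum starting from the next row), one vectorized approach is to turn the lagged X flags into group ids with cumsum() and then take a running sum of inc within each group. A minimal base R sketch, using only the first ten rows of the example data so it is self-contained; the names reset and grp are illustrative, not from the original post:

```r
# First ten rows of the example data (the full data is in the dput above)
df <- data.frame(
  cond    = c(NA, NA, "X", "X", NA, NA, NA, NA, "X", NA),
  desired = c(0, 5, 10, 7, 16, 21, 26, 31, 37, 5),
  inc     = c(0, 5, 5, 7, 16, 5, 5, 5, 6, 5),
  stringsAsFactors = FALSE
)

# TRUE where the PREVIOUS row has cond == "X"; %in% treats NA as "no X"
reset <- c(FALSE, head(df$cond %in% "X", -1))

# Every reset starts a new group; cumsum() turns the flags into group ids
grp <- cumsum(reset)

# Running sum of inc within each group
df$test <- ave(df$inc, grp, FUN = cumsum)

print(df)
```

For these rows test should equal desired. The same three lines can be applied to the full df from the dput. With dplyr, the same idea would read roughly as df %>% group_by(grp = cumsum(lag(cond %in% "X", default = FALSE))) %>% mutate(test = cumsum(inc)) %>% ungroup().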