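[Posted]: 2022-01-26 05:23:17
[Question]:
I am trying to group by the variables group, type, and year. Each combination of group, type, and year has a specific code, and that code changes from year to year. I want to create a column called "difference": if a given group and type has code 200 in one year and 210 in the next, the "difference" column should record an increase of 10.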
group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
150, 150, 100, 200, 200, 200, 350, 320, 300,
130, 170, 90, 210, 90, 80, 310, 300, 320)
df <- data.frame(group, type, year, code)
This is what df looks like:
   group   type year code
1      A  small 1995  100
2      A medium 1995  100
3      A  large 1995  100
4      B  small 1995  200
5      B medium 1995  200
6      B  large 1995  200
7      C  small 1995  300
8      C medium 1995  300
9      C  large 1995  300
10     A  small 1996  150
11     A medium 1996  150
12     A  large 1996  100
13     B  small 1996  200
14     B medium 1996  200
15     B  large 1996  200
16     C  small 1996  350
17     C medium 1996  320
18     C  large 1996  300
19     A  small 1997  130
20     A medium 1997  170
21     A  large 1997   90
22     B  small 1997  210
23     B medium 1997   90
24     B  large 1997   80
25     C  small 1997  310
26     C medium 1997  300
27     C  large 1997  320
I would like the following output:
group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
150, 150, 100, 200, 200, 200, 350, 320, 300,
130, 170, 90, 210, 90, 80, 310, 300, 320)
difference <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
50, 50, 0, 0, 0, 0, 50, 20, 0,
-20, 20, -10, 10, -110, -120, -40, -20, 20)
df2 <- data.frame(group, type, year, code, difference)
   group   type year code difference
1      A  small 1995  100         NA
2      A medium 1995  100         NA
3      A  large 1995  100         NA
4      B  small 1995  200         NA
5      B medium 1995  200         NA
6      B  large 1995  200         NA
7      C  small 1995  300         NA
8      C medium 1995  300         NA
9      C  large 1995  300         NA
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90       -110
24     B  large 1997   80       -120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320         20
Here is what I tried:
library(dplyr)

df3 <- df2 %>%
  group_by(group, type, year) %>%
  mutate(difference = code - lag(code))
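The problem is that lag() does not seem to take the grouping into account and just subtracts the value in the row before it. Any suggestions?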
[Comments]:
- If you want the difference in code between years, don't group by year. The way you have it now, each group contains only one row.
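A minimal sketch of what that comment suggests, assuming dplyr is installed. The data frame is rebuilt compactly with rep() so the snippet runs on its own; it is meant to hold the same 27 rows as df in the question. Using the order_by argument of lag() is my own addition (not something the question or the comment uses), so that the result does not depend on the rows already being sorted by year.

```r
library(dplyr)

# Same 27 rows as df in the question, built with rep() for brevity.
df <- data.frame(
  group = rep(rep(c("A", "B", "C"), each = 3), times = 3),
  type  = rep(c("small", "medium", "large"), times = 9),
  year  = rep(c(1995, 1996, 1997), each = 9),
  code  = c(100, 100, 100, 200, 200, 200, 300, 300, 300,
            150, 150, 100, 200, 200, 200, 350, 320, 300,
            130, 170,  90, 210,  90,  80, 310, 300, 320)
)

# Grouping by group, type AND year leaves one row per group,
# so lag() never has a previous row to look at.
rows_per_group <- count(df, group, type, year)$n

df3 <- df %>%
  group_by(group, type) %>%  # year is deliberately not a grouping variable
  mutate(difference = code - lag(code, order_by = year)) %>%
  ungroup()

print(as.data.frame(df3))
```

Within each group/type series the first year (1995) has no earlier value, so difference is NA there. Decreases come out negative: B medium drops from 200 in 1996 to 90 in 1997, which gives -110.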