【Posted】: 2020-11-13 04:37:21
【Question】:
I have two datasets that look like this:
country <- c("Albania","Albania","Albania","Albania","Albania",
"Belgium","Belgium","Belgium","Belgium","Belgium",
"Canada","Canada","Canada","Canada","Canada",
"Denmark","Denmark","Denmark","Denmark","Denmark")
year <- c(1992, 1993, 1994, 1995, 1996,
1992, 1993, 1994, 1995, 1996,
1992, 1993, 1994, 1995, 1996,
1992, 1993, 1994, 1995, 1996)
country.year <- data.frame(country, year)
country.year
country year
1 Albania 1992
2 Albania 1993
3 Albania 1994
4 Albania 1995
5 Albania 1996
6 Belgium 1992
7 Belgium 1993
8 Belgium 1994
9 Belgium 1995
10 Belgium 1996
11 Canada 1992
12 Canada 1993
13 Canada 1994
14 Canada 1995
15 Canada 1996
16 Denmark 1992
17 Denmark 1993
18 Denmark 1994
19 Denmark 1995
20 Denmark 1996
country <- c("Albania","Albania",
"Belgium","Belgium",
"Canada","Canada",
"Denmark","Denmark","Denmark")
cabinet <- c(1200, 1201,
1560, 1566,
220, 440,
880, 819, 870)
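# One position per row of `cabinets` (9 rows). The position of cabinet 880 is
# unknown because it never appears in the desired output, so it is set to NA.
cabinet.position2 <- c(12, 10,
0, 5,
-9, 2,
NA, 1, -15)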
begining.date <- c("1991-12-01", "1996-01-10",
"1991-05-07", "1995-04-23",
"1992-01-01", "1996-01-01",
"1991-08-03", "1992-07-01", "1996-06-01")
end.date <- c("1996-01-09", "2000-02-01",
"1995-04-01", "1999-04-23",
"1995-09-01", "1999-11-30",
"1992-02-03", "1996-05-20", "2000-04-01")
cabinets <- data.frame(country, cabinet, begining.date, end.date)
cabinets
country cabinet begining.date end.date
1 Albania 1200 1991-12-01 1996-01-09
2 Albania 1201 1996-01-10 2000-02-01
3 Belgium 1560 1991-05-07 1995-04-01
4 Belgium 1566 1995-04-23 1999-04-23
5 Canada 220 1992-01-01 1995-09-01
6 Canada 440 1996-01-01 1999-11-30
7 Denmark 880 1991-08-03 1992-02-03
8 Denmark 819 1992-07-01 1996-05-20
9 Denmark 870 1996-06-01 2000-04-01
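What I want is a dataset whose unit of analysis is country*year, as in the data frame "country.year", but which also includes the cabinet and the position variable for each cabinet from the data frame "cabinets". The position variable describes the cabinet's policy position, so it is not really relevant to the data transformation task itself, but it matters later on. So, something like this: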
country <- c("Albania","Albania","Albania","Albania","Albania",
"Belgium","Belgium","Belgium","Belgium","Belgium",
"Canada","Canada","Canada","Canada","Canada",
"Denmark","Denmark","Denmark","Denmark","Denmark")
year2 <- c(1992, 1993, 1994, 1995, 1996,
1992, 1993, 1994, 1995, 1996,
1992, 1993, 1994, 1995, 1996,
1992, 1993, 1994, 1995, 1996)
cabinet2 <- c(1200,1200,1200,1200, 1201,
1560,1560,1560, 1566, 1566,
220,220,220,220, 440,
819, 819, 819, 819, 870)
cabinet.position2 <- c(12,12,12,12, 10,
0,0,0, 5, 5,
-9,-9,-9,-9, 2,
1, 1, 1, 1, -15)
desired.df <- data.frame(country, year2, cabinet2,cabinet.position2)
desired.df
country year2 cabinet2 cabinet.position2
1 Albania 1992 1200 12
2 Albania 1993 1200 12
3 Albania 1994 1200 12
4 Albania 1995 1200 12
5 Albania 1996 1201 10
6 Belgium 1992 1560 0
7 Belgium 1993 1560 0
8 Belgium 1994 1560 0
9 Belgium 1995 1566 5
10 Belgium 1996 1566 5
11 Canada 1992 220 -9
12 Canada 1993 220 -9
13 Canada 1994 220 -9
14 Canada 1995 220 -9
15 Canada 1996 440 2
16 Denmark 1992 819 1
17 Denmark 1993 819 1
18 Denmark 1994 819 1
19 Denmark 1995 819 1
20 Denmark 1996 870 -15
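My main problem here is assigning cabinets to the different years. As you can see above, every year needs to be assigned one cabinet and its position. What makes this really difficult for me is that a year sometimes has more than one cabinet, so each year should get the cabinet that spent the most time in office during that year (for example, if in 1995 cabinet A was in office from January to May and cabinet B from June to December, 1995 should be assigned to cabinet B).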
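To make the "most time in office" rule concrete, here is a minimal base-R sketch of the overlap calculation for the 1995 example. The helper name `days_in_year` and the exact start and end dates for cabinets A and B are assumptions made for illustration; they are not part of the data above.

```r
# Hypothetical helper: number of days (inclusive of both endpoints) that a
# cabinet with the given begin/end dates spends in office during `year`.
days_in_year <- function(begin, end, year) {
  year_start <- as.Date(paste0(year, "-01-01"))
  year_end   <- as.Date(paste0(year, "-12-31"))
  # Clip the cabinet's term to the calendar year, then count the days.
  overlap <- as.numeric(pmin(as.Date(end), year_end) -
                        pmax(as.Date(begin), year_start)) + 1
  pmax(overlap, 0)  # no overlap at all yields 0 instead of a negative number
}

# Cabinet A: January to May 1995; cabinet B: June to December 1995
# (illustrative dates, assumed for this example)
days_A <- days_in_year("1995-01-01", "1995-05-31", 1995)  # 151
days_B <- days_in_year("1995-06-01", "1995-12-31", 1995)  # 214
days_B > days_A  # TRUE, so 1995 would be assigned to cabinet B
```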
Any ideas?
Thanks a lot!
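One possible approach, sketched in base R only: pair every country-year with every cabinet of the same country, count the days each cabinet was in office within that year, and keep the cabinet with the most days. This is a sketch under stated assumptions, not a definitive solution. The column name `cabinet.position` and the `NA` for cabinet 880 are assumptions (the question does not give that cabinet's position), and ties are not handled specially.

```r
# Rebuild the two input data frames from the question.
country.year <- data.frame(
  country = rep(c("Albania", "Belgium", "Canada", "Denmark"), each = 5),
  year    = rep(1992:1996, times = 4)
)
cabinets <- data.frame(
  country = c("Albania", "Albania", "Belgium", "Belgium",
              "Canada", "Canada", "Denmark", "Denmark", "Denmark"),
  cabinet = c(1200, 1201, 1560, 1566, 220, 440, 880, 819, 870),
  # Assumption: NA for cabinet 880, whose position is not given.
  cabinet.position = c(12, 10, 0, 5, -9, 2, NA, 1, -15),
  begining.date = as.Date(c("1991-12-01", "1996-01-10", "1991-05-07",
                            "1995-04-23", "1992-01-01", "1996-01-01",
                            "1991-08-03", "1992-07-01", "1996-06-01")),
  end.date = as.Date(c("1996-01-09", "2000-02-01", "1995-04-01",
                       "1999-04-23", "1995-09-01", "1999-11-30",
                       "1992-02-03", "1996-05-20", "2000-04-01"))
)

# Pair each country-year with every cabinet of the same country.
m <- merge(country.year, cabinets, by = "country")

# Days each cabinet was in office within the given year (both ends inclusive).
year_start <- as.Date(paste0(m$year, "-01-01"))
year_end   <- as.Date(paste0(m$year, "-12-31"))
m$days <- pmax(as.numeric(pmin(m$end.date, year_end) -
                          pmax(m$begining.date, year_start)) + 1, 0)

# Sort so the longest-serving cabinet comes first within each country-year,
# then keep only that first row.
m <- m[order(m$country, m$year, -m$days), ]
result <- m[!duplicated(m[c("country", "year")]),
            c("country", "year", "cabinet", "cabinet.position")]
rownames(result) <- NULL
result
```

With the data above, `result` has 20 rows and its `cabinet` and `cabinet.position` columns match `cabinet2` and `cabinet.position2` in `desired.df`; the column names differ only in lacking the `2` suffix.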
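【Comments】:
- @DavidArenburg The OP defined a vector cabinet.position2 that they evidently intended to add to their cabinets <- data.frame(...) call. cabinets has 9 rows, but the vector cabinet.position2 has only 8 elements. The expected output desired.df after the join has no row matching cabinet$cabinet == 880, so we do not know the actual value. To make the question answerable, I added NA for this row. It should be clear if you look at the revision history.
- Sorry, sorry. My mistake. I edited the question and clarified what the variables represent.
Tags: r dplyr merge data.table lubridate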