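[Posted]: 2019-09-24 03:11:01
[Question]:
I have two datasets that share the same Country and Year row combinations, and I want to add some columns from one dataset to the other so that rows with matching Country/Year combinations line up.
Dataset 1: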
+----------+------+---------+---------+-----+
| Country | Year | exports | imports | ... |
+----------+------+---------+---------+-----+
| Germany | 2000 | 0.70 | 0.40 | ... |
| Germany | 2001 | 0.68 | 0.41 | ... |
| Germany | 2002 | 0.71 | 0.48 | ... |
| Germany | 2003 | ... | ... | ... |
| Spain | 2000 | 0.51 | 0.56 | ... |
| Spain | 2001 | 0.48 | 0.50 | ... |
| Spain | 2002 | 0.50 | 0.53 | ... |
| Spain | 2003 | ... | ... | ... |
| ... | ... | ... | ... | ... |
+----------+------+---------+---------+-----+
Dataset 2:
+----------+-----+------+--------------+-------+-----+
| Country | CC | Year | unemployment | Pop | ... |
+----------+-----+------+--------------+-------+-----+
| Germany | GER | 2000 | 0.03 | 79.50 | ... |
| Germany | GER | 2001 | 0.05 | 79.53 | ... |
| Germany | GER | 2002 | 0.04 | 79.80 | ... |
| Germany | GER | 2003 | ... | ... | ... |
| Hungary | HUN | 2000 | ... | ... | ... |
| Hungary | HUN | 2001 | ... | ... | ... |
| Hungary | HUN | 2002 | ... | ... | ... |
| Hungary | HUN | 2003 | ... | ... | ... |
| Spain | ESP | 2000 | 0.08 | 40.2 | ... |
| Spain | ESP | 2001 | 0.11 | 40.5 | ... |
| Spain | ESP | 2002 | 0.10 | 40.55 | ... |
| Spain | ESP | 2003 | ... | ... | ... |
| ... | ... | ... | ... | ... | ... |
+----------+-----+------+--------------+-------+-----+
I would like the merged data to look like this:
+----------+-----+------+---------+---------+--------------+-------+-----+
| Country | CC | Year | exports | imports | unemployment | Pop | ... |
+----------+-----+------+---------+---------+--------------+-------+-----+
| Germany | GER | 2000 | 0.70 | 0.40 | 0.03 | 79.50 | ... |
| Germany | GER | 2001 | 0.68 | 0.41 | 0.05 | 79.53 | ... |
| Germany | GER | 2002 | 0.71 | 0.48 | 0.04 | 79.80 | ... |
| Germany | GER | 2003 | ... | ... | ... | ... | ... |
| Spain | ESP | 2000 | 0.51 | 0.56 | 0.08 | 40.2 | ... |
| Spain | ESP | 2001 | 0.48 | 0.50 | 0.11 | 40.5 | ... |
| Spain | ESP | 2002 | 0.50 | 0.53 | 0.10 | 40.55 | ... |
| Spain | ESP | 2003 | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... |
+----------+-----+------+---------+---------+--------------+-------+-----+
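So countries that are not in Dataset 1 (Hungary in this example) should not appear in the merged dataset, and the country code should be carried over into the new dataset. Can someone tell me how to achieve this? I have 28 years of data for roughly 100 countries, so a function where I have to specify every combination by hand is not practical.
I tried combining them with merge(), but it did not work: it just produced hundreds of rows with the same Country and Year combination.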
[Comments]:
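- "I tried combining them with merge(), but it did not work..." What exactly did you try? merge is certainly a sensible choice for this task, provided it is used correctly. Share your attempt and describe why the result is not what you want.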
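For illustration, here is a minimal sketch of what a merge() call on both key columns might look like. The data frame names df1 and df2 are assumptions (the question does not name its objects), the toy values are copied from the tables above, and the elided Hungary values are left as NA placeholders rather than guessed. It also shows one plausible cause of the duplicated rows: joining on a single key, which is only a guess at what the original attempt did.

```r
# Toy versions of the two datasets (df1/df2 are hypothetical names).
df1 <- data.frame(
  Country = rep(c("Germany", "Spain"), each = 3),
  Year    = rep(2000:2002, times = 2),
  exports = c(0.70, 0.68, 0.71, 0.51, 0.48, 0.50),
  imports = c(0.40, 0.41, 0.48, 0.56, 0.50, 0.53)
)
df2 <- data.frame(
  Country      = rep(c("Germany", "Hungary", "Spain"), each = 3),
  CC           = rep(c("GER", "HUN", "ESP"), each = 3),
  Year         = rep(2000:2002, times = 3),
  # Hungary's values are elided in the post, so they stay NA here.
  unemployment = c(0.03, 0.05, 0.04, NA, NA, NA, 0.08, 0.11, 0.10),
  Pop          = c(79.50, 79.53, 79.80, NA, NA, NA, 40.2, 40.5, 40.55)
)

# Join on BOTH keys. all.x = TRUE keeps every row of df1, and rows that
# exist only in df2 (Hungary) are dropped.
merged <- merge(df1, df2, by = c("Country", "Year"), all.x = TRUE)

# Reorder columns to match the layout asked for in the question.
merged <- merged[, c("Country", "CC", "Year",
                     "exports", "imports", "unemployment", "Pop")]

# For comparison: joining on Country alone pairs every year of a country
# in df1 with every year of that country in df2, multiplying the rows.
too_many <- merge(df1, df2, by = "Country")

print(merged)
print(nrow(too_many))
```

With the toy data, the two-key join keeps one row per Country/Year pair, while the Country-only join returns each country's years crossed with each other (and splits Year into Year.x and Year.y), which resembles the symptom described in the question.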
Tags: r