【Question title】: Merging two data frames with different numbers of observations and matching them
【Posted】: 2021-11-14 23:19:48
【Question description】:

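The data frame below is the result of combining two data frames with cbindX(Period1, Period2). Both have the same columns but represent two time periods, and they contain different numbers of AEZ observations.

Take Abyei and Angola as an example: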

> dput(new_data2[1:6, c(1,2,3,5,7,8,9,11) ])

structure(list(AEZ_1 = c("Tropics, lowland semi-arid", "Dominantly hydromorphic soils", "Tropics, lowland sub-humid", "Tropics, lowland semi-arid", "Dominantly built-up land", "Dominantly hydromorphic soils"), Country_1 = c("Abyei", "Abyei", "Angola", "Angola", "Angola", "Angola"), File_name_1 = c("PRIO_AEZ_FS_1981_2010", "PRIO_AEZ_FS_1981_2010", "PRIO_AEZ_FS_1981_2010", "PRIO_AEZ_FS_1981_2010", "PRIO_AEZ_FS_1981_2010", "PRIO_AEZ_FS_1981_2010"), Share_1 = c(9418.132755827, 520.625044495, 616817.473747498, 278142.684969026, 1330.4290338252, 74581.3053271609), AEZ_2 = c("Tropics, lowland semi-arid", "Tropics, lowland sub-humid", "Dominantly hydromorphic soils", "Tropics, lowland sub-humid", "Tropics, lowland semi-arid", "Dominantly built-up land"), Country_2 = c("Abyei", "Abyei", "Abyei", "Angola", "Angola", "Angola"), File_name_2 = c("PRIO_AEZ_FS_2011_2040", "PRIO_AEZ_FS_2011_2040", "PRIO_AEZ_FS_2011_2040", "PRIO_AEZ_FS_2011_2040", "PRIO_AEZ_FS_2011_2040", "PRIO_AEZ_FS_2011_2040"), Share_2 = c(8475.525647713, 942.6071081139, 520.625044495, 754641.194306016, 289900.409286599, 1330.4290338252)), row.names = c(NA, 6L), class = "data.frame")

I want to match the rows on Country to see how the AEZ changes over time.

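Thanks.

【Comments】:

  • To share sample data, run dput('yourdf') and paste the output here.
  • It would be easier to help if you created a small reproducible example together with the expected output. Read how to give a reproducible example. Images are not the right way to share data or code.
  • Done! Thanks.
  • Please provide enough code so that others can better understand or reproduce the problem.

Tags: r dataframe merge cbind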


【Solution 1】:

Suppose you have two data frames (one old, one new) that both have a Country attribute:

library(tidyverse)

old <- tribble(
  ~AEZ, ~Country,
  1, "Abyei",
  2, "Angola"
) %>%
  mutate(time = "old")
old
#> # A tibble: 2 x 3
#>     AEZ Country time 
#>   <dbl> <chr>   <chr>
#> 1     1 Abyei   old  
#> 2     2 Angola  old

new <- tribble(
  ~AEZ, ~Country,
  1, "Abyei",
  2, "Angola",
  3, "Angola"
) %>%
  mutate(time = "new")
new
#> # A tibble: 3 x 3
#>     AEZ Country time 
#>   <dbl> <chr>   <chr>
#> 1     1 Abyei   new  
#> 2     2 Angola  new  
#> 3     3 Angola  new

old %>%
  full_join(new) %>%
  pivot_wider(names_from = time, values_from = AEZ) %>%
  unnest(old) %>%
  unnest(new)
#> Joining, by = c("AEZ", "Country", "time")
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> # A tibble: 3 x 3
#>   Country   old   new
#>   <chr>   <dbl> <dbl>
#> 1 Abyei       1     1
#> 2 Angola      2     2
#> 3 Angola      2     3

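Created on 2021-09-21 by the reprex package (v2.0.1)

【Comments】:

  • Hi Danlo, thanks! Since I have 48 countries, do you have a suggestion for quickly numbering each country?
  • You probably have the tables stored in a file. Replace tribble with, for example, read_csv. I only used tribble to provide example data. What do you mean by numbering the countries?
【Solution 2】: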

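My suggestion: before merging, rename the AEZ variable in the first file (data frame) to AEZ_1981 and the same variable in the second file to AEZ_2011. That way you keep all the information and can compare the changes in the merged file.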
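A minimal sketch of this renaming idea, using toy data. The answer does not say which key to merge on, so merging on Country is an assumption here, and the data frames p1 and p2 and their shortened AEZ labels are hypothetical stand-ins for the two periods.

```r
# Toy stand-ins for the two periods (hypothetical names and values)
p1 <- data.frame(
  Country = c("Abyei", "Angola"),
  AEZ     = c("semi-arid", "sub-humid")
)
p2 <- data.frame(
  Country = c("Abyei", "Angola", "Angola"),
  AEZ     = c("semi-arid", "sub-humid", "semi-arid")
)

# Rename AEZ so each period keeps its own column after the merge
names(p1)[names(p1) == "AEZ"] <- "AEZ_1981"
names(p2)[names(p2) == "AEZ"] <- "AEZ_2011"

# Assumption: merge on Country only; all = TRUE keeps unmatched countries
merged <- merge(p1, p2, by = "Country", all = TRUE)
print(merged)
```

Note that when a country has several AEZ rows in both periods, merging on Country alone pairs every row of one period with every row of the other, so the result can grow quickly.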

Best, Lev

【Comments】:

  • Hi Lev, thanks. That doesn't work. It gives the same result.
【Solution 3】:

In case it helps, I found a way:

new_data<-merge(Period1, Period2, by.x=c("Country", "AEZ"), by.y=c("Country", "AEZ"), all= TRUE)
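To illustrate what this merge produces, here is a small sketch on toy data. It assumes Period1 and Period2 each have columns named Country, AEZ and Share (the question only shows the already combined frame, so these names are an assumption), with values rounded from the Abyei rows of the dput output. The suffixes argument and the Change column are additions for illustration.

```r
# Toy stand-ins for Period1 and Period2 (column names are assumed)
Period1 <- data.frame(
  Country = c("Abyei", "Abyei"),
  AEZ     = c("Tropics, lowland semi-arid", "Dominantly hydromorphic soils"),
  Share   = c(9418.13, 520.63)
)
Period2 <- data.frame(
  Country = c("Abyei", "Abyei", "Abyei"),
  AEZ     = c("Tropics, lowland semi-arid", "Tropics, lowland sub-humid",
              "Dominantly hydromorphic soils"),
  Share   = c(8475.53, 942.61, 520.63)
)

# Full outer join on Country and AEZ; an AEZ missing in one period gets NA
new_data <- merge(Period1, Period2, by = c("Country", "AEZ"),
                  all = TRUE, suffixes = c("_1", "_2"))

# Change in share between the two periods (NA where a period has no row)
new_data$Change <- new_data$Share_2 - new_data$Share_1
print(new_data)
```

Because all = TRUE is set, the sub-humid zone that appears only in the second period is kept, with NA in Share_1.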

【Comments】:
