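【Posted】: 2018-02-15 04:29:00
【Question】:
I am trying to merge 4 data frames on 2 columns while keeping track of which data frame each column came from. Tracking the columns is where I run into trouble.
(See the dput() output for the data frames at the end of the post.)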
#df example (df1)
Name Color Freq
banana yellow 3
apple red 1
apple green 4
plum purple 8
#create list of dataframes
list.df <- list(df1, df2, df3, df4)
#merge dfs on column "Name" and "Color"
combo.df <- Reduce(function(x,y) merge(x,y, by = c("Name", "Color"), all = TRUE, accumulate=FALSE, suffixes = c(".df1", ".df2", ".df3", ".df4")), list.df)
This gives the following warning:
Warning message: In merge.data.frame(x, y, by = c("Name", "Color"), all = TRUE, : column names 'Freq.df1', 'Freq.df2' are duplicated in the result
and outputs this data frame:
#combo df example
Name Color Freq.df1 Freq.df2 Freq.df1 Freq.df2
banana yellow 3 3 7 NA
apple red 1 2 9 1
apple green 4 NA 8 2
plum purple 8 1 NA 6
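The df1 and df2 suffixes are duplicated in name only. The values filling the third and fourth Freq columns of combo.df actually come from df3 and df4, respectively.
What I actually want is: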
Name Color Freq.df1 Freq.df2 Freq.df3 Freq.df4
banana yellow 3 3 7 NA
apple red 1 2 9 1
apple green 4 NA 8 2
plum purple 8 1 NA 6
How can I achieve this? I know that the suffixes argument of merge() only accepts a character vector of length 2, but I don't know how to work around that. Thanks!
df1 <-
structure(list(Name = structure(c(2L, 1L, 1L, 3L), .Label = c("apple",
"banana", "plum"), class = "factor"), Color = structure(c(4L,
3L, 1L, 2L), .Label = c("green", "purple", "red", "yellow"), class = "factor"),
Freq = c(3, 1, 4, 8)), .Names = c("Name", "Color", "Freq"
), row.names = c(NA, -4L), class = "data.frame")
df2 <-
structure(list(Name = structure(c(2L, 1L, 3L), .Label = c("apple",
"banana", "plum"), class = "factor"), Color = structure(c(3L,
2L, 1L), .Label = c("purple", "red", "yellow"), class = "factor"),
Freq = c(3, 2, 1)), .Names = c("Name", "Color", "Freq"), row.names = c(NA,
-3L), class = "data.frame")
df3 <-
structure(list(Name = structure(c(2L, 1L, 1L), .Label = c("apple",
"banana"), class = "factor"), Color = structure(c(3L, 2L, 1L), .Label = c("green",
"red", "yellow"), class = "factor"), Freq = c(7, 9, 8)), .Names = c("Name",
"Color", "Freq"), row.names = c(NA, -3L), class = "data.frame")
df4 <-
structure(list(Name = structure(c(1L, 1L, 2L), .Label = c("apple",
"plum"), class = "factor"), Color = structure(c(3L, 1L, 2L), .Label = c("green",
"purple", "red"), class = "factor"), Freq = c(1, 2, 6)), .Names = c("Name",
"Color", "Freq"), row.names = c(NA, -3L), class = "data.frame")
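【Comments】:
-
Can you share all 4 data.frames using dput?
-
@TUSHAr - edited into the post.
-
This is tricky. I'm not sure the source can be tracked elegantly while the merge is in progress. All we can do is pass the names of the data.frames in as external values, in the same order in which we expect the merges to happen.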
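One way to read the suggestion above is to attach each data.frame's name to its Freq column before merging, so that merge() never needs suffixes at all. The following is only a minimal sketch under that assumption, not a confirmed answer from the thread. The data frames are rebuilt with data.frame() from the values in the question's dput() output, and the named list and the helper variable `renamed` are introduced here purely for illustration.

```r
# Stand-ins for the question's data, rebuilt from the dput() values.
df1 <- data.frame(Name  = c("banana", "apple", "apple", "plum"),
                  Color = c("yellow", "red", "green", "purple"),
                  Freq  = c(3, 1, 4, 8))
df2 <- data.frame(Name  = c("banana", "apple", "plum"),
                  Color = c("yellow", "red", "purple"),
                  Freq  = c(3, 2, 1))
df3 <- data.frame(Name  = c("banana", "apple", "apple"),
                  Color = c("yellow", "red", "green"),
                  Freq  = c(7, 9, 8))
df4 <- data.frame(Name  = c("apple", "apple", "plum"),
                  Color = c("red", "green", "purple"),
                  Freq  = c(1, 2, 6))

# Assumption: a *named* list, so each name travels with its data frame
# in the same order the merges will happen.
list.df <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4)

# Rename Freq to Freq.<name> in every data frame before merging.
renamed <- Map(function(d, nm) {
  names(d)[names(d) == "Freq"] <- paste0("Freq.", nm)
  d
}, list.df, names(list.df))

# The non-key column names are now unique, so no suffixes are needed.
combo.df <- Reduce(function(x, y) merge(x, y, by = c("Name", "Color"), all = TRUE),
                   renamed)

print(combo.df)
```

Because the renaming happens up front, the pairwise merges inside Reduce() never see two columns called Freq, which is what triggered the duplicated-name warning in the question. Note that merge() sorts the result by the key columns by default, so the row order may differ from the desired output shown in the question.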