Repeat data.frame, add a primary key: answers

[Question title]: Repeat data.frame, add a primary key
[Posted]: 2017-03-20 00:27:12
[Question]:

I have a data frame, say:

data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))

Now I want to duplicate it so that the copy sits in the same data.frame. I would end up with something like this:

 data.frame(x = c(1, 3, 1, 3), y = c(5, 0, 5, 0), id = c("A", "B", "A", "B"))

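This is very close to what I want, but I also want to append a suffix to the id column so that every row is unique, based on the number of copies I want (just one extra copy in this case, but in general I want n).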

data.frame(x = c(1, 3, 1, 3), y = c(5, 0, 5, 0), id = c("A-1", "B-1", "A-2", "B-2"))

So, as you can see, I can build these objects by hand, but I'd like to move from writing "hacky" base R code to replicating this functionality with dplyr.

[Comments on the question]:

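It's not particularly hacky in base R: out <- dat[rep(1:nrow(dat), 2),]; out$id <- paste(out$id, rep(1:2, each=nrow(dat)), sep="-")
Also related: stackoverflow.com/questions/38237350/… . dat %>% slice(rep(1:n(), 2)) will get you most of the way there, but is arguably more complicated.
I like your solution. Do you think it's wrong to try to solve this with dplyr? Your solution looks easy to read and fast.
I'd be hard pressed to see how you could make it a) simpler than the 2 lines I posted, or b) faster on large data. Base R seems perfectly acceptable in this case.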
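A minimal sketch generalizing the two-line base R comment above from 2 copies to an arbitrary number of copies `n`; the names `dat`, `n` and `out` are illustrative, and the row-name reset at the end is an optional cosmetic step:

```r
# the example data from the question
dat <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))

# number of copies wanted (illustrative value)
n <- 3

# repeat the block of row indices n times: 1, 2, 1, 2, 1, 2
out <- dat[rep(seq_len(nrow(dat)), n), ]

# suffix each id with the number of the copy it belongs to
out$id <- paste(out$id, rep(seq_len(n), each = nrow(dat)), sep = "-")

# optional: drop the de-duplicated row names ("1.1", "2.1", ...) created by indexing
rownames(out) <- NULL
out
```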

Tags: r dplyr

[Solution 1]:

I noticed you want to do this with the dplyr package. I think you can do it nicely by combining the group_by(), mutate() and row_number() functions from dplyr.

library(dplyr)

# so you start with this data.frame:
df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))

# to attach an exact duplication of this df to itself:
df <- rbind(df, df)


# group by id, add a second id to increment within each id group ("A", "B", etc.)
df2 <- group_by(df, id) %>%
    mutate(id2 = row_number())


# paste the id and id2 together for the desired result
df2$id_combined <- paste0(df2$id, '-', df2$id2)

# inspect results
df2
    # x     y     id   id2 id_combined
    # <dbl> <dbl> <fctr> <int>       <chr>
    # 1     1     5      A     1         A-1
    # 2     3     0      B     1         B-1
    # 3     1     5      A     2         A-2
    # 4     3     0      B     2         B-2

Keep in mind that what you have now is a "tibble" (a "grouped data.frame"), not a base data.frame.

If you prefer, you can easily convert it back to a plain data.frame:

df2 <- data.frame(df2, stringsAsFactors = FALSE)

# now remove the helper column that was added in this process:
df2$id2 <- NULL
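The same steps can also be written as a single pipeline that overwrites `id` with the combined key and ends with a plain data.frame. This is a sketch of that variant, not the only way to do it; the helper column name `copy` is illustrative. Like the answer above, it assumes the ids in the original data frame are unique, because `row_number()` counts occurrences of each id rather than copies.

```r
library(dplyr)

df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))
df <- rbind(df, df)

df2 <- df %>%
  group_by(id) %>%
  # number the occurrences of each id: 1 in the first copy, 2 in the second, ...
  mutate(copy = row_number()) %>%
  # drop the grouping before overwriting the grouping column
  ungroup() %>%
  mutate(id = paste0(id, "-", copy)) %>%
  select(-copy) %>%
  # convert the tibble back to a plain data.frame
  as.data.frame()
df2
```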

EDIT: exploring other options for appending `n` copies of the same data frame to itself:

# Not dplyr, but this is how I would normally handle this type of task:
df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))

# set n equal to the number of times you want to replicate the data.frame
n <- 13

# preallocate a list with room for n data frames
list_dfs <- vector("list", n)

# loop through, adding individual data frames to the list
for (i in seq_len(n)) {
    list_dfs[[i]] <- df
}

# combine them all with do.call
my_big_df <- do.call(rbind, list_dfs)
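The loop can also be replaced by `rep()` on a one-element list, which builds the list of `n` copies directly; this is a small base R variant, not something the original answer uses:

```r
df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))
n <- 13

# rep(list(df), n) is a list holding n copies of df; rbind stacks them in order
my_big_df <- do.call(rbind, rep(list(df), n))
nrow(my_big_df)
```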

You can then use the group_by(), mutate() and row_number() functions shown above to create the new unique key for the data.frame.
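For comparison, here is a sketch of the `slice()` idea from the question comments, extended to build the key in the same pipeline. It derives the suffix from each row's position (which copy it is in) instead of grouping on `id`, so it does not depend on the original ids being unique. `n_copies` and `rows_per_copy` are illustrative names.

```r
library(dplyr)

df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))
n_copies <- 13
rows_per_copy <- nrow(df)

my_big_df <- df %>%
  # repeat the whole block of row indices n_copies times
  slice(rep(seq_len(n()), n_copies)) %>%
  # suffix = copy number: 1 for the first rows_per_copy rows, 2 for the next, ...
  mutate(id = paste(id, rep(seq_len(n_copies), each = rows_per_copy), sep = "-"))
head(my_big_df)
```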

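[Comments]:

So how would I use dplyr to replicate the original data.frame?
If by replicating you mean appending an exact copy of the existing data.frame to itself, I would just do df <- rbind(df, df). Maybe I'm misunderstanding: what is the end goal of creating the additional id field in the "A-1" format?
I want to append n exact copies with dplyr.
I'm not sure how to do that with dplyr, but I did edit my answer to show how it can be done with a list and a call to do.call(rbind, <list_of_dataframes_here>).
At this point I wonder whether trying to do this with dplyr was a mistake on my part. @thelatemail's comment on my question seems to answer it very concisely.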
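One way to append `n` exact copies within dplyr is `bind_rows()`, which accepts a list of data frames and, through its `.id` argument, adds a column recording which list element each row came from (for an unnamed list, the positions "1", "2", ...). A minimal sketch, with `dat` and `n_copies` as illustrative names:

```r
library(dplyr)

dat <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))
n_copies <- 3

out <- bind_rows(rep(list(dat), n_copies), .id = "copy") %>%
  # "copy" holds the position of the source data frame: "1", "2", "3"
  mutate(id = paste(id, copy, sep = "-")) %>%
  # drop the helper column once the key is built
  select(-copy)
out
```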
