Duplicate rows based on other column values in R: Answers

【Question title】: Duplicate rows based on other column values in R
【Posted】: 2019-10-28 10:15:30
【Question description】:

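I want to duplicate each row as many times as its Count column says. My code works on the sample data, but when I run it on a large dataset I get this error:

Error in rep(seq_len(dim(df1)[1]), df1$Count) : invalid 'times' argument

My data and code: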

df1 <- data.frame(Month = rep(month.abb[1:12], 10),
                  Product = paste0('Product ', rep(LETTERS[1:10], each = 12)),
                  Count = sample(1:10, 120, replace = TRUE),
                  stringsAsFactors = FALSE)


df2 <- data.frame(df1[rep(seq_len(dim(df1)[1]), df1$Count), , drop = FALSE], row.names=NULL)

head(df2)
  Month   Product Count
1   Jan Product A     1
2   Feb Product A     4
3   Feb Product A     4
4   Feb Product A     4
5   Feb Product A     4
6   Mar Product A    10

My real data has 45,000 rows and 5 columns: 4 character columns and 1 numeric column. On that data I get the error above.

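【Question discussion】:

What happens if you use rep(1:(dim(df1)[1]), df1$Count) instead of rep(seq_len(dim(df1)[1]), df1$Count)? Make sure Count has no negative values: rep("A", -3) produces Error in rep("A", -3) : invalid 'times' argument.
Same error: invalid 'times' argument.
What does table(df1$Count) produce? Are there any negative numbers in there?
Yes, I have a few rows with NA.
Can you paste the output of table(df1$Count), as @deepseefan suggested?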
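
The diagnosis suggested in these comments can be sketched on a toy vector. This is only an illustration: `counts`, `msg`, and `bad` are made-up names, not part of the asker's data. It shows that both NA and negative entries make `rep()` reject its `times` argument, and that `table()` hides NA values unless `useNA` is set.

```r
# Hypothetical toy counts containing one NA and one negative value
counts <- c(2, NA, 3, -1)

# rep() refuses NA or negative 'times'; capture the message instead of stopping
msg <- tryCatch(
  {
    rep(seq_along(counts), counts)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Positions of the entries that rep() cannot use
bad <- which(is.na(counts) | counts < 0)
print(bad)

# table() drops NA by default; useNA = "ifany" makes missing values visible
print(table(counts, useNA = "ifany"))
```

On real data the same check would be run on `df1$Count`; the positions in `bad` are the rows that need fixing or removing before the call to `rep()`.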

Tags: r dataframe duplicates row

【Solution 1】:

You can do it like this. It handles both negative and NA values.

df2 <- data.frame(
  df1[rep(seq_len(dim(df1)[1]),
          with(df1, ifelse(Count > 0 & !is.na(Count), Count, 1))), , drop = FALSE],
  row.names = NULL
)

Rows whose Count is negative, zero, or NA are kept as they are: each is copied to df2 exactly once and not repeated.
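
As a small worked illustration of this behaviour, here is a sketch on a hypothetical four-row data frame (`toy`, `times_keep`, `kept`, `valid`, and `dropped` are names invented for this example). It applies the same `ifelse()` rule, and, as an alternative that the answer does not propose, shows how the invalid rows could be dropped instead of kept once.

```r
# Hypothetical toy data with one NA and one negative Count
toy <- data.frame(Product = c("A", "B", "C", "D"),
                  Count = c(2, NA, -1, 3),
                  stringsAsFactors = FALSE)

# Same rule as the answer: unusable counts are replaced by 1
times_keep <- with(toy, ifelse(Count > 0 & !is.na(Count), Count, 1))
kept <- data.frame(toy[rep(seq_len(nrow(toy)), times_keep), , drop = FALSE],
                   row.names = NULL)

# Alternative (an assumption, not from the answer): drop unusable rows first
valid <- toy[!is.na(toy$Count) & toy$Count > 0, , drop = FALSE]
dropped <- data.frame(valid[rep(seq_len(nrow(valid)), valid$Count), , drop = FALSE],
                      row.names = NULL)

print(kept)     # A twice, B once, C once, D three times
print(dropped)  # A twice, D three times
```

Which variant is appropriate depends on what a missing or negative Count means in the data; keeping the row once preserves every product, while dropping it avoids inventing a count of 1.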

【Discussion】: