R: from long to wide data frame with real-valued columns (answers)

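【Question title】: R from long to wide dataframe with real valued columns
【Posted】: 2020-08-24 11:35:54
【Question】:

I have a quick question about reshaping my data frame, in which the data are grouped by ID. My df has the following schema (plus 2 example rows that I would like to widen; in total I have >5000 rows):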

   id                  solver   scoreA  scoreB  group   size 
   <chr>               <chr>    <dbl>   <dbl>   <chr>   <dbl>
 1 instance_1          s1       1        0.5    g1      1000                     
 2 instance_1          s2       100      50     g1      1000

...and what I would like to get is:

   id           solver.best  scoreA.s1  scoreA.s2  scoreB.s1   scoreB.s2  group   size 
   <chr>        <chr>        <dbl>      <dbl>      <dbl>       <dbl>      <chr>   <dbl>
 1 instance_1   s1           1          100        0.5         50         g1      1000

Thanks for your help. BR

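【Comments】:

If you have more than 5000 rows and you want to do this, you will end up with about 5000 columns, which is almost certainly the wrong approach. There should be an alternative route to what you are trying to accomplish.
Each ID always groups exactly 2 rows... basically I want each pair of rows flattened into one, i.e. going from 5000 rows to 2500 rows.
You can do the math: what is the difference between 5 x 5000 and 2500 x 5000?
I don't get your point. I admit that spreading via tidyverse techniques may not give the preferred result, and that without grouping the final result would be a 2500 x 5000 df. Apart from that, I don't see anything fatal in the idea of collapsing 2 rows into 1 once they are grouped by ID...
The purpose of sharing dput output is that we can copy your data (or part of it) into our own R environment and work with it. When you update your post with an incomplete dput (which we cannot copy), that is as good as not sharing at all. If your data is very large, you can share only the first few rows, e.g. dput(head(df)) for the first 6 rows.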
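
To illustrate the dput(head(df)) suggestion above, here is a minimal sketch in base R. The data frame below is rebuilt from the two example rows in the question; the intermediate variable names (txt, df_copy) are only for illustration.

```r
# Example rows taken from the question
df <- data.frame(
  id     = c("instance_1", "instance_1"),
  solver = c("s1", "s2"),
  scoreA = c(1, 100),
  scoreB = c(0.5, 50),
  group  = c("g1", "g1"),
  size   = c(1000, 1000)
)

# dput() prints R code that recreates the object; capture it as text,
# which is what one would paste into a post
txt <- paste(capture.output(dput(head(df))), collapse = "\n")
cat(txt, "\n")

# Anyone can turn that text back into a data frame
df_copy <- eval(parse(text = txt))
```

Because the pasted text is plain R code, a reader can reproduce the data without access to the original file.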

Tags: r dataframe tidyverse reshape tibble

【Solution 1】:

Maybe you can try the code below

reshape(within(df, Q <- ave(seq(nrow(df)), id, FUN = seq_along)),
  direction = "wide",
  idvar = "id", 
  timevar = "Q"
)

which gives

> reshape(cbind(df,Q = seq(nrow(df))),direction = "wide",idvar = "id",timevar = "Q")
          id solver.1 scoreA.1 scoreB.1 group.1 size.1 solver.2 scoreA.2
1 instance 1       s1        1      0.5      g1   1000       s2      100
  scoreB.2 group.2 size.2
1       50      g1   1000

Data

> dput(df)
structure(list(id = c("instance 1", "instance 1"), solver = c("s1", 
"s2"), scoreA = c(1L, 100L), scoreB = c(0.5, 50), group = c("g1",
"g1"), size = c(1000L, 1000L)), class = "data.frame", row.names = c("1",
"2"))
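
As a possible variation (a sketch, not part of the original answer): since the desired column names in the question are scoreA.s1, scoreA.s2 and so on, the solver column itself could serve as timevar, with group and size added to idvar so they are not duplicated. The second instance below is made-up data, and treating the lower scoreA as "best" is an assumption.

```r
# First two rows as in the data above; "instance 2" is hypothetical
df <- data.frame(
  id     = c("instance 1", "instance 1", "instance 2", "instance 2"),
  solver = c("s1", "s2", "s1", "s2"),
  scoreA = c(1, 100, 7, 3),
  scoreB = c(0.5, 50, 2, 4),
  group  = c("g1", "g1", "g2", "g2"),
  size   = c(1000, 1000, 500, 500)
)

# One row per id; group and size are constant within an id, so they are
# treated as identifiers and kept as single columns
wide <- reshape(df,
  direction = "wide",
  idvar     = c("id", "group", "size"),
  timevar   = "solver",
  sep       = "."
)

# Assumption: the solver with the lower scoreA counts as the best one
wide$solver.best <- ifelse(wide$scoreA.s1 <= wide$scoreA.s2, "s1", "s2")
print(wide)
```

This relies on every id having exactly one s1 row and one s2 row, as described in the question.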

【Comments】:

Thanks, but it only returns NA values for me :/ .... I have more than just 2 rows, but there are always 2 rows grouped by ID...
@gero Did you use the data from my answer? Otherwise, please dput() your data and I will take a look at what is going on.
structure(list(id = c("instance 1", "instance 1", "instance 2", "instance 2", [... more instances/groups ...], solver = c("S1", "S2", [... more solver ...], scoreA = c(3.3818, 358.1937, ...., scoreB = c(1.3818, 100.1937, ...., group = c("g1", "g1", ...., size = c("1000", "1000", ....)
@gero Could you put your data in your post? The data you showed in the comment seems incomplete.
The data is confidential. However, this is everything from dput() (found in the edited post).
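
One plausible explanation for the NA values reported above (an assumption, since the real data was never shared): if a global row counter such as seq(nrow(df)) is used as timevar instead of a per-id counter, every row gets its own "time", so each id only fills its own two sets of columns. The sketch below contrasts the two on made-up data with two ids.

```r
# Hypothetical data with two ids, two rows each
df <- data.frame(
  id     = c("instance 1", "instance 1", "instance 2", "instance 2"),
  solver = c("s1", "s2", "s1", "s2"),
  scoreA = c(1, 100, 7, 3),
  scoreB = c(0.5, 50, 2, 4),
  group  = c("g1", "g1", "g2", "g2"),
  size   = c(1000, 1000, 500, 500)
)

# Global counter 1, 2, 3, 4: four "times", so half of the cells are NA
bad <- reshape(cbind(df, Q = seq_len(nrow(df))),
  direction = "wide", idvar = "id", timevar = "Q"
)

# Per-id counter 1, 2, 1, 2: two "times", every cell is filled
df$Q <- ave(seq_len(nrow(df)), df$id, FUN = seq_along)
good <- reshape(df, direction = "wide", idvar = "id", timevar = "Q")

print(dim(bad))   # more columns than needed, with NA values
print(dim(good))  # one row per id, no NA values
```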

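【Solution 2】:

Since I am still hoping for a convenient best practice, e.g. with the tidyverse, I still want to share the pragmatic approach, which is conceptually just as valid :):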

# create empty (wide) target df
col_names <- c("id", "best_solver", "scoreA_s1", "scoreA_s2",
               "scoreB_s1", "scoreB_s2", "group", "size")
wide_df <- data.frame(matrix(ncol = length(col_names), nrow = 0))
colnames(wide_df) <- col_names


# traverse the original (long) df, which must already be arranged so that
# the two rows of each id are adjacent (s1 row first, then s2 row)
for (i in seq(2, nrow(long_df), by = 2)) {
  pair <- as.data.frame(long_df[(i - 1):i, ])  # the two rows of one id
  wide_df[i / 2, "id"] <- pair$id[2]
  # best solver = the one with the smaller scoreA (first row wins on ties);
  # the index refers to the pair, not to the whole long_df
  wide_df[i / 2, "best_solver"] <- pair$solver[which.min(pair$scoreA)]
  wide_df[i / 2, "scoreA_s1"] <- pair$scoreA[1]
  wide_df[i / 2, "scoreA_s2"] <- pair$scoreA[2]
  wide_df[i / 2, "scoreB_s1"] <- pair$scoreB[1]
  wide_df[i / 2, "scoreB_s2"] <- pair$scoreB[2]
  wide_df[i / 2, "group"] <- pair$group[2]
  wide_df[i / 2, "size"] <- pair$size[2]
}
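
For the tidyverse-style variant hoped for above, a minimal sketch could look like the following. It assumes dplyr and tidyr (version 1.0.0 or later, for pivot_wider) are installed, that lower scoreA means "better" (as in the loop above), and it uses made-up values for the second instance.

```r
library(dplyr)
library(tidyr)

# First instance as in the question; "instance_2" is hypothetical
long_df <- tibble(
  id     = c("instance_1", "instance_1", "instance_2", "instance_2"),
  solver = c("s1", "s2", "s1", "s2"),
  scoreA = c(1, 100, 7, 3),
  scoreB = c(0.5, 50, 2, 4),
  group  = c("g1", "g1", "g2", "g2"),
  size   = c(1000, 1000, 500, 500)
)

wide_df <- long_df %>%
  group_by(id) %>%
  # assumption: the solver with the smallest scoreA is the best one
  mutate(solver.best = solver[which.min(scoreA)]) %>%
  ungroup() %>%
  # spread both score columns over the solver values: scoreA.s1, scoreA.s2, ...
  pivot_wider(
    names_from  = solver,
    values_from = c(scoreA, scoreB),
    names_sep   = "."
  )

print(wide_df)
```

Columns not named in names_from or values_from (id, group, size, solver.best) act as identifiers, so each id collapses to a single row as long as those columns are constant within the id.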

【Comments】: