Summarise to multiple columns using tidyr

【Question title】: summarise to multiple columns using tidyr
【Posted】: 2018-06-07 17:39:34
【Question】:

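I have a data frame with two columns.

Col A is a vector of references, and Col B is the corresponding vector of study sites reported in each reference.

My problem is that a single reference can contain several study sites, and a study site can also appear in several references.

I want to summarise by study site, returning one column for each reference linked to that site.

Something like: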

Original table
-------------
ref  | site
-------------
A    | S1
-------------
A    | S2
-------------
B    | S1
-------------

New table
site  | ref1 | ref2
-------------------
S1    | A    | B
-------------------
S2    | A    | NA
-------------------

spread does not work, because there are duplicate sites.

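【Comments】:

spread will work if you do df %>% spread(key=ref,value=ref). Another approach is to use table(df$site,df$ref)
That doesn't work. ref is not unique, so it can't be the key.
It works for me with the data above and tidyr 0.8.0
That's because my actual data has many more duplicates than this simple example.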
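
To make the point about duplicates concrete, here is a minimal base R sketch of the table() approach mentioned in the comments. The data frame is an assumption: it is the question's example plus one repeated (ref, site) row, standing in for the poster's real data, which is not shown in the thread.

```r
# Hypothetical data: the question's table plus one repeated (ref, site) pair.
df <- data.frame(
  ref  = c("A", "A", "B", "A"),
  site = c("S1", "S2", "S1", "S1"),
  stringsAsFactors = FALSE
)

# Cross-tabulate sites (rows) against references (columns).
# Each cell counts how often that (site, ref) pair occurs.
counts <- table(df$site, df$ref)
print(counts)

# Any cell above 1 marks a repeated (site, ref) pair.
has_duplicates <- any(counts > 1)

# Dropping repeated rows first yields a 0/1 presence matrix instead.
deduped  <- unique(df)
presence <- table(deduped$site, deduped$ref)
print(presence)
```

The cross-tabulation tolerates repeated rows, but it gives one column per reference rather than the ref1, ref2, ... layout the question asks for.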

Tags: r dplyr tidyr

【Solution 1】:

Here is one way to get spread to work and produce the columns you want.

library(tidyverse)
original <- tibble(
  ref = c("A", "A", "B", "A"),
  site = c("S1", "S2", "S1", "S1")
)

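original %>%
  distinct() %>%                                    # keep each (ref, site) pair once
  group_by(site) %>%
  mutate(refcount = str_c("ref", row_number())) %>% # number the refs within each site: ref1, ref2, ...
  spread(refcount, ref)                             # one column per numbered ref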
#> # A tibble: 2 x 3
#> # Groups:   site [2]
#>   site  ref1  ref2 
#>   <chr> <chr> <chr>
#> 1 S1    A     B    
#> 2 S2    A     <NA>

Created on 2018-06-07 by the reprex package (v0.2.0).
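
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a sketch of the same approach using that function, not part of the original answer; it assumes a tidyr version that provides pivot_wider(), and the ungroup() call is an addition so that the result is a plain, ungrouped tibble.

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer above.
original <- tibble(
  ref  = c("A", "A", "B", "A"),
  site = c("S1", "S2", "S1", "S1")
)

wide <- original %>%
  distinct() %>%                                       # keep each (ref, site) pair once
  group_by(site) %>%
  mutate(refcount = paste0("ref", row_number())) %>%   # ref1, ref2, ... within each site
  ungroup() %>%                                        # added: drop the grouping before reshaping
  pivot_wider(names_from = refcount, values_from = ref)

print(wide)
```

Sites with fewer references than the maximum are padded with NA, as in the spread() version.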

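【Comments】:

This spreads all the references. I only want to keep the unique ones for each site.
In that case, add a call to distinct(). Answer updated.
Perfect. Thanks!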