Create columns from factors and count [duplicate]

【Question title】: Create columns from factors and count [duplicate]
【Posted】: 2016-05-09 16:00:45
【Question description】:

A seemingly simple problem is keeping me busy.

I have a data frame:

> df1
  Name Score
1  Ben     1
2  Ben     2
3 John     1
4 John     2
5 John     3

I want to create a summary table like this:

> df2
  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1

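So df2 must (i) show only the unique "Name" values, (ii) create one column per unique value in "Score", and (iii) count how many times each person received each score.

I tried: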

df2 <- ddply(df1, c("Name"), summarise
          ,Score_1 = sum(df1$Score == 1)
          ,Score_2 = sum(df1$Score == 2)
          ,Score_3 = sum(df1$Score == 3))

which produces:

  Name Score_1 Score_2 Score_3
1  Ben       2       2       1
2 John       2       2       1

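So my attempt wrongly counts all occurrences instead of counting per group.

Edit: Following the comments, I also tried reshape (possibly I am just doing it wrong):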

> reshape(df1, idvar = "Name", timevar = "Score", direction = "wide")
  Name
1  Ben
3 John

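First, the "Score" columns are missing, but worse, from what I have read about reshape, I do not believe it would give me a count for each factor level, which is the whole point.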
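For what it is worth, reshape presumably returns only Name here because df1 has no value column left to spread once Name and Score are used as idvar and timevar. A minimal base R sketch, assuming a helper column n and an intermediate data frame counts (both names are illustrative and not from the original post), computes the counts first and then reshapes them:

```r
# df1 rebuilt from the printout in the question
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

# Hypothetical helper column: one unit per row, summed per Name/Score pair
df1$n <- 1
counts <- aggregate(n ~ Name + Score, data = df1, FUN = sum)

# Now there is a value column (n) for reshape to spread across Score
wide <- reshape(counts, idvar = "Name", timevar = "Score",
                direction = "wide", sep = "_")
names(wide) <- sub("^n_", "Score_", names(wide))

# Combinations that never occur come out as NA; treat them as zero counts
wide[is.na(wide)] <- 0
wide
```

This is only one way to make reshape work for counting; the answers below are shorter.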

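【Question comments】:

Please search for how to convert from long format to wide format.
Using reshape2 (or data.table): dcast(df1, Name ~ paste("Score", Score, sep="_"), fun.aggregate = length) should give you the result you need.
Thanks @Jaap. That answers the question elegantly. FWIW, the question pointed to in the duplicate flag does not answer my question, so on that basis this is not a duplicate and perhaps deserves an official answer for future reference.
@gmarais Although it is not an exact dupe, it will give you some ideas.
See also table(df1).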

Tags: r plyr

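【Solution 1】:

You only need to make a slight modification to your code. Refer to Score instead of df1$Score inside summarise, so that each sum is evaluated per group rather than over the whole data frame. You can also write .(Name) instead of c("Name"):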

library(plyr)

# A bare Score is evaluated within each Name group
ddply(df1, .(Name), summarise,
      Score_1 = sum(Score == 1),
      Score_2 = sum(Score == 2),
      Score_3 = sum(Score == 3))

which gives:

  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1
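To see why the original attempt returned the same counts for every name, here is a minimal base R sketch (not part of the original answer; df1 is rebuilt from the question's printout, and global_count and per_group are illustrative names). df1$Score always refers to the full column, whatever group is being processed, while a per-group count needs the scores split by Name first:

```r
# df1 rebuilt from the printout in the question
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

# What the original attempt computed for every group: a count over all rows
global_count <- sum(df1$Score == 1)

# A per-group count in base R: split the comparison by Name, then sum
per_group <- tapply(df1$Score == 1, df1$Name, sum)

global_count  # 2, the value repeated in both rows of the wrong result
per_group     # 1 for Ben and 1 for John
```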

Other possibilities include:

1. table(df1), as @alexis_laz mentioned in the comments, which gives:

> table(df1)
       Score
Name   1 2 3
  Ben  1 1 0
  John 1 1 1

2. The dcast function of the reshape2 package (or of data.table, which has a dcast function of the same name):

library(reshape2) # or library(data.table)
dcast(df1, Name ~ paste0("Score_", Score), fun.aggregate = length)

which gives:

  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1
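The table(df1) option returns a contingency table rather than a data frame. If the df2 layout from the question is needed, a sketch along the following lines should work in base R (the intermediate names tab and wide are illustrative, and df1 is rebuilt from the question's printout):

```r
# df1 rebuilt from the printout in the question
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

tab <- table(df1)                    # rows are Name, columns are Score

# Keep the two-way layout instead of the long form as.data.frame() would give
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("Score_", names(wide))

# Move the row names into a regular Name column
df2 <- data.frame(Name = rownames(wide), wide, row.names = NULL)
df2
```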

【Comments】:

It seems a more efficient approach in data.table is to use df1[ , .N, by = .(Name, Score = paste0("Score_", Score))] (based on a brief investigation here)
@MichaelChirico thanx, saw it ;-) nice comparison
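Note that the .N expression from the comment above produces counts in long format (one row per Name/Score pair), and it requires df1 to be a data.table. A sketch of how it might be combined with dcast to reach the wide layout, assuming the data.table package is installed (dt, long and wide are illustrative names):

```r
library(data.table)

# Same data as df1 in the question, built directly as a data.table
dt <- data.table(Name  = c("Ben", "Ben", "John", "John", "John"),
                 Score = c(1, 2, 1, 2, 3))

# Long format: one row per Name/Score pair, with the count in column N
long <- dt[, .N, by = .(Name, Score = paste0("Score_", Score))]

# Wide format: one column per score, absent combinations filled with 0
wide <- dcast(long, Name ~ Score, value.var = "N", fill = 0L)
wide
```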

【Solution 2】:

We can use dplyr/tidyr:

 library(dplyr)
 library(tidyr)
 df1 %>% 
     group_by(Name) %>%
      mutate(n=1, Score= paste('Score', Score, sep='_')) %>% 
      spread(Score, n, fill=0) 
 #     Name Score_1 Score_2 Score_3
 #  (chr)   (dbl)   (dbl)   (dbl)
 #1   Ben       1       1       0
 #2  John       1       1       1
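One caveat with the code above: it marks each Name/Score pair with n = 1 rather than counting, so spread is likely to fail with a duplicate-identifier error if a person receives the same score more than once. A variant that counts first is sketched below. It assumes a tidyr version that provides pivot_wider (1.0.0 or later), which is newer than the code in this answer:

```r
library(dplyr)
library(tidyr)

# df1 rebuilt from the printout in the question
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

df2 <- df1 %>%
  count(Name, Score) %>%                    # one row per pair, count in column n
  pivot_wider(names_from   = Score,
              values_from  = n,
              names_prefix = "Score_",
              values_fill  = list(n = 0L))  # absent pairs become 0
df2
```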

【Comments】: