R: Making a pivot table with the dplyr or reshape2 package (answers)

【Question title】: R: Making a pivot table with the dplyr or reshape2 package
【Posted】: 2016-01-04 13:24:10
【Question】:

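I'm trying to make a simple pivot table in R with the dplyr or reshape2 package, because my dataset is too large and R runs out of memory when I use sqldf. The two columns I want to pivot on are "Product" and "Cust_Id", and I want to count the number of customers for each product. This is what I have so far: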

library(reshape2)
mydata<-read.table("Book1.txt",header=TRUE,fill=TRUE)
mydata.m<-melt(mydata,id=c("Product"),measured=c(Cust_Id))
mydata.d<-dcast(mydata.m,Product~variable,count)

Error in UseMethod("group_by_"):
no applicable method for 'group_by_' applied to an object of class "c('integer','numeric')"
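The error most likely comes from `count` being resolved to `dplyr::count()` (the `group_by_` in the message points to dplyr being loaded), which expects a data frame rather than the integer vector that `dcast()` passes to its aggregation function. Note also that `measured` is not a `melt()` argument; the argument names are `id.vars` and `measure.vars`. A minimal sketch, assuming a small hypothetical data frame in place of Book1.txt, that uses base `length` as the aggregation function:

```r
library(reshape2)

# Hypothetical stand-in for Book1.txt: one row per (Product, Cust_Id) record
mydata <- data.frame(
  Product = c("A", "A", "B", "B", "B", "C"),
  Cust_Id = c(1, 2, 1, 3, 3, 4),
  stringsAsFactors = FALSE
)

# Long format: Product is the id column, Cust_Id becomes the measured variable
mydata.m <- melt(mydata, id.vars = "Product", measure.vars = "Cust_Id")

# Count rows per Product; base::length avoids dispatching to dplyr::count()
mydata.d <- dcast(mydata.m, Product ~ variable, fun.aggregate = length)
print(mydata.d)
```

Because `length` counts rows, product B gets 3 here even though customer 3 appears twice; to count distinct customers instead, `fun.aggregate = function(x) length(unique(x))` would do it.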

I also tried dplyr with the code below (I'm not sure about the last step, although I did get it working on another laptop):

library(dplyr)
mydata.df<-tbl_df(mydata)
summarize(mydata.df,Product,Cust_Id=n())

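I don't get an error message, but a lot of values seem to be missing from the output. I'd really appreciate your input. Thanks in advance.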

【Comments】:

Could you dput() part of your data and share an example of the result you're looking for?

Tags: r dplyr reshape reshape2

【Solution 1】:

Try this:

library(dplyr)
# One row per Product; n() counts rows, so a repeat customer is counted each time
customer_counts <- mydata %>%
  group_by(Product) %>%
  summarise(nCustomers = n())
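dplyr also has a `count()` shorthand for this group-and-count pattern. A minimal sketch, assuming a small hypothetical data frame with the same two columns; the result column is named `n` by default:

```r
library(dplyr)

# Hypothetical sample data standing in for the asker's table
mydata <- data.frame(
  Product = c("A", "A", "B", "B", "B", "C"),
  Cust_Id = c(1, 2, 1, 3, 3, 4),
  stringsAsFactors = FALSE
)

# Equivalent to group_by(Product) %>% summarise(n = n())
product_counts <- mydata %>% count(Product)
print(product_counts)
```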

Or, if you only want to count distinct customers, you can do:

library(dplyr)
# n_distinct() counts each Cust_Id at most once per Product
customer_counts <- mydata %>%
  group_by(Product) %>%
  summarise(nCustomers = n_distinct(Cust_Id))
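The two versions only differ when a customer appears more than once for the same product. A small sketch on hypothetical data where customer 3 buys product B twice, computing both counts side by side:

```r
library(dplyr)

# Hypothetical sample data: customer 3 appears twice under product B
mydata <- data.frame(
  Product = c("A", "A", "B", "B", "B", "C"),
  Cust_Id = c(1, 2, 1, 3, 3, 4),
  stringsAsFactors = FALSE
)

# n() counts rows per product; n_distinct() counts unique customer IDs per product
summary_tbl <- mydata %>%
  group_by(Product) %>%
  summarise(n_rows = n(), n_customers = n_distinct(Cust_Id))
print(summary_tbl)
```

For product B, `n_rows` is 3 while `n_customers` is 2.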

【Comments】:

【Solution 2】:

If this really is a large dataset, the data.table package is your best option:

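library(data.table)

# data.table() returns a converted copy of the data frame
mydata_data_table <- data.table(mydata)

# .N is the number of rows in each Product group
number_customer <- mydata_data_table[, .(number_customers = .N), by = Product]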
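Since memory is the concern, converting with `setDT()` (which modifies the data frame by reference instead of copying it) may help, and `uniqueN()` gives the distinct-customer count. A minimal sketch, assuming a small hypothetical data frame with the same two columns:

```r
library(data.table)

# Hypothetical sample data standing in for the asker's table
mydata <- data.frame(
  Product = c("A", "A", "B", "B", "B", "C"),
  Cust_Id = c(1, 2, 1, 3, 3, 4),
  stringsAsFactors = FALSE
)

# Convert to a data.table in place, avoiding the copy made by data.table(mydata)
setDT(mydata)

# .N counts rows per group; uniqueN() counts distinct customer IDs per group
counts <- mydata[, .(number_rows = .N, number_customers = uniqueN(Cust_Id)), by = Product]
print(counts)
```

Groups come out in order of first appearance with `by`; `keyby = Product` would sort them instead.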

【Comments】: