【Posted】:2014-06-20 12:13:25
【Question】:
My data is in data.frame format, like the following sample data:
data <-
structure(list(Article = structure(c(1L, 1L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("10004", "10006", "10007"), class = "factor"),
Demand = c(26L, 780L, 2L, 181L, 228L, 214L, 219L, 291L, 104L,
72L, 155L, 237L, 182L, 148L, 52L, 227L, 2L, 355L, 2L, 432L,
1L, 156L), Week = c("2013-W01", "2013-W01", "2013-W01", "2013-W01",
"2013-W01", "2013-W02", "2013-W02", "2013-W02", "2013-W02",
"2013-W02", "2013-W03", "2013-W03", "2013-W03", "2013-W03",
"2013-W03", "2013-W04", "2013-W04", "2013-W04", "2013-W04",
"2013-W04", "2013-W04", "2013-W04")), .Names = c("Article",
"Demand", "Week"), class = "data.frame", row.names = c(NA, -22L))
I want to sum the Demand column by week and by article. To do that, I use:
library(dplyr)
WeekSums <-
  data %>%
  group_by(Article, Week) %>%
  summarize(
    WeekDemand = sum(Demand)
  )
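However, because some articles were not sold in some weeks, the number of rows differs per article (the WeekSums data frame only contains the weeks that had sales). How can I adjust my data so that every article has the same number of rows (one row per week), including the weeks with a demand of 0?
The output should look like this: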
Article Week WeekDemand
1 10004 2013-W01 1215
2 10004 2013-W02 900
3 10004 2013-W03 774
4 10004 2013-W04 1170
5 10006 2013-W01 0
6 10006 2013-W02 0
7 10006 2013-W03 0
8 10006 2013-W04 5
9 10007 2013-W01 2
10 10007 2013-W02 0
11 10007 2013-W03 0
12 10007 2013-W04 0
I tried:
WeekSums %>%
group_by(Article) %>%
if(n()< 4) rep(rbind(c(Article,NA,NA)), 4 - n() )
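but this does not work. In my original approach, I solved the problem by merging a data frame containing weeks 1-4 with the raw data of each article. That gave me 4 weeks (rows) per article, but the implementation with a for loop is very inefficient, so I am trying to do the same thing with dplyr (or any other more efficient package/function). Any suggestions would be much appreciated!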
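For reference, the merge idea described above can be sketched without a loop by building the full Article/Week grid once and joining the sums onto it. This is only a minimal base R sketch, not a confirmed solution: the function name `fill_missing_weeks` and the small `toy` data frame are hypothetical and introduced here for illustration, and the sketch assumes the input has the columns Article, Week and Demand as in the sample data.

```r
# Hypothetical helper (not from the original post): sums Demand per
# Article/Week and adds a 0 row for every combination that never occurs.
fill_missing_weeks <- function(df) {
  df$Article <- as.character(df$Article)

  # Sum demand for the Article/Week pairs that actually occur.
  sums <- aggregate(Demand ~ Article + Week, data = df, FUN = sum)
  names(sums)[names(sums) == "Demand"] <- "WeekDemand"

  # Every Article/Week combination, observed or not.
  grid <- expand.grid(
    Article = sort(unique(df$Article)),
    Week = sort(unique(df$Week)),
    stringsAsFactors = FALSE
  )

  # Left join: combinations without sales get NA, which is then set to 0.
  full <- merge(grid, sums, by = c("Article", "Week"), all.x = TRUE)
  full$WeekDemand[is.na(full$WeekDemand)] <- 0L

  full <- full[order(full$Article, full$Week), ]
  rownames(full) <- NULL
  full
}

# Toy input (hypothetical): article "B" has no sales in 2013-W01.
toy <- data.frame(
  Article = c("A", "A", "B"),
  Week = c("2013-W01", "2013-W02", "2013-W02"),
  Demand = c(5L, 3L, 2L),
  stringsAsFactors = FALSE
)
result <- fill_missing_weeks(toy)
print(result)
```

Calling `fill_missing_weeks(data)` on the sample data from the question should then give one row per article and week. Note that the grid only covers weeks that appear somewhere in the input; a week in which no article was sold at all would have to be supplied separately.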
【Comments】: