【Question title】: R - repetition with dplyr
【Posted】: 2015-07-14 10:24:27
【Question description】:

To convert my data from a "long compact" format to wide format, I need to use the rep function.

I don't know how to integrate it into a dplyr pipeline.

This is the repetition I need:

dta1 = as.data.frame(cbind(rep(dta$id, dta$duration), rep(dta$act, dta$duration) ) ) 
colnames(dta1) <- c('id', 'act')
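The same expansion can also be done by repeating row indices instead of individual columns, so every column is expanded at once and keeps its type (no intermediate character matrix from cbind). A minimal base-R sketch, using a small hypothetical data frame `toy` in the same id/act/duration layout as dta:

```r
# Hypothetical toy data in the same "long compact" layout as dta
toy <- data.frame(id = c("a", "a", "b"),
                  act = c("x", "y", "z"),
                  duration = c(2, 1, 3),
                  stringsAsFactors = FALSE)

# Repeat each row index `duration` times; all columns are expanded together
idx <- rep(seq_len(nrow(toy)), toy$duration)
toy1 <- toy[idx, c("id", "act")]
rownames(toy1) <- NULL

toy1
```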

This is the dplyr code:

library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr

dta1 %>%
  group_by(id) %>%
  mutate(Time = 1:n()) %>%
  spread(Time, act)

Do you know how to combine these two pieces of code?

Data

dta = structure(list(id = c("B10001N1", "B10001N1", "B10001N1", "B10001N1", 
                  "B10001N1", "B10001N1", "B10001N1", "B10001N1", "B10001N1", "B10001N1", 
                  "B10001N1", "B10001N1", "B10001N1", "B10001N1", "B10001N1", "B10001N1", 
                  "B10001N2", "B10001N2", "B10001N2", "B10001N2", "B10001N2", "B10001N2", 
                  "B10001N2", "B10001N2", "B10001N2", "B10001N2", "B10001N2", "B10001N2", 
                  "B10001N2", "B10001N3", "B10001N3", "B10001N3", "B10001N3", "B10001N3", 
                  "B10001N3", "B10001N3", "B10001N3", "B10001N3", "B10001N3", "B10001N3", 
                  "B10001N3", "B10001N3", "B10001N4", "B10001N4", "B10001N4", "B10001N4", 
                  "B10001N4", "B10001N4", "B10001N4", "B10001N4", "B10001N4", "B10001N4", 
                  "B10001N4", "B10001N4", "B10001N4"), act = c("-11", "1704", "1302", 
                                                               "1301", "1507", "603", "1301", "101", "502", "1704", "1507", 
                                                               "1404", "8888", "603", "1507", "101", "-11", "1302", "1301", 
                                                               "1507", "704", "101", "1704", "1704", "3102", "1002", "1704", 
                                                               "3101", "101", "-11", "1704", "1302", "1302", "1507", "603", 
                                                               "2902", "3201", "812", "1704", "1704", "3701", "101", "-11", 
                                                               "1302", "1301", "3101", "1001", "1507", "1006", "2101", "2902", 
                                                               "1704", "8888", "1704", "1302"), duration = c(30, 570, 5, 30, 
                                                                                                             25, 3, 12, 165, 30, 10, 5, 20, 70, 45, 180, 240, 570, 30, 30, 
                                                                                                             20, 25, 95, 70, 20, 20, 20, 60, 45, 435, 30, 30, 570, 90, 30, 
                                                                                                             15, 5, 40, 60, 240, 60, 30, 240, 600, 15, 45, 15, 75, 30, 150, 
                                                                                                             60, 30, 60, 210, 60, 90)), row.names = c(NA, 55L), class = "data.frame", .Names = c("id", 
                                                                                                                                                                                                 "act", "duration"))

【Question discussion】:

  • Why not simply dta[rep(1:nrow(dta), dta$duration), -3] %>% ...?
  • Great, could you post that as an answer so I can close the question? Thanks.
  • A similar approach with splitstackshape/data.table (v1.9.5): dcast(setDT(expandRows(dta, 'duration'))[, Time := 1:.N, ,id], id~Time, value.var='act')

Tags: r dplyr rep


【Solution 1】:

Try:

library(dplyr)
library(tidyr)
dta[rep(1:nrow(dta), dta$duration), -3] %>%
  group_by(id) %>% 
  mutate( Time = 1:n() ) %>%
  spread(Time, act)
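In current tidyr releases spread() is superseded by pivot_wider(), and uncount() performs the row repetition directly. A sketch of the same pipeline with those verbs, assuming tidyr >= 1.0.0 and a small hypothetical data frame `toy` in the same layout as dta:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same id/act/duration layout as dta
toy <- data.frame(id = c("a", "a", "b"),
                  act = c("x", "y", "z"),
                  duration = c(2, 1, 3),
                  stringsAsFactors = FALSE)

wide <- toy %>%
  uncount(duration) %>%            # repeat each row `duration` times, then drop the column
  group_by(id) %>%
  mutate(Time = row_number()) %>%  # 1, 2, ..., n within each id
  ungroup() %>%
  pivot_wider(names_from = Time, values_from = act)

wide
```

Ids with fewer time steps than the longest one get NA in the trailing columns.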

【Discussion】:

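  • Nice solution! +1. I'm a bit puzzled about why the OP's approach is much faster than the suggested solution. Would you mind offering some insight?
  • Good point. It seems more readable, but in this case it is less efficient, presumably because of the larger memory allocation. However, I don't know how R handles this internally.
  • It's somewhat puzzling that this elegant approach is slow. Luckily, I just found that the same subsetting operation is much faster on a data.table.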
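Along the lines of the data.table suggestions, a minimal sketch of the whole pipeline in data.table, assuming the data.table package is installed and using a small hypothetical table `toy` in the same layout as dta (no timings are claimed here):

```r
library(data.table)

# Hypothetical toy data in the same id/act/duration layout as dta
toy <- data.table(id = c("a", "a", "b"),
                  act = c("x", "y", "z"),
                  duration = c(2, 1, 3))

# Same row-index repetition, done inside data.table's [ ]
long <- toy[rep(seq_len(nrow(toy)), toy$duration), .(id, act)]

# Number rows within each id, modifying `long` by reference
long[, Time := seq_len(.N), by = id]

# Cast to wide: one column per Time step
wide <- dcast(long, id ~ Time, value.var = "act")

wide
```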