【Posted】: 2018-10-08 07:50:14
【Question】:
I have the following data frame:
df <- structure(list(a = c(1, 43, 22, 12, 35, 113, 54, 94), b = c("a",
"b", "c", "d", "e", "f", "g", "h")), .Names = c("a", "b"), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
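I want to select contiguous subsequences of a given length from this data. For example, for a length of 2, I want to select rows 1-2, 2-3, 3-4, and so on up to the last row of the data frame. Each subsequence should then be labeled.
With a subsequence length of 2, the new df with sequence labels would look like this: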
a b seq_label
1 a 1 # First subsequence, row 1-2
43 b 1 #
43 b 2 # Second subsequence, row 2-3
22 c 2 #
22 c 3 # Third subsequence, row 3-4
12 d 3 #
12 d 4
35 e 4
35 e 5
113 f 5
113 f 6
54 g 6
54 g 7
94 h 7
Similarly, for a subsequence length of 3:
a b seq_label
1 a 1 # First subsequence, row 1-3
43 b 1 #
22 c 1 #
43 b 2 # Second subsequence, row 2-4
22 c 2 #
12 d 2 #
22 c 3 # Third subsequence, row 3-5
12 d 3 #
35 e 3 #
12 d 4
35 e 4
113 f 4
35 e 5
113 f 5
54 g 5
113 f 6
54 g 6
94 h 6
....
Thanks to the answer suggested by @drjones, I have improved the solution:
library(purrr); n <- 2  # map_dfr() comes from purrr; n is the subsequence length
map_dfr(seq_len(nrow(df) - n + 1), function(i) cbind(df[i:(i + n - 1), ], seq_label = i))
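For comparison, the same windowing can be sketched in base R by building all row indices up front and subsetting once. This is only an illustrative alternative, not the approach from the answer mentioned above: the helper name `label_windows` is hypothetical, and the sample data is rebuilt here as a plain data.frame rather than a tibble so the snippet runs without extra packages.

```r
# Sample data from the question, rebuilt as a plain data.frame (assumption:
# the tibble class is not needed for this illustration)
df <- data.frame(
  a = c(1, 43, 22, 12, 35, 113, 54, 94),
  b = c("a", "b", "c", "d", "e", "f", "g", "h"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns every contiguous window of n rows,
# stacked on top of each other, with a seq_label column per window
label_windows <- function(data, n) {
  stopifnot(n >= 1, n <= nrow(data))
  n_windows <- nrow(data) - n + 1
  # outer() gives an n x n_windows matrix whose column j holds rows j..(j + n - 1);
  # as.vector() reads it column by column, e.g. 1, 2, 2, 3, 3, 4, ... for n = 2
  idx <- as.vector(outer(seq_len(n) - 1, seq_len(n_windows), `+`))
  out <- data[idx, , drop = FALSE]
  out$seq_label <- rep(seq_len(n_windows), each = n)
  rownames(out) <- NULL  # drop the "2.1"-style names created by repeated rows
  out
}

res2 <- label_windows(df, 2)  # windows of length 2
res3 <- label_windows(df, 3)  # windows of length 3
```

Subsetting once avoids growing the result window by window, which may matter for long data frames; for small inputs like this one, the `map_dfr` one-liner is just as reasonable.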
【Comments】: