Creating Time Series columns in R from Long to Wide format considering Date Range: answers

【Question title】: Creating Time Series columns in R from Long to Wide format considering Date Range
【Posted】: 2019-02-20 07:00:09
【Question】:

First, I have successfully converted my data from long format to wide format. The data is as follows.

+======+==========+======+======+
| Name |   Date   | Val1 | Val2 |
+======+==========+======+======+
| A    | 1/1/2018 |    1 |    2 |
+------+----------+------+------+
| B    | 1/1/2018 |    2 |    3 |
+------+----------+------+------+
| C    | 1/1/2018 |    3 |    4 |
+------+----------+------+------+
| D    | 1/4/2018 |    4 |    5 |
+------+----------+------+------+
| A    | 1/4/2018 |    5 |    6 |
+------+----------+------+------+
| B    | 1/4/2018 |    6 |    7 |
+------+----------+------+------+
| C    | 1/4/2018 |    7 |    8 |
+------+----------+------+------+

To convert the table above from long format to wide format, I used the following line of code:

test_wide <- reshape(test_data, idvar = 'Name', timevar = 'Date', direction = "wide" )

The result of the code above is as follows:

+======+===============+===============+===============+===============+
| Name | Val1.1/1/2018 | Val2.1/1/2018 | Val1.1/4/2018 | Val2.1/4/2018 |
+======+===============+===============+===============+===============+
| A    | 1             | 2             |             5 |             6 |
+------+---------------+---------------+---------------+---------------+
| B    | 2             | 3             |             6 |             7 |
+------+---------------+---------------+---------------+---------------+
| C    | 3             | 4             |             7 |             8 |
+------+---------------+---------------+---------------+---------------+
| D    | NA            | NA            |             4 |             5 |
+------+---------------+---------------+---------------+---------------+

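The problem I am facing is that I need R to treat the Date column as dates. The Date column ranges from 1/1/2018 to 1/4/2018, and because there are no values for 1/2/2018 and 1/3/2018, I do not get columns such as Val1.1/2/2018, Val2.1/2/2018, Val1.1/3/2018 and Val2.1/3/2018.

I want to convert to wide format in a way that gives me columns for the dates 1/2/2018 and 1/3/2018, even if those columns contain only missing values (NA).

The reason for this is that I need to use the data as a time series.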
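The gap described above can be checked directly once the dates are parsed. This is a minimal base R sketch, assuming the month/day/year format shown in the table; the variable names `observed`, `full_range` and `missing_dates` are made up for illustration and do not come from the question.

```r
# Dates that actually occur in the data, parsed as month/day/year
observed <- as.Date(c("1/1/2018", "1/4/2018"), format = "%m/%d/%Y")

# Every calendar day between the first and last observed date
full_range <- seq(min(observed), max(observed), by = "day")

# Days inside the range that have no rows, and therefore no wide columns
missing_dates <- full_range[!(full_range %in% observed)]
print(format(missing_dates))
```

Any approach that produces the desired wide table has to add rows for these missing days before reshaping.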

Edit:

Initial data, for copy and paste:

test_data <- read.table(text = "
Name Date Val1 Val2
A 1/1/2018 1 2
B 1/1/2018 2 3
C 1/1/2018 3 4
D 1/4/2018 4 5
A 1/4/2018 5 6
B 1/4/2018 6 7
C 1/4/2018 7 8
", header = TRUE)

Converted data, for copy and paste:

Name,Val1.1/1/2018,Val2.1/1/2018,Val1.1/4/2018,Val2.1/4/2018
A,1,2,5,6
B,2,3,6,7
C,3,4,7,8
D,NA,NA,4,5

Output of dput(test_data):

structure(list(Name = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("A", 
"B ", "C", "D"), class = "factor"), Date = structure(c(1L, 1L, 
1L, 2L, 2L, 2L, 2L), .Label = c("1/1/2018", "1/4/2018"), class = "factor"), 
    Val1 = 1:7, Val2 = 2:8), class = "data.frame", row.names = c(NA, 
-7L))

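【Comments】:

Please provide the data in a copy-and-paste format or using dput. Also, see tidyr::complete.
As @A.Suliman suggested, you should post the output of dput, which makes it easier for people to help you, e.g. dput(test_data).
@steveb and @A.Suliman I have added the dput(test_data) output in the Edit section. I hope this is what you need.
Could someone suggest an approach I should look into to replicate this in Python?

Tags: r time-series reshape wide-column-store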

【Solution 1】:

A tidyverse option

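library(lubridate)  # mdy()
library(tidyverse)  # dplyr and tidyr verbs

# df is the data frame from the question (test_data)
df %>% 
  mutate(Date = mdy(Date)) %>%            # parse month/day/year strings as Date
  # Or use as.Date(Date, '%m/%d/%Y') to avoid loading `lubridate`
  complete(Name, Date = seq(min(Date), max(Date), 1)) %>%  # one row per Name for every day in the range
  gather(key, value, -Name, -Date) %>%    # stack Val1 and Val2 into key/value pairs
  unite(Date, key, Date, sep = ".") %>%   # build column names such as Val1.2018-01-01
  spread(Date, value)                     # one column per variable and date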
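In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same idea can be sketched with a single pivot_wider() call. This is an illustrative variant rather than part of the original answer; it assumes dplyr and a recent tidyr are installed, and it rebuilds the question's data as `df` so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# The question's data, rebuilt so the example is self-contained
df <- read.table(text = "
Name Date Val1 Val2
A 1/1/2018 1 2
B 1/1/2018 2 3
C 1/1/2018 3 4
D 1/4/2018 4 5
A 1/4/2018 5 6
B 1/4/2018 6 7
C 1/4/2018 7 8
", header = TRUE, stringsAsFactors = FALSE)

wide <- df %>%
  # Parse month/day/year strings as Date
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  # Add a row (with NA values) for every Name on every day in the range
  complete(Name, Date = seq(min(Date), max(Date), by = "day")) %>%
  # Spread both value columns; names look like Val1.2018-01-02
  pivot_wider(names_from = Date, values_from = c(Val1, Val2), names_sep = ".")

print(names(wide))
```

The columns for 2018-01-02 and 2018-01-03 exist in `wide` and contain only NA, which is what the question asks for.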


【Solution 2】:

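library(dplyr)
library(tidyr)       # complete, nesting, full_seq
library(data.table)  # dcast and setDT

# df is the data frame from the question (test_data)
df %>% mutate(Date = as.Date(Date, '%m/%d/%Y')) %>%              # parse dates
       complete(Name, nesting(Date = full_seq(Date, 1))) %>%    # add every day between the first and last date
       setDT(.) %>% dcast(Name ~ Date, value.var = c('Val2', 'Val1'))  # cast both value columns to wide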
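The same fill-then-reshape idea can also be sketched in base R, keeping the reshape() call from the question. This is a hedged illustration, not one of the posted answers: the helper names `all_dates`, `grid` and `full` are invented here, and the dates are converted to ISO strings before reshaping so that the generated column names are predictable.

```r
# The question's data, rebuilt so the example is self-contained
test_data <- read.table(text = "
Name Date Val1 Val2
A 1/1/2018 1 2
B 1/1/2018 2 3
C 1/1/2018 3 4
D 1/4/2018 4 5
A 1/4/2018 5 6
B 1/4/2018 6 7
C 1/4/2018 7 8
", header = TRUE, stringsAsFactors = FALSE)

# Parse the dates and build the full daily range
test_data$Date <- as.Date(test_data$Date, format = "%m/%d/%Y")
all_dates <- seq(min(test_data$Date), max(test_data$Date), by = "day")

# Every Name paired with every day, then left-joined to the observed rows
grid <- expand.grid(Name = unique(test_data$Name), Date = all_dates,
                    stringsAsFactors = FALSE)
full <- merge(grid, test_data, by = c("Name", "Date"), all.x = TRUE)

# Sort chronologically and use ISO date strings for the column suffixes
full <- full[order(full$Date, full$Name), ]
full$Date <- format(full$Date, "%Y-%m-%d")

test_wide <- reshape(full, idvar = "Name", timevar = "Date", direction = "wide")
print(names(test_wide))
```

Because the grid contains every Name on every day, reshape() sees the missing dates as ordinary rows with NA values and creates columns for them.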
