Spread valued column into binary 'time series' in R: answers

【Question title】: Spread valued column into binary 'time series' in R
【Posted】: 2020-02-06 06:48:38
【Question description】:

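I am trying to first spread a valued column into a set of binary columns and then gather them again into a 'time series' format.

As an example, consider locations that were conquered at particular times, with data that looks like this: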

df1 <- data.frame(locationID = c(1,2,3), conquered_in = c(1931, 1932, 1929))

  locationID conquered_in
1          1         1931
2          2         1932
3          3         1929

I am trying to reshape the data to look like this:

df2 <- data.frame(locationID = c(1,1,1,1,2,2,2,2,3,3,3,3), year = c(1929,1930,1931,1932,1929,1930,1931,1932,1929,1930,1931,1932), conquered = c(0,0,1,1,0,0,0,1,1,1,1,1))

   locationID year conquered
1           1 1929         0
2           1 1930         0
3           1 1931         1
4           1 1932         1
5           2 1929         0
6           2 1930         0
7           2 1931         0
8           2 1932         1
9           3 1929         1
10          3 1930         1
11          3 1931         1
12          3 1932         1

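My initial strategy was to use spread on conquered_in and then try gather. This answer seemed close, but I can't get it right with fill, since I am also trying to fill the later years with 1s.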

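【Question discussion】:

Tags: r spread

【Solution 1】:

You can use complete() to expand the data frame and then, within each location, use cumsum() on conquered == 1 to carry the 1 down through the later years.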

library(tidyr)
library(dplyr)

df1 %>% 
  mutate(conquered = 1) %>%
  complete(locationID, conquered_in = seq(min(conquered_in), max(conquered_in)), fill = list(conquered = 0)) %>%
  group_by(locationID) %>%
  mutate(conquered = cumsum(conquered == 1))

# A tibble: 12 x 3
# Groups:   locationID [3]
   locationID conquered_in conquered
        <dbl>        <dbl>     <int>
 1          1         1929         0
 2          1         1930         0
 3          1         1931         1
 4          1         1932         1
 5          2         1929         0
 6          2         1930         0
 7          2         1931         0
 8          2         1932         1
 9          3         1929         1
10          3         1930         1
11          3         1931         1
12          3         1932         1
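
For comparison, here is a minimal sketch of a different route to the same table: pair every location with every year and compare the year against conquered_in directly, so no cumulative sum is needed. This is not the answer's method; the use of tidyr::crossing() and the output column name year are my own choices.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(locationID = c(1, 2, 3), conquered_in = c(1931, 1932, 1929))

# Pair each row of df1 with every year in the observed range,
# then flag the years at or after that location's conquest year.
result <- crossing(df1, year = seq(min(df1$conquered_in), max(df1$conquered_in))) %>%
  mutate(conquered = as.integer(year >= conquered_in)) %>%
  select(locationID, year, conquered) %>%
  arrange(locationID, year)

result
```

Because each row is computed independently, this version does not depend on the rows being sorted by year within each location, which the cumsum() approach does.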

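【Discussion】:

This is great, thank you. Is there a way to do this so that, if you have other columns in the data frame, they don't turn into NA?
@dmk32 - include the other variables wrapped in nesting(): complete(nesting(locationID, x, y, z), conquered_in = seq(min(conquered_in), max(conquered_in)), fill = list(conquered = 0)).

【Solution 2】:

Using complete() from tidyr would be the better choice. One thing to watch for, though: the conquest years may not cover every year from the start to the end of the war.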
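
A small worked sketch of the nesting() suggestion, assuming one extra column. The region column is made up for illustration and stands in for the x, y, z placeholders in the comment.

```r
library(dplyr)
library(tidyr)

# `region` is a hypothetical extra column, not part of the original question.
df1 <- data.frame(
  locationID   = c(1, 2, 3),
  region       = c("north", "south", "east"),
  conquered_in = c(1931, 1932, 1929)
)

result <- df1 %>%
  mutate(conquered = 1) %>%
  complete(
    # nesting() keeps only the locationID/region pairs that occur in the data,
    # so region is carried onto the new rows instead of becoming NA.
    nesting(locationID, region),
    conquered_in = seq(min(conquered_in), max(conquered_in)),
    fill = list(conquered = 0)
  ) %>%
  group_by(locationID) %>%
  mutate(conquered = cumsum(conquered == 1)) %>%
  ungroup()

result
```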

library(dplyr)
library(tidyr)
library(magrittr)

df1 <- data.frame(locationID = c(1,2,3), conquered_in = c(1931, 1932, 1929))

# A data frame holding every year you want to cover
df2 <- data.frame(year=seq(1929, 1940, by=1))

# Build every year/location combination and attach the conquest flag
df3 <- full_join(df2, df1, by=c("year"="conquered_in")) %>%
  mutate(conquered=if_else(!is.na(locationID), 1, 0)) %>%
  complete(year, locationID) %>%
  arrange(locationID) %>%
  filter(!is.na(locationID))

# Set conquered from the first year each location was conquered, grouping by location
df3 %<>%
  group_by(locationID) %>%
  # the 2000 inside min() is a fallback for locations that were never conquered
  mutate(conquered=if_else(year>=min(2000, year[conquered==1], na.rm=TRUE), 1, 0)) %>%
  ungroup()

df3 %>% filter(year<=1932)
# A tibble: 12 x 3
    year locationID conquered
   <dbl>      <dbl>     <dbl>
 1  1929          1         0
 2  1930          1         0
 3  1931          1         1
 4  1932          1         1
 5  1929          2         0
 6  1930          2         0
 7  1931          2         0
 8  1932          2         1
 9  1929          3         1
10  1930          3         1
11  1931          3         1
12  1932          3         1
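
The wider year range can also be handled with the complete() and cumsum() approach by passing the years explicitly instead of deriving them from the data. The sketch below assumes the same 1929 to 1940 window used in this answer; renaming conquered_in to year is my own addition to match the layout the question asks for.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(locationID = c(1, 2, 3), conquered_in = c(1931, 1932, 1929))

# Assumed study window, copied from the 1929-1940 range above.
years <- seq(1929, 1940, by = 1)

result <- df1 %>%
  mutate(conquered = 1) %>%
  # Expand to every location/year pair in the window; new rows get conquered = 0.
  complete(locationID, conquered_in = years, fill = list(conquered = 0)) %>%
  # cumsum() relies on years being in ascending order within each location.
  arrange(locationID, conquered_in) %>%
  group_by(locationID) %>%
  mutate(conquered = cumsum(conquered)) %>%
  ungroup() %>%
  rename(year = conquered_in)

result %>% filter(year <= 1932)
```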

【Discussion】: