【Question title】: Merge 2 data frames with common columns and add indicator when common value is not NA
【Posted】: 2021-04-12 06:36:24
【Question】:

I have 2 data frames:

df_1:

  Date          time_series_1    time_series_2
1 01-01-2019    NA               10
2 02-01-2019    5                NA
3 03-01-2019    10               NA
4 04-01-2019    20               6

df_2:

  Date          time_series_1    time_series_2    time_series_3
1 01-01-2019    NA               10               10
2 02-01-2019    5                NA               87
3 03-01-2019    10               NA               45
4 04-01-2019    20               6                221

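Both data frames share the columns time_series_1 and time_series_2 (every column in df_1 is also present in df_2).

My goal is to merge the two data frames, show the merged result in long format, and add an indicator that is 1 when a value belongs to df_1 and is not NA on that date, and 0 otherwise.

The desired output would be DF_LONG_MERGED: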

   Date          variable         value    indicator
1  01-01-2019    time_series_1    NA       0
2  01-01-2019    time_series_2    10       1
3  01-01-2019    time_series_3    10       0
4  02-01-2019    time_series_1    5        1
5  02-01-2019    time_series_2    NA       0
6  02-01-2019    time_series_3    87       0
7  03-01-2019    time_series_1    10       1
8  03-01-2019    time_series_2    NA       0
9  03-01-2019    time_series_3    45       0
10 04-01-2019    time_series_1    20       1
11 04-01-2019    time_series_2    6        1
12 04-01-2019    time_series_3    221      0

Any suggestions on how to add this indicator?

【Comments】:

    Tags: r dplyr


    【Answer 1】:

    Does this work:

    library(dplyr)
    library(tidyr)
    
    # Flag the non-NA values of df_1 in long form, then right-join onto the
    # long form of df_2; rows that exist only in df_2 get indicator = 0.
    df_1 %>%
      pivot_longer(-Date, names_to = 'variable') %>%
      mutate(indicator = case_when(!is.na(value) ~ 1, TRUE ~ 0)) %>%
      right_join(df_2 %>% pivot_longer(-Date, names_to = 'variable')) %>%
      mutate(indicator = replace_na(indicator, 0)) %>%
      arrange(Date)
    Joining, by = c("Date", "variable", "value")
    # A tibble: 12 x 4
       Date       variable      value indicator
       <chr>      <chr>         <int>     <dbl>
     1 01-01-2019 time_series_1    NA         0
     2 01-01-2019 time_series_2    10         1
     3 01-01-2019 time_series_3    10         0
     4 02-01-2019 time_series_1     5         1
     5 02-01-2019 time_series_2    NA         0
     6 02-01-2019 time_series_3    87         0
     7 03-01-2019 time_series_1    10         1
     8 03-01-2019 time_series_2    NA         0
     9 03-01-2019 time_series_3    45         0
    10 04-01-2019 time_series_1    20         1
    11 04-01-2019 time_series_2     6         1
    12 04-01-2019 time_series_3   221         0
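
The join above matches on `value` as well as `Date` and `variable`, which works here because df_1 and df_2 hold identical values in their shared columns. If that could not be relied on, one possible variant (a sketch only, not part of the original answer) is to reduce df_1 to a per-cell flag and join on the two key columns alone. The sample data is rebuilt from the question; the names `flags` and `df_long_merged` are made up for illustration, and `Date` is assumed to be stored as a character column, so the final sort is lexical.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df_1 <- data.frame(
  Date = c("01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019"),
  time_series_1 = c(NA, 5, 10, 20),
  time_series_2 = c(10, NA, NA, 6)
)
df_2 <- data.frame(
  Date = c("01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019"),
  time_series_1 = c(NA, 5, 10, 20),
  time_series_2 = c(10, NA, NA, 6),
  time_series_3 = c(10, 87, 45, 221)
)

# Long form of df_1 reduced to a flag: 1 where df_1 has a non-NA value
flags <- df_1 %>%
  pivot_longer(-Date, names_to = "variable") %>%
  transmute(Date, variable, indicator = as.integer(!is.na(value)))

# Long form of df_2 supplies the values; joining on the keys only means a
# value that differs between the two frames cannot break the match
df_long_merged <- df_2 %>%
  pivot_longer(-Date, names_to = "variable") %>%
  left_join(flags, by = c("Date", "variable")) %>%
  mutate(indicator = replace_na(indicator, 0L)) %>%
  arrange(Date, variable)

print(df_long_merged)
```

Cells that appear only in df_2 (here, everything under time_series_3) have no match in `flags`, so their indicator is filled with 0 by `replace_na()`.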
    

    【Comments】:
