【Question title】: Compute month-on-month difference in weights
【Posted】: 2021-02-18 15:33:17
【Question】:

I'm working with some portfolio data and I'm stuck on this data manipulation. I have this sample data:

library(dplyr)  # re-exports tibble()

df <- tibble(
  date = as.Date(c("2020-01-31", "2020-01-31", "2020-01-31", 
                   "2020-02-29", "2020-02-29", "2020-02-29",
                   "2020-03-31", "2020-03-31", "2020-03-31") ),
  id = c("KO", "AAPL", "MSFT",
         "KO", "AAPL", "GOOG", 
         "KO", "AAPL", "MSFT"),
  weight = c(0.3, 0.4, 0.3,
             0.5, 0.3, 0.2,
             0.6, 0.2, 0.2),
  
  `weight_change (desired column)` = c(NA, NA, NA,
                                       0.2, -0.1, 0.2,
                                       0.1, -0.1, 0.2)
) 

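These are the positions in a sample portfolio. The portfolio gets new weights every month. What I want to calculate is the change in each item's weight relative to its weight in the previous month. In this example, at the end of February KO has a current weight of 0.5, an increase of 0.2 over the previous month. AAPL dropped by 0.1, and GOOG replaced MSFT, so its change from the previous month is its entire current weight: 0.2. How can I set up a mutate so that it looks up the stock at the previous date and computes the difference between the weights?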

【Comments】:

    Tags: r dplyr


    【Solution 1】:

    If the data for each 'id' is monthly, we can use complete to fill in the missing months and then take a grouped diff:

    library(dplyr)
    library(tidyr)
    library(zoo)    
    df %>%
         mutate(yearmonth = as.Date(as.yearmon(date))) %>%
         group_by(id) %>% 
         complete(yearmonth = seq(first(yearmonth), last(yearmonth), by = '1 month')) %>%
         mutate(weight_change = if(n() == 1) weight else c(NA, diff(replace_na(weight, 0)))) %>%
         ungroup %>%
         select(all_of(names(df)), weight_change) %>%
         filter(!is.na(date))
    # A tibble: 9 x 5
    #  date       id    weight `weight_change (desired column)` weight_change
    #  <date>     <chr>  <dbl>                            <dbl>         <dbl>
    #1 2020-01-31 AAPL     0.4                             NA          NA    
    #2 2020-02-29 AAPL     0.3                             -0.1        -0.1  
    #3 2020-03-31 AAPL     0.2                             -0.1        -0.100
    #4 2020-02-29 GOOG     0.2                              0.2         0.2  
    #5 2020-01-31 KO       0.3                             NA          NA    
    #6 2020-02-29 KO       0.5                              0.2         0.2  
    #7 2020-03-31 KO       0.6                              0.1         0.100
    #8 2020-01-31 MSFT     0.3                             NA          NA    
    #9 2020-03-31 MSFT     0.2                              0.2         0.2  
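
To make the effect of complete and replace_na easier to follow, here is a minimal sketch that runs the same steps on MSFT alone. The names `msft`, `padded`, `held` and `weight_filled` are made up for this illustration, and the month-start dates stand in for the `yearmonth` column built above.

```r
library(dplyr)
library(tidyr)

# One id from the sample data: MSFT is held in January and March only.
msft <- tibble(
  yearmonth = as.Date(c("2020-01-01", "2020-03-01")),
  id        = "MSFT",
  weight    = c(0.3, 0.2)
)

padded <- msft %>%
  # add a row for every month between the first and last observation
  complete(yearmonth = seq(min(yearmonth), max(yearmonth), by = "1 month")) %>%
  mutate(
    held          = !is.na(weight),         # FALSE for the month added by complete()
    weight_filled = replace_na(weight, 0),  # a month without a position counts as weight 0
    weight_change = c(NA, diff(weight_filled))
  )

# Drop the padded month again, keeping only the months that were actually held.
result <- filter(padded, held)
print(result)
```

The padded February row carries the drop to zero, so the March row is compared against 0 rather than against January's 0.3. That is why MSFT's change in March equals its full weight.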
    

    【Comments】:

      【Solution 2】:

      Here is my not-so-compact solution. I just used a few helper columns, which I kept so that people can follow along.

      library(tidyverse)
      library(lubridate)
      
      df <- tibble(
        date = c("2020-01-31", "2020-01-31", "2020-01-31", 
                         "2020-02-29", "2020-02-29", "2020-02-29",
                         "2020-03-31", "2020-03-31", "2020-03-31"),
        id = c("KO", "AAPL", "MSFT", "KO", "AAPL", "GOOG", "KO", "AAPL", "MSFT"),
        weight = c(0.3, 0.4, 0.3, 0.5, 0.3, 0.2, 0.6, 0.2, 0.2),
        `weight_change (desired_column)` = c(NA, NA, NA, 0.2, -0.1, 0.2, 0.1, -0.1, 0.2)
      ) %>% #new code starts here
        mutate(
          date = as_date(date),
          first_date = min(date), #earliest date in the whole data set, computed before grouping
          date_ym = floor_date(date,
                               unit = "month"))%>%
        group_by(id)%>%
        arrange(date)%>%
        mutate(id_n = row_number(),
               prev_exist = case_when(lag(date_ym) == date_ym - months(1) ~ "immediate month", #the id was also held in the immediately preceding month
                                      id_n == 1 & date != first_date ~ "new month", #the id appears for the first time, after the first month of the data
                                      TRUE ~ "no immediate month"),
               weight_change = case_when(prev_exist == "new month"~ weight,
                                         prev_exist == "no immediate month" & id_n > 1~ weight,
                                         TRUE ~ weight-lag(weight)),
               date_ym = NULL,
               id_n  = NULL,
               prev_exist = NULL,
               first_date = NULL)
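
The same look-up of the previous month's weight can also be written as a self-join. This is only a sketch under one assumption: "previous month" is taken to mean the previous date present in the data, not strictly the previous calendar month. The names `date_map`, `prev_weights`, `prev_date` and `prev_weight` are made up for this illustration.

```r
library(dplyr)

df <- tibble(
  date = as.Date(c("2020-01-31", "2020-01-31", "2020-01-31",
                   "2020-02-29", "2020-02-29", "2020-02-29",
                   "2020-03-31", "2020-03-31", "2020-03-31")),
  id = c("KO", "AAPL", "MSFT", "KO", "AAPL", "GOOG", "KO", "AAPL", "MSFT"),
  weight = c(0.3, 0.4, 0.3, 0.5, 0.3, 0.2, 0.6, 0.2, 0.2)
)

# Map every portfolio date to the date that precedes it in the data
# (assumption: consecutive dates in the data are consecutive months).
date_map <- df %>%
  distinct(date) %>%
  arrange(date) %>%
  mutate(prev_date = lag(date))

# The same holdings, renamed so they can be joined on as "previous" weights.
prev_weights <- df %>%
  select(prev_date = date, id, prev_weight = weight)

result <- df %>%
  left_join(date_map, by = "date") %>%
  left_join(prev_weights, by = c("prev_date", "id")) %>%
  mutate(
    weight_change = case_when(
      is.na(prev_date)   ~ NA_real_,            # first date: nothing to compare with
      is.na(prev_weight) ~ weight,              # id was not held on the previous date
      TRUE               ~ weight - prev_weight # id was held on both dates
    )
  ) %>%
  select(date, id, weight, weight_change)

print(result)
```

Because the join keys on the previous date, an id that drops out for a month and comes back (MSFT here) is compared against a missing row and gets its full weight as the change, with no padding step needed.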
      

      【Comments】:

        【Solution 3】:

        A timetk approach:

        library(timetk)
        df %>% 
           mutate(Month = lubridate::floor_date(date, "month")) %>%
           group_by(id) %>% 
           timetk::pad_by_time(.date_var = Month, .by="month") %>% 
           select(-Month) %>% 
           mutate(WC = if(n() == 1) weight else c(NA, diff(weight)))
        
        # A tibble: 10 x 5
        # Groups:   id [4]
           id    date       weight weight_change     WC
           <chr> <date>      <dbl>         <dbl>  <dbl>
         1 KO    2020-01-31    0.3          NA   NA    
         2 KO    2020-02-29    0.5           0.2  0.2  
         3 KO    2020-03-31    0.6           0.1  0.100
         4 AAPL  2020-01-31    0.4          NA   NA    
         5 AAPL  2020-02-29    0.3          -0.1 -0.1  
         6 AAPL  2020-03-31    0.2          -0.1 -0.100
         7 MSFT  2020-01-31    0.3          NA   NA    
         8 MSFT  NA           NA            NA   NA    
         9 MSFT  2020-03-31    0.2           0.2 NA    
        10 GOOG  2020-02-29    0.2           0.2  0.2
        

        【Comments】:
