【Question title】:Dynamic CAGR calculation in R using dplyr
【Posted】:2018-08-05 15:36:39
【Question description】:

I have the following data:

 Company    Year    Variables    Data
  ABC        2000     Revenue     10
  ABC        2001     Revenue     15
  ABC        2002     Revenue     12
  ABC        2003     Revenue     25
  ABC        2004     Revenue     30
  CDE        2000     Revenue     5
  CDE        2001     Revenue     8
  CDE        2002     Revenue     17
  CDE        2003     Revenue     9
  CDE        2004     Revenue     34

  #etc

I want to calculate the compound annual growth rate (CAGR) over the past 3 years.

For example, the 3-year CAGR result for each company would be:

 Company    Year    Variables    Data    CAGR
  ABC        2000     Revenue     10      NA
  ABC        2001     Revenue     15      NA
  ABC        2002     Revenue     12      6.27%
  ABC        2003     Revenue     25      18.56%
  ABC        2004     Revenue     30      35.72%
  CDE        2000     Revenue     5       NA
  CDE        2001     Revenue     8       NA
  CDE        2002     Revenue     17      50.37%
  CDE        2003     Revenue     9       4.00%
  CDE        2004     Revenue     34      25.99%

I apply the following formula to the data, year by year:

CAGR for 2004=((LastYear/PreviousYear)^(1/n))-1
For example for n = 2
LastYear =2004
PreviousYear =2004-2 = 2002
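
As a quick arithmetic check, the sketch below evaluates the formula for ABC in 2004 against 2002, using the values from the table above. The variable names are illustrative only. The expected 35.72% in the table appears to correspond to an exponent of 1/3 (counting 2002, 2003 and 2004 as three periods) rather than 1/2.

```r
# Values taken from the question's table for company ABC.
last_value  <- 30  # Data in 2004
first_value <- 12  # Data in 2002

# Exponent 1/n with n = 2, as the formula is written
cagr_n <- (last_value / first_value)^(1 / 2) - 1

# Exponent 1/(n + 1), i.e. treating 2002-2004 as three periods
cagr_n_plus_1 <- (last_value / first_value)^(1 / 3) - 1

round(c(cagr_n, cagr_n_plus_1), 4)
# 0.5811 0.3572
```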

My attempt at computing the CAGR for 2004 versus 2002 in R:

library(tibble)
library(dplyr)
library(lubridate)

year<-c(rep(2000:2004,2))
company<-rep(c("ABC","CDE"),each=5) # each=5 keeps the company labels aligned with year and data
variable<-rep("revenue",10)
data<-c(10,15,12,25,30,5,8,17,9,34)

tibdf<-tibble(company,year,variable,data)
View(tibdf)

#revenue2004<-tibdf%>%filter(year==2004)%>%select(company,data)
#revenue2002<-tibdf%>%filter(year==2002)%>%select(company,data)

Calculating the CAGR (taken from "Plot Compound Annual Growth Rate (3 independent variables) in R"):

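annual.growth.rate <- function(a){

 T1 <- max(a$year) - min(a$year)+1    # number of periods, counting both end years
 FV <- a$data[a$year == max(a$year)]  # value in the last year
 SV <- a$data[a$year == min(a$year)]  # value in the first year
 cagr <- ((FV/SV)^(1/T1)) -1
 cagr                                 # return the rate visibly

 }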

I want to use tibdf with this function. Unfortunately, I have not been able to apply the function to my data.

Thanks for your help.

【Comments】:

    Tags: r


    【Solution 1】:

    Here is one approach:

    library(tidyverse)
    df %>%
      arrange(Company, Year) %>%  #in case the years are not in order (here they are)
      group_by(Company) %>%
      mutate(lagY = lag(Year), #get the lag year
             lagD = lag(Data), #get lagged Data
             t = Year - lagY, #calculate time
             CAGR = (Data / lagD)^(1/t) - 1) %>% #calculate CAGR
      select(-lagY, -lagD, -t) #remove unwanted variables
    
    
    #output:
       Company  Year Variables  Data   CAGR
       <fct>   <int> <fct>     <int>  <dbl>
     1 ABC      2000 Revenue      10 NA
     2 ABC      2001 Revenue      15  0.500
     3 ABC      2002 Revenue      12 -0.200
     4 ABC      2003 Revenue      25  1.08
     5 ABC      2004 Revenue      30  0.200
     6 CDE      2000 Revenue       5 NA
     7 CDE      2001 Revenue       8  0.600
     8 CDE      2002 Revenue      17  1.12
     9 CDE      2003 Revenue       9 -0.471
    10 CDE      2004 Revenue      34  2.78
    

    Or, a little more compactly, without creating intermediate variables:

       df %>%
          arrange(Company, Year) %>%
          group_by(Company) %>%
          mutate(CAGR = (Data/lag(Data))^(1/(Year-lag(Year))) - 1)
    

    Data:

    df <- read.table(text ="Company    Year    Variables    Data
    ABC        2000     Revenue     10
    ABC        2001     Revenue     15
    ABC        2002     Revenue     12
    ABC        2003     Revenue     25
    ABC        2004     Revenue     30
    CDE        2000     Revenue     5
    CDE        2001     Revenue     8
    CDE        2002     Revenue     17
    CDE        2003     Revenue     9
    CDE        2004     Revenue     34", header = TRUE)
    

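    【Comments】:

    • Thanks for your help. Your code helps, but it does not calculate the CAGR between 2004 and 2002, 2003 and 2001, and so on. For 2004 versus 2002, the lead year is 2004 and the lag year is 2002. So should we use lag instead?
    • You are right, the calculated CAGR is in fact offset by +1. I will update the answer.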
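
The updated answer is not shown here. As a rough sketch of what the comment asks for, the pipeline from Solution 1 can look back two rows instead of one by passing a second argument to `lag()`. The look-back `n` and the object name `result` are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)

# Same data as in Solution 1, built directly as a data frame.
df <- data.frame(
  Company   = rep(c("ABC", "CDE"), each = 5),
  Year      = rep(2000:2004, times = 2),
  Variables = "Revenue",
  Data      = c(10, 15, 12, 25, 30, 5, 8, 17, 9, 34)
)

n <- 2  # assumed look-back in rows, e.g. 2004 versus 2002

result <- df %>%
  arrange(Company, Year) %>%
  group_by(Company) %>%
  mutate(t    = Year - lag(Year, n),             # elapsed years over the look-back
         CAGR = (Data / lag(Data, n))^(1 / t) - 1) %>%
  ungroup() %>%
  select(-t)

result
```

Because the exponent uses the actual difference in `Year`, this version also copes with a missing year inside the look-back window, as long as "n rows back" is the intended comparison.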
    【Solution 2】:

    This function calculates the CAGR for different values of n:

    calc_cagr <- function(df, n) {
      df <- df %>%
        arrange(company, year) %>%
        group_by(company) %>%
        mutate(cagr = ((data / lag(data, n)) ^ (1 / n)) - 1)
    
      return(df)
    }
    
    calc_cagr(tibdf, 2)
    
    # A tibble: 10 x 5
    # Groups:   company [2]
    #    company  year variable  data    cagr
    #    <chr>   <int> <chr>    <dbl>   <dbl>
    #  1 ABC      2000 revenue  10.0  NA     
    #  2 ABC      2001 revenue  15.0  NA     
    #  3 ABC      2002 revenue  12.0   0.0954
    #  4 ABC      2003 revenue  25.0   0.291 
    #  5 ABC      2004 revenue  30.0   0.581 
    #  6 CDE      2000 revenue   5.00 NA     
    #  7 CDE      2001 revenue   8.00 NA     
    #  8 CDE      2002 revenue  17.0   0.844 
    #  9 CDE      2003 revenue   9.00  0.0607
    # 10 CDE      2004 revenue  34.0   0.414 
    

    My results differ from yours, however; your question is somewhat ambiguous about whether the exponent should divide by n or by n+1.
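
To illustrate the ambiguity, here is a hedged variant of `calc_cagr` with a separate `periods` argument for the exponent. Both the function name `calc_cagr_periods` and the `periods` argument are hypothetical additions for this sketch. With `n = 2` and `periods = 3`, it appears to reproduce the percentages in the question's expected table.

```r
library(dplyr)
library(tibble)

tibdf <- tibble(company = rep(c("ABC", "CDE"), each = 5),
                year = rep(2000:2004, 2),
                variable = rep("revenue", 10),
                data = c(10, 15, 12, 25, 30, 5, 8, 17, 9, 34))

# Hypothetical variant: look back n rows, but use 1 / periods as the exponent.
calc_cagr_periods <- function(df, n, periods = n) {
  df %>%
    arrange(company, year) %>%
    group_by(company) %>%
    mutate(cagr = (data / lag(data, n))^(1 / periods) - 1) %>%
    ungroup()
}

res <- calc_cagr_periods(tibdf, n = 2, periods = 3)
round(100 * res$cagr, 2)
# values: NA NA 6.27 18.56 35.72 NA NA 50.37 4.00 25.99
```

Leaving `periods` at its default of `n` gives the same numbers as `calc_cagr(tibdf, 2)` above.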

    Data

    tibdf <- tibble(company = rep(c("ABC", "CDE"), each = 5),
                    year = rep(2000:2004, 2),
                    variable = rep("revenue", 10),
                    data = c(10, 15, 12, 25, 30, 5, 8, 17, 9, 34))
    

    【Comments】:
