【Title】: R: calculating yearly or normalized growth rates for different interval lengths
【Posted】: 2015-06-22 19:40:26
【Question】:

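I have a data frame structured as shown below, and I want to calculate yearly growth rates. The problem is that the time step is not the same for all models: in the example below, REMIND reports data at 5-year intervals, whereas TIAM-ECN reports at 10-year intervals.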

model     scenario  region  year  value
REMIND    Base    NORTH_AM  2010  314.1330
REMIND    Base    CHINA+    2010  1325.9220
REMIND    RefPol  NORTH_AM  2010  314.1330
REMIND    RefPol  CHINA+    2010  1325.9220
TIAM-ECN  Base    NORTH_AM  2010  344.4005
TIAM-ECN  Base    CHINA+    2010  1341.3352
TIAM-ECN  RefPol  NORTH_AM  2010  344.4005
TIAM-ECN  RefPol  CHINA+    2010  1341.3352
REMIND    Base    NORTH_AM  2015  327.6270
REMIND    Base    CHINA+    2015  1354.3180
REMIND    RefPol  NORTH_AM  2015  327.6270
REMIND    RefPol  CHINA+    2015  1354.3180
REMIND    Base    NORTH_AM  2020  340.8490
REMIND    Base    CHINA+    2020  1372.4630
REMIND    RefPol  NORTH_AM  2020  340.8490
REMIND    RefPol  CHINA+    2020  1372.4630
TIAM-ECN  Base    NORTH_AM  2020  374.2647
TIAM-ECN  Base    CHINA+    2020  1387.7915
TIAM-ECN  RefPol  NORTH_AM  2020  374.2647
TIAM-ECN  RefPol  CHINA+    2020  1387.7915

Calculating the growth rate over each interval is straightforward:

library(dplyr)

tmp_gr <- group_by(df, model, scenario, region) %>%
  mutate(value = value / lag(value) - 1) %>%
  ungroup()

This yields (I omitted the NA rows for 2010):

model     scenario region   year    value
REMIND    Base     NORTH_AM 2015    -0.7557456
REMIND    Base     CHINA+   2015    3.1337191
REMIND    RefPol   NORTH_AM 2015    -0.7580871
REMIND    RefPol   CHINA+   2015    3.1337191
REMIND    Base     NORTH_AM 2020    -0.7483242
REMIND    Base     CHINA+   2020    3.0266012
REMIND    RefPol   NORTH_AM 2020    -0.7516516
REMIND    RefPol   CHINA+   2020    3.0266012
TIAM-ECN  Base     NORTH_AM 2020    -0.7273044
TIAM-ECN  Base     CHINA+   2020    2.7080483
TIAM-ECN  RefPol   NORTH_AM 2020    -0.7303164
TIAM-ECN  RefPol   CHINA+   2020    2.7080483

But now, calculating the yearly growth rate by dividing the interval growth rate by the interval length:

tmp_gr_yearly <- group_by(df, model, scenario, region) %>%
  mutate(value = (value / lag(value) - 1) / (year - lag(year))) %>%
  ungroup()

This yields:

model     scenario region   year   value
REMIND    Base     NORTH_AM 2015    -0.1511491
REMIND    Base     CHINA+   2015    Inf
REMIND    RefPol   NORTH_AM 2015    -Inf
REMIND    RefPol   CHINA+   2015    Inf
REMIND    Base     NORTH_AM 2020    -0.1496648
REMIND    Base     CHINA+   2020    Inf
REMIND    RefPol   NORTH_AM 2020    -Inf
REMIND    RefPol   CHINA+   2020    Inf
TIAM-ECN  Base     NORTH_AM 2020    -Inf
TIAM-ECN  Base     CHINA+   2020    Inf
TIAM-ECN  RefPol   NORTH_AM 2020    -Inf
TIAM-ECN  RefPol   CHINA+   2020    Inf

I don't understand where the Inf values come from.

Any ideas?

【Comments】:

    Tags: r dataframe dplyr


    【Solution 1】:

    My example for computing the simple, non-normalized growth rate was already wrong.

    Anyway, I think I figured it out myself:
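The numbers in the question are consistent with the grouping having been ignored, so that `lag()` picked up the previous row of the whole data frame rather than the previous time step within each group. The base R check below recomputes the first two reported values from three consecutive rows of the posted data. Why the grouping was ignored is not stated in the post; one common cause (an assumption here, not confirmed by the source) is `plyr` being loaded after `dplyr`, so that `plyr::mutate` masks `dplyr::mutate`.

```r
# Three consecutive rows of the posted data, in the order shown in the question:
# TIAM-ECN RefPol CHINA+ 2010, REMIND Base NORTH_AM 2015, REMIND Base CHINA+ 2015
value <- c(1341.3352, 327.6270, 1354.3180)
year  <- c(2010, 2015, 2015)

# Ratio to the previous row, with no grouping at all
gr <- value[-1] / value[-length(value)] - 1
dt <- diff(year)   # year difference to the previous row: 5, then 0

print(round(gr, 7))  # compare with -0.7557456 and 3.1337191 in the question
print(gr / dt)       # the second element divides by a year difference of 0
```

Within a properly grouped series the year difference can never be 0, so an `Inf` in the yearly rate suggests that rows from different groups were compared.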

    tmp_gr <- group_by(df, model, scenario, region) %>%
      mutate(value = lag(value, n=0, order_by=year) / lag(value, order_by=year) - 1) %>%
      ungroup()
    
    tmp_gr_yearly <- group_by(df, model, scenario, region) %>%
      mutate(value = (lag(value, n=0, order_by=year) / lag(value, order_by=year) - 1) /
                     (lag(year, n=0, order_by=year) - lag(year, order_by=year))) %>%
      ungroup()
    

    By using the lag operator for all values and explicitly specifying the order, the whole calculation becomes robust to unordered data.
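A minimal self-contained sketch of this approach, using a subset of the posted data (Base scenario, NORTH_AM only) that is deliberately shuffled out of year order. The column names `prev_value`, `prev_year` and `growth_yearly` are illustrative and not part of the original post. Plain `value` and `year` stand in for `lag(..., n=0, order_by=year)`, which returns the unshifted vector. The `dplyr::` prefixes are an added precaution so that dplyr's verbs are used even if another package masks `mutate` or `lag`.

```r
library(dplyr)

# Subset of the posted data, rows intentionally not sorted by year
df <- data.frame(
  model    = c("REMIND", "TIAM-ECN", "REMIND", "TIAM-ECN", "REMIND"),
  scenario = "Base",
  region   = "NORTH_AM",
  year     = c(2020, 2020, 2010, 2010, 2015),
  value    = c(340.8490, 374.2647, 314.1330, 344.4005, 327.6270)
)

tmp_gr_yearly <- df %>%
  dplyr::group_by(model, scenario, region) %>%
  dplyr::mutate(
    prev_value    = dplyr::lag(value, order_by = year),  # value at the previous time step
    prev_year     = dplyr::lag(year,  order_by = year),  # year of the previous time step
    # interval growth rate divided by the interval length in years
    growth_yearly = (value / prev_value - 1) / (year - prev_year)
  ) %>%
  dplyr::ungroup() %>%
  dplyr::arrange(model, year)

print(tmp_gr_yearly)
```

The first year of each group has no predecessor and stays `NA`. REMIND is divided by its 5-year step and TIAM-ECN by its 10-year step, so the rates are comparable across models.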

    【Discussion】:
