Extrapolate missing data for each group by average percentage of change

【Question title】: Extrapolate missing data for each group by average percentage of change
【Posted】: 2017-03-28 12:21:06
【Question description】:

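I have a data frame containing average income by zip code for 2010-2014. I also need values for 2015-2017, so I am looking for a way to extrapolate them from each zip code group's average yearly change over the available years.

For example: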

year  zip   income
2010  1111   5000
2011  1111   5500
2012  1111   6000
2013  1111   6500
2014  1111   7000
2010  2222   5000
2011  2222   6000
2012  2222   7000
2013  2222   8000
2014  2222   9000

should (roughly) become:

year  zip   income
2010  1111   5000
2011  1111   5500
2012  1111   6000
2013  1111   6500
2014  1111   7000
2015  1111   7614
2016  1111   8282
2017  1111   9009
2010  2222   5000
2011  2222   6000
2012  2222   7000
2013  2222   8000
2014  2222   9000
2015  2222   10424
2016  2222   12074
2017  2222   13986

This is based on an average growth rate of 8.78% per year for zip code 1111 and 15.83% per year for zip code 2222.
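Both quoted rates can be reproduced as the compound (geometric) average yearly growth, exp(mean(diff(log(income)))) - 1. This reading is an assumption, but it fits the figures: the plain arithmetic mean of the yearly percentage changes would give about 15.86% for zip code 2222, not 15.83%. A minimal base R sketch using the example data above; `avg_growth` and the vector names are illustrative only:

```r
# Example incomes for 2010-2014, copied from the question
income_1111 <- c(5000, 5500, 6000, 6500, 7000)
income_2222 <- c(5000, 6000, 7000, 8000, 9000)

# Compound average yearly growth: geometric mean of the year-over-year ratios, minus 1
avg_growth <- function(x) exp(mean(diff(log(x)))) - 1

round(100 * avg_growth(income_1111), 2)  # 8.78
round(100 * avg_growth(income_2222), 2)  # 15.83

# Project 2015-2017 from the last observed value (2014)
proj_1111 <- tail(income_1111, 1) * (1 + avg_growth(income_1111))^(1:3)
proj_2222 <- tail(income_2222, 1) * (1 + avg_growth(income_2222))^(1:3)
```

Dropping the decimals of `proj_1111` and `proj_2222` gives the 2015-2017 values shown in the expected output.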

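【Comments】:

Try ?approx with method="linear".
I didn't know about that function, but it didn't get me very far: I couldn't create the new years in the data frame and interpolate by group.
Sorry to hear that.

Tags: r percentage extrapolation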
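One reason approx() is of limited help here is that it interpolates within the observed range and does not extrapolate a trend beyond it. A small sketch with the zip code 1111 figures from the question:

```r
year   <- 2010:2014
income <- c(5000, 5500, 6000, 6500, 7000)

# Default rule = 1: years outside 2010-2014 come back as NA
out_na <- approx(year, income, xout = 2013:2016, method = "linear")$y
out_na    # 6500 7000 NA NA

# rule = 2: the value at the nearest boundary is repeated, so no growth is applied
out_flat <- approx(year, income, xout = 2013:2016, method = "linear", rule = 2)$y
out_flat  # 6500 7000 7000 7000
```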

【Solution 1】:

Here is a very quick and messy data.table idea:

library(data.table)
library(zoo)  # provides na.locf()

# Create data
last_year <- 2014
dt <- data.table(year = rep(2010:last_year, 2),
                 zip = c(rep(1111, 5), rep(2222, 5)),
                 income = c(seq(5000, 7000, 500), seq(5000, 9000, 1000)))

# Future data: placeholder rows for 2015-2017, income still unknown
dt_fut <- data.table(year = rep((last_year + 1):2017, 2),
                     zip = c(rep(1111, 3), rep(2222, 3)),
                     income = rep(NA_real_, 6))

# Average yearly growth rate per zip: exponentiate the mean log change
# and subtract 1 (about 8.78% for 1111 and 15.83% for 2222)
dt[, avg_growth := exp(mean(diff(log(income)))) - 1, by = zip]

# Bind old with future data; fill = TRUE leaves avg_growth as NA in the new rows
dt <- rbindlist(list(dt, dt_fut), fill = TRUE)
setorder(dt, zip, year)

# Carry the last value forward to replace NA
dt[, avg_growth := na.locf(avg_growth), by = zip][, income := na.locf(income), by = zip]

# From last_year + 1 (2015) onwards, replace income with the carried-forward
# 2014 income times the cumulative product of (1 + average growth)
dt[year >= last_year + 1, income := income * cumprod(1 + avg_growth), by = zip][]
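The same projection can also be written without placeholder rows or carrying the last value forward, by generating the future rows per group. This is only a sketch, not the answerer's method: `project_income` is a hypothetical helper, and the growth rate is assumed to be the compound average exp(mean(diff(log(income)))) - 1.

```r
library(data.table)

dt <- data.table(year = rep(2010:2014, 2),
                 zip = rep(c(1111, 2222), each = 5),
                 income = c(seq(5000, 7000, 500), seq(5000, 9000, 1000)))
setorder(dt, zip, year)  # the helper relies on the last row being the latest year

# Hypothetical helper: project n_ahead years for a single group
project_income <- function(year, income, n_ahead = 3) {
  r <- exp(mean(diff(log(income)))) - 1  # compound average yearly growth
  steps <- seq_len(n_ahead)
  list(year = max(year) + steps,
       income = income[length(income)] * (1 + r)^steps)
}

# One set of future rows per zip code, then append to the observed data
fut <- dt[, project_income(year, income), by = zip]
res <- rbindlist(list(dt, fut), use.names = TRUE)
setorder(res, zip, year)
res[]
```

Changing `n_ahead` extends or shortens the horizon without touching the rest of the code.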

【Discussion】: