【发布时间】:2018-09-26 17:10:47
【问题描述】:
我正在使用 R 并遵循 Anthony Damico(@AnthonyDamico) 的代码 (http://asdfree.com/survey-of-consumer-finances-scf.html) 来计算来自消费者金融调查的分位数。代码如下:
## Load necessary libraries (Note: should be installed in the OS)
library(mitools) # allows analysis of multiply-imputed survey data
library(survey) # load survey package (analyzes complex design surveys)
library(downloader) # downloads and then runs the source() function on scripts from github
library(foreign) # load foreign package (converts data files into R)
library(Hmisc) # load Hmisc package (loads a simple wtd.quantile function)
library(convey)
library(lodown)
library(devtools)
library(srvyr)
## Read SCF data: wave 2016, path.expand("~") is default working directory
scf_imp <- readRDS("scf 2016.rds" )
scf_rw <- readRDS("scf 2016 rw.rds" )
# define which variables from the five imputed iterations to keep
vars.to.keep <- c( 'y1' , 'yy1' , 'wgt' , 'one' , 'houses', 'oresre', 'mrthel', 'liq', 'income', 'age' )
# restrict each `scf_imp#` data frame to only those variables
scf_imp[[1]] <- scf_imp[[1]][ , vars.to.keep ]
scf_imp[[2]] <- scf_imp[[2]][ , vars.to.keep ]
scf_imp[[3]] <- scf_imp[[3]][ , vars.to.keep ]
scf_imp[[4]] <- scf_imp[[4]][ , vars.to.keep ]
scf_imp[[5]] <- scf_imp[[5]][ , vars.to.keep ]
# clear up RAM
gc()
# turn off scientific notation in most output
options( scipen = 20 )
scf_design <-
svrepdesign(
weights = ~wgt ,
repweights = scf_rw[ , -1 ] ,
data = imputationList( scf_imp ) ,
scale = 1 ,
rscales = rep( 1 / 998 , 999 ) ,
mse = TRUE ,
type = "other" ,
combined.weights = TRUE
)
scf_design <-
update(
scf_design,
wealth = liq + houses + oresre - mrthel,
hwealth = houses + oresre,
htoinc = (houses + oresre)/income,
ltoinc = mrthel / income,
ltv = mrthel + hwealth,
wtoinc = wealth + income
)
### gini coefficient (whole sample) ###
scf_design$designs <- lapply( scf_design$designs , convey_prep )
giniindex <- scf_MIcombine( with( scf_design , svygini( ~ wealth) ) )
在这段代码之后,我尝试了几种方法,包括1):
q1.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.01,
method='constant',interval.type='quantile',se=TRUE)))
q5.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.05, method='constant',interval.type='quantile',se=TRUE)))
q20.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.20, method='constant',interval.type='quantile',se=TRUE)))
q40.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.40, method='constant',interval.type='quantile',se=TRUE)))
q60.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.60, method='constant',interval.type='quantile',se=TRUE)))
q80.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.80, method='constant',interval.type='quantile',se=TRUE)))
q90.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.90, method='constant',interval.type='quantile',se=TRUE)))
q95.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.95, method='constant',interval.type='quantile',se=TRUE)))
q99.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.99, method='constant',interval.type='quantile',se=TRUE)))
maxi.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 1, method='constant',interval.type='quantile',se=TRUE)))
和下面的方法2):
quantile.wealth <- scf_MIcombine(with(scf_design,svyquantile(~networth, c(0.01,0.2,0.4,0.6,0.8,0.9,0.99,1), method='constant',interval.type='quantile',se=TRUE)))
方法1)给我号码,但不准确。首先,它给出了以下警告信号:
Warning messages:
1: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
2: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
3: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
4: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
5: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
这说,标准误差不准确......所以我猜平均值也不值得信赖......(不确定)。我尝试使用方法 2),这只是方法 1) 的一个小调整,使用调查包参数 c[ , ] 在单个命令中计算分位数。然后我收到以下错误。
Error in (1 + 1/m) * evar/vbar : non-conformable arrays
In addition: Warning messages:
1: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
2: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
3: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
4: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
5: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
(ADDITION) 方法 3) 我使用调查包直接引用 (https://github.com/cran/survey/blob/master/man/svyquantile.Rd) 并尝试计算如下分位数:
q <- svyquantile(~networth,scf_design, c(.25,.5,.75),ci=TRUE)
但我收到以下错误。
Error in UseMethod("svyquantile", design) :
no applicable method for 'svyquantile' applied to an object of class
"svyimputationList"
长话短说,我需要以正确的方式计算分位数和基尼系数,以便我可以将数字与模型进行比较。有没有办法更正代码,或者使用其他方法?
请记住,这是数据帧的插补列表,由 5 个数据帧组成。而且......值得注意的是,数据帧不包含相同数量的样本(行)。
【问题讨论】:
-
asdfree 页面上的代码与美联储推荐的计算相匹配,因此如果您坚持那里显示的平均值/中位数/分位数计算,您的数字将是正确的。你添加了
method='constant',interval.type='quantile',但不知道为什么 -
@AnthonyDamico 嗨,那部分,我只是按照您在此处指定的代码github.com/ajdamico/asdfree/blob/archive/…
-
是的,你是对的。应该可以安全使用