Stata: combining coefficients/standard errors from several regressions in a single dataset (number of variables may differ)

【Question title】: Stata: combining coefficients/standard errors from several regressions in a single dataset (number of variables may differ)
【Posted】: 2015-11-20 11:13:17
【Question description】:

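I have already asked a question about storing the coefficients and standard errors of several regressions in a single dataset.

Let me restate the goal of my original question:

I would like to run several regressions and store their results in a DTA file that I can later use for analysis. My constraints are:

I cannot install modules (I am writing code for other people and cannot be sure which modules they have installed).

Some of the regressors are factor variables.

Each regression differs only in the dependent variable, so I would like to store that in the final dataset to keep track of which regression each coefficient/variance corresponds to.

The solution suggested by Roberto Ferrer works well on my test data, but not on some other types of data. The reason is that my sample changes slightly from one regression to the next, and some factor variables do not take the same number of values in each regression. As a result, the fixed effects (created on the fly by using i.myvar as a regressor) do not have the same cardinality.

Say I decide to include year fixed effects (as in: year-specific intercepts) using i.year, but in one of the regressions there is no observation for year 2006. That particular regression will then have one regressor fewer (the dummy corresponding to year==2006 is not created), and therefore a smaller matrix for storing the coefficients.

This causes a conformability error when trying to stack the matrices together.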
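
The failure can be reproduced without any regression at all. The following is a minimal sketch, in which the matrices first and second are made-up stand-ins for e(b) from two regressions with different numbers of regressors:

```stata
// Illustration only: two row vectors with different numbers of columns,
// standing in for e(b) from regressions with 3 and 2 regressors.
matrix first  = (1, 2, 3)
matrix second = (4, 5)

// The row-join operator \ requires equal column counts, so this fails.
// capture traps the error instead of stopping the do-file.
capture matrix stacked = first \ second
if _rc != 0 {
    display as text "stacking failed with return code " _rc
}
```

Because the column counts differ, no amount of renaming makes the two rows stackable as matrices; the rows have to be aligned by name rather than by position.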

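I was wondering whether there is a way to make the initial solution robust to a varying number of regressors. (Maybe save each regression as a dta file, and then merge?)

I am still under the constraint of not being able to rely on external packages.

【Comments】:

Frankly, too many words! Please give concrete code and a reproducible example. Otherwise this is likely to be judged off-topic. See stackoverflow.com/help/mcve

Tags: matrix regression stata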

【Solution 1】:

You can follow a strategy of appending datasets, with only minor changes to the code in the question you reference:

clear
set more off

save test.dta, emptyok replace

foreach depvar in marriage divorce {

    // test data
    sysuse census, clear 
    generate constant = 1
    replace marriage = . if region == 4 

    // regression
    reg `depvar' popurban i.region constant, robust noconstant  // regressions
    matrix result_matrix = e(b)\vecdiag(e(V))                   // grab coeffs and their variances in a 2xK matrix
    matrix rownames result_matrix = `depvar'_b `depvar'_v       // add rownames to the two extra rows

    // get original column names of matrix
    local names : colfullnames result_matrix

    // get original row names of matrix (and row count)
    local rownames : rowfullnames result_matrix
    local c : word count `rownames'

    // make original names legal variable names
    local newnames
    foreach name of local names {
        local newnames `newnames' `=strtoname("`name'")'
    }

    // rename columns of matrix
    matrix colnames result_matrix = `newnames'

    // from matrix to dataset
    clear
    svmat result_matrix, names(col)

    // add matrix row names to dataset
    gen rownames = ""
    forvalues i = 1/`c' {
        replace rownames = "`:word `i' of `rownames''" in `i'
    }

    // append
    append using "test.dta"
    save "test.dta", replace

}

// list
order rownames
list, noobs

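The result is what you want: append matches variables by name, so a coefficient that is absent from one regression is simply left missing in that regression's rows. The drawback is that the dataset is reloaded on every pass through the loop; the data are loaded as many times as you estimate regressions.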
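
One way to avoid re-reading the source data is to load it once and wrap the matrix-to-dataset step in preserve/restore, accumulating the results in a temporary file. This is only a sketch of that variation, not part of the original answer: the tempfile name results is an assumption, and preserve/restore carries its own copying cost, so whether it is actually faster depends on the data.

```stata
clear
set more off

// load and prepare the test data once
sysuse census, clear
generate constant = 1
replace marriage = . if region == 4

// start with an empty results file (results is a hypothetical name)
tempfile results
preserve
clear
save `results', emptyok
restore

foreach depvar in marriage divorce {
    regress `depvar' popurban i.region constant, robust noconstant
    matrix result_matrix = e(b) \ vecdiag(e(V))

    // make the column names legal variable names
    local names : colfullnames result_matrix
    local newnames
    foreach name of local names {
        local newnames `newnames' `=strtoname("`name'")'
    }
    matrix colnames result_matrix = `newnames'

    // set the estimation data aside, build the two result rows, append, save
    preserve
    clear
    svmat result_matrix, names(col)
    generate rownames = ""
    replace rownames = "`depvar'_b" in 1
    replace rownames = "`depvar'_v" in 2
    append using `results'
    save `results', replace
    restore
}

use `results', clear
order rownames
list, noobs
```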

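You may want to look at the post and check whether you can manage a more efficient solution. statsby could also work, but you would need to find a clever way of renaming the stored variables.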
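
Another built-in route, which is a different technique from both the append approach and statsby, is to post the results in long format with postfile, one row per dependent variable and coefficient. The number of regressors then no longer matters, because nothing is stacked by column. The sketch below makes several assumptions: the names handle, long_results, coefs and vcov are hypothetical, and the regression uses the default constant rather than the explicit constant variable.

```stata
sysuse census, clear
replace marriage = . if region == 4

// one output row per (dependent variable, coefficient)
tempname handle
tempfile long_results
postfile `handle' str32 depvar str32 term double(coef variance) using `long_results'

foreach depvar in marriage divorce {
    regress `depvar' popurban i.region, vce(robust)
    matrix coefs = e(b)
    matrix vcov  = e(V)
    local names : colfullnames coefs
    local k = colsof(coefs)
    forvalues j = 1/`k' {
        local term : word `j' of `names'
        // coefficient j and its variance (diagonal element of e(V))
        post `handle' ("`depvar'") ("`term'") (coefs[1, `j']) (vcov[`j', `j'])
    }
}
postclose `handle'

use `long_results', clear
list, noobs sepby(depvar)
```

If a wide layout is needed afterwards, the long file can be reshaped once the term names have been turned into legal variable names, for example with strtoname().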

【Comments】: