Carrying string labels of string variable after reshape: answers

【Question Title】: Carrying string labels of string variable after reshape
【Posted】: 2023-03-16 06:41:02
【Question】:

I have a dataset in Stata that looks like this:

entityID    indicator    indicatordescr    indicatorvalue
1           gdp          Gross Domestic    100
1           pop          Population        15
1           area         Area              50
2           gdp          Gross Domestic    200
2           pop          Population        10
2           area         Area              300

There is a one-to-one mapping between the values of indicator and the values of indicatordescr.
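Every approach to this problem relies on that one-to-one mapping. A minimal sketch of checking it before reshaping, using the example data above: sorting within each group and comparing the first and last values confirms that all values in the group are identical.

```stata
clear
input entityID str4 indicator str14 indicatordescr indicatorvalue
1 gdp  "Gross Domestic" 100
1 pop  "Population"      15
1 area "Area"            50
2 gdp  "Gross Domestic" 200
2 pop  "Population"      10
2 area "Area"           300
end

* Each indicator must always carry the same description
bysort indicator (indicatordescr): assert indicatordescr[1] == indicatordescr[_N]

* Each description must belong to only one indicator
bysort indicatordescr (indicator): assert indicator[1] == indicator[_N]
```

If either assert fails, the mapping is not one-to-one, and labels taken from indicatordescr would be ambiguous.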

I want to reshape it to wide, i.e.:

entityID    gdp     pop     area
1           100     15      50
2           200     10      300

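I would like the variable gdp to be labeled "Gross Domestic", pop to be labeled "Population", and area to be labeled "Area".

Unfortunately, as far as I understand, there is no way to assign the values of indicatordescr as value labels of indicator, so reshape cannot turn them into variable labels.

I have looked at this: Bring value labels to variable labels when reshaping wide

and also at this: http://www.stata.com/support/faqs/data-management/apply-labels-after-reshape/

but I do not understand how to apply them to my case.

Note: the variable labeling after reshape must be done programmatically, because indicator and indicatordescr take many values.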

【Question Comments】:

Tags: stata reshape

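【Solution 1】:

"String labels" is informal here; Stata does not support value labels for string variables. What is wanted, however, is for the distinct values of a string variable to become variable labels on reshape.

Various workarounds exist. Here is one: put the information into the variable names, then take it out again.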

clear 
input entityID  str4 indicator   str14 indicatordescr    indicatorvalue
1           gdp          "Gross Domestic"    100
1           pop          "Population"        15
1           area         "Area"              50
2           gdp          "Gross Domestic"    200
2           pop          "Population"        10
2           area         "Area"              300
end 

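* Combine indicator and description into one string, spaces -> underscores
gen what = indicator + "_" + subinstr(indicatordescr, " ", "_", .)
keep entityID what indicatorvalue
* Creates e.g. indicatorvaluegdp_Gross_Domestic
reshape wide indicatorvalue, i(entityID) j(what) string

* Split each name at its first underscore: the part before becomes the new
* name, the rest (underscores back to spaces) becomes the variable label.
* This assumes the indicator values themselves contain no underscores.
foreach v of var indicator* {
    local V : subinstr local v "_" " ", all
    local new : word 1 of `V'
    rename `v' `new'
    local V = substr("`V'", strpos("`V'", " ") + 1, .)
    label var `new' "`V'"
}

* Strip the stub prefix (older code used: renpfix indicatorvalue)
rename indicatorvalue* *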
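Because this workaround stores the label inside the variable name, every name reshape creates (the stub indicatorvalue plus the value of what) must fit within Stata's 32-character limit. A minimal sketch of checking that in advance, assuming the same example data and the same construction of what; note it only checks length, not whether a description contains characters that are illegal in a variable name.

```stata
clear
input entityID str4 indicator str14 indicatordescr indicatorvalue
1 gdp  "Gross Domestic" 100
1 pop  "Population"      15
1 area "Area"            50
2 gdp  "Gross Domestic" 200
2 pop  "Population"      10
2 area "Area"           300
end

* Same construction as in the answer above
gen what = indicator + "_" + subinstr(indicatordescr, " ", "_", .)

* reshape wide will create names of the form indicatorvalue + what
gen newlen = strlen("indicatorvalue" + what)
summarize newlen, meanonly
display "longest name reshape would create: " r(max) " characters (limit 32)"

* Any rows listed here would make reshape fail
list indicator what newlen if newlen > 32
```

With this data the longest name, indicatorvaluegdp_Gross_Domestic, is exactly 32 characters, so even slightly longer descriptions would break the first workaround.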

EDIT: If the variable names would become too long, try another workaround:

clear 
input entityID  str4 indicator   str14 indicatordescr    indicatorvalue
1           gdp          "Gross Domestic"    100
1           pop          "Population"        15
1           area         "Area"              50
2           gdp          "Gross Domestic"    200
2           pop          "Population"        10
2           area         "Area"              300
end 

* Save the distinct (name, description) pairs in a Mata string matrix
mata : sdata = uniqrows(st_sdata(., "indicator indicatordescr"))
keep entityID indicator indicatorvalue
reshape wide indicatorvalue, i(entityID) j(indicator) string
rename indicatorvalue* *
* Label each new variable; char(34) supplies the surrounding double quotes
mata : for(i = 1; i <= rows(sdata); i++) stata("label var " + sdata[i, 1] + "  " + char(34) + sdata[i,2] + char(34))

LATER EDIT: Although the above is called a workaround, it is a better solution than the first one.
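The matrix sdata stays in Mata's memory after the labeling loop has run. A minimal sketch of removing it, using a small hypothetical stand-in for sdata so the snippet runs on its own: mata drop removes only the named object, while mata clear would remove everything Mata currently holds.

```stata
* Hypothetical stand-in for sdata: one row per (name, description) pair
mata: sdata = ("area", "Area" \ "gdp", "Gross Domestic" \ "pop", "Population")
mata: rows(sdata)

* Remove just this matrix once the labels have been applied
mata: mata drop sdata

* Or remove everything held in Mata's memory:
* mata: mata clear
```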

【Comments】:

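Thank you, this does work. However, it is worth noting that it prevents carrying long labels. The algorithm above uses the variable names to carry both the variable and the label information, and reshape succeeds only if the resulting variable names are 32 characters or fewer. Is there a way to use local macros to overcome this drawback?
Quick question: after the algorithm runs, do I need to clear Mata or sdata from memory, and if so, how?
I cannot see your dataset from here, but sdata is deliberately just a small table of names and variable labels (uniqrows() should ensure that). There is no harm in leaving it there.
For this instance of my dataset sdata will be very small, but that may not be true for everyone. And I do think it is good practice to remove potential memory hogs once they are no longer needed. How do I remove sdata and Mata from memory?
How many variables do you have? Even 1000 would not imply a non-trivial storage requirement. But sure, fair point: help mata tells you about mata clear and mata drop. Mata's own code is always there regardless.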
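On the question of using local macros to avoid the 32-character limit: one way, sketched here under the assumption that indicator values are valid Stata names, is to store each description in a local macro before reshaping and apply it after, so the label never passes through a variable name. The macro names lab_gdp, lab_pop, ... are hypothetical choices for this sketch, and it relies on the one-to-one mapping described in the question.

```stata
clear
input entityID str4 indicator str14 indicatordescr indicatorvalue
1 gdp  "Gross Domestic" 100
1 pop  "Population"      15
1 area "Area"            50
2 gdp  "Gross Domestic" 200
2 pop  "Population"      10
2 area "Area"           300
end

* Before reshaping: one local per indicator holding its description
quietly levelsof indicator, local(inds) clean
foreach i of local inds {
    * one-to-one mapping assumed, so this returns a single description
    quietly levelsof indicatordescr if indicator == "`i'", local(d) clean
    local lab_`i' `"`d'"'
}

* indicatordescr is not constant within entityID, so drop it before reshape
keep entityID indicator indicatorvalue
reshape wide indicatorvalue, i(entityID) j(indicator) string
rename indicatorvalue* *

* After reshaping: apply the stored descriptions as variable labels
foreach i of local inds {
    label variable `i' `"`lab_`i''"'
}
describe
```

Only the stub plus the indicator value has to fit in a variable name, so descriptions can be as long as a variable label allows (80 characters), and nothing is left behind in Mata.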