Reshaping a large dataset in R: answers

【Question title】: Reshaping large dataset in R
【Posted】: 2015-02-16 14:27:10
【Question】:

I am trying to reshape a large dataset, but I cannot get the result in the layout I want.

The data looks like this:

GeoFIPS GeoName      IndustryID  Description  X2001   X2002   X2003   X2004   X2005
10180   Abilene, TX  21          Mining       96002   92407   127138  150449  202926
10180   Abilene, TX  22          Utilities    33588   34116   33105   33265   32452
...

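The data frame is very long and covers every MSA in the United States for the selected industry sectors.

I would like it to look like this: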

GeoFIPS GeoName        Year     Mining Utilities (etc)
10180   Abilene, TX    2001     96002   33588
10180   Abilene, TX    2002     92407   34116
....

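I am very new to R and would greatly appreciate your help. I looked at wide-to-long and long-to-wide reshaping, but this seems to be a more complicated case. Thanks!

Edit: data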

df1 <- structure(list(GeoFIPS = c(10180L, 10180L), GeoName =
c("Abilene, TX", 
"Abilene, TX"), IndustryID = 21:22, Description = c("Mining", 
"Utilities"), X2001 = c(96002L, 33588L), X2002 = c(92407L, 34116L
), X2003 = c(127138L, 33105L), X2004 = c(150449L, 33265L), X2005 = 
c(202926L, 
 32452L)), .Names = c("GeoFIPS", "GeoName", "IndustryID", "Description", 
"X2001", "X2002", "X2003", "X2004", "X2005"), class = "data.frame",
 row.names = c(NA, -2L))

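【Comments】:

Please consider making your question reproducible. If you have the data available, there is no reason not to include a small part of it or mock up a small example.

Tags: r reshape reshape2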

【Solution 1】:

You can use melt/dcast from reshape2:

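library(reshape2)
# Wide -> long: one row per MSA, industry and year column
df2 <- melt(df1, id.vars=c('GeoFIPS', 'GeoName', 
               'IndustryID', 'Description'))
# Strip the 'X' prefix to get Year; drop IndustryID (col 3) and variable (col 5)
df2 <- transform(df2, Year=sub('^X', '', variable))[-c(3,5)]

# Long -> wide: one column per Description
dcast(df2, ...~Description, value.var='value')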
#  GeoFIPS     GeoName Year Mining Utilities
#1   10180 Abilene, TX 2001  96002     33588
#2   10180 Abilene, TX 2002  92407     34116
#3   10180 Abilene, TX 2003 127138     33105
#4   10180 Abilene, TX 2004 150449     33265
#5   10180 Abilene, TX 2005 202926     32452
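
The same two-step idea (first wide to long, then long to wide) can also be sketched with tidyr, which is not what the answer above uses; it is swapped in here only as an illustration. The sketch assumes tidyr 1.0 or later is installed, rebuilds the two-row sample from the question, and uses the names slim, long and res purely for illustration.

```r
# Sketch only: tidyr is swapped in here; the answer above uses reshape2.
library(tidyr)

# Same two-row sample as in the question.
df1 <- data.frame(
  GeoFIPS     = c(10180L, 10180L),
  GeoName     = c("Abilene, TX", "Abilene, TX"),
  IndustryID  = 21:22,
  Description = c("Mining", "Utilities"),
  X2001 = c(96002L, 33588L), X2002 = c(92407L, 34116L),
  X2003 = c(127138L, 33105L), X2004 = c(150449L, 33265L),
  X2005 = c(202926L, 32452L),
  stringsAsFactors = FALSE
)

# Drop IndustryID; otherwise it would act as an id column in step 2 and
# keep Mining and Utilities on separate rows.
slim <- df1[, setdiff(names(df1), "IndustryID")]

# Step 1 (wide -> long): stack the X2001..X2005 columns into Year/value pairs.
long <- pivot_longer(slim, cols = starts_with("X"),
                     names_to = "Year", names_prefix = "X",
                     values_to = "value")

# Step 2 (long -> wide): spread Description into one column per industry.
res <- pivot_wider(long, names_from = Description, values_from = value)

# Year comes back as character after the prefix is stripped; convert it.
res$Year <- as.integer(res$Year)
print(res)
```

On the sample this should give one row per MSA and year with Mining and Utilities as columns, matching the layout requested in the question. For the full dataset, any Description that is missing for a given MSA and year would show up as NA.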


【Comments】: