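【Posted】: 2018-07-27 23:42:16
【Question】:
I am trying to run a loop by group to get the best-fit model for each group. I've reached the point where I can't get the loop to run on each group separately: it loops as expected and writes several CSV files, but the data in every file is identical: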
library(leaps)
library(dplyr)
#data
df = data.frame(matrix(rnorm(80), nrow=10))
df$state <- c('AL','AK','AR','AZ','CT')
state_list <- c('AL','AK','AR','AZ','CT')
for (state in state_list){
data_filter <- subset(df, state = state)
data_filter_u <- data_filter[c(1,2,3,4,5,6,7,8,9)]
data_sub <- regsubsets(X8~., data_filter_u, nvmax = 8)
data_summary <- summary(data_sub)
data_coef <- coef(data_sub,which.max(data_summary$adjr2))
as.data.frame(t(data_coef))
data_coef$state_used <- state
write.csv(data_coef,paste0(unique(state),".csv"))
}
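However, every file contains the same data (same intercept, same selected variables, same coefficients), and the output includes extra columns I did not intend: 'stateAR', 'stateAZ', 'stateCT'.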
+---+--------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+------------+
| | X.Intercept. | X2 | X3 | X4 | X5 | X7 | stateAR | stateAZ | stateCT | state_used |
+---+--------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+------------+
| 1 | 1.027070119 | 0.593400469 | 0.852107976 | 0.219067212 | 0.447761824 | 0.213681166 | -3.421259006 | -2.250303456 | -0.558997077 | AL |
+---+--------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+------------+
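I'm trying to get something like this instead, with only the state currently being looped over and the appropriate columns based on the best fit: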
+---+--------------+-------------+-------------+-------------+-------------+-------------+------------+
| | X.Intercept. | X2 | X3 | X4 | X5 | X7 | state_used |
+---+--------------+-------------+-------------+-------------+-------------+-------------+------------+
| 1 | 1.027070119 | 0.593400469 | 0.852107976 | 0.219067212 | 0.447761824 | 0.213681166 | AL |
+---+--------------+-------------+-------------+-------------+-------------+-------------+------------+
Thanks for your help.
【Comments】:
-
@ManuelBickel Gah, you're right, it was subset and ==. It loops correctly now. If you'd like to add your comment as an answer, I'll accept it. Thanks for your time and help!
-
Added an extended answer, so I'll delete my comment... Hope it helps, and good luck with your project.
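For reference, here is a minimal sketch of the fix the comments point to; it is not the extended answer itself, which is not reproduced here. In `subset(df, state = state)`, the single `=` makes `state` a named argument that `subset()` silently ignores, so every iteration keeps all rows. Because the `state` column is also passed to `regsubsets()`, it is expanded into dummy variables such as `stateAR`. The sketch makes several assumptions that are not in the original post: the loop variable is renamed to `st` (inside `subset()`, `state == state` would compare the column with itself and still keep every row), the simulated data has 20 rows per state so each per-state fit has enough observations, a seed is fixed, and files are written to `tempdir()` rather than the working directory.

```r
library(leaps)

set.seed(1)  # assumption: fixed seed so the simulated data is reproducible

# Assumption: 20 rows per state (the original had 2), enough for a per-state fit
states <- c("AL", "AK", "AR", "AZ", "CT")
df <- data.frame(matrix(rnorm(100 * 8), nrow = 100))
df$state <- rep(states, each = 20)

results <- list()
for (st in states) {  # `st`, not `state`, to avoid clashing with the column name
  # `==` compares values; keep only X1..X8 so `state` never enters the model
  data_filter <- df[df$state == st, paste0("X", 1:8)]

  data_sub     <- regsubsets(X8 ~ ., data = data_filter, nvmax = 7)
  data_summary <- summary(data_sub)
  best         <- which.max(data_summary$adjr2)  # model size with highest adjusted R^2

  # coef() returns a named numeric vector; reshape it into a one-row data frame
  data_coef <- as.data.frame(t(coef(data_sub, best)))
  data_coef$state_used <- st

  results[[st]] <- data_coef
  # Assumption: write to a temporary directory instead of the working directory
  write.csv(data_coef, file.path(tempdir(), paste0(st, ".csv")), row.names = FALSE)
}
```

Each element of `results` should then hold one row of coefficients for that state's best model, and the selected variables can differ from state to state.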