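【Posted】: 2019-12-16 22:57:46
【Question】:
I am running a logistic regression on a binary DV with two predictors (gender and political leaning; binary and continuous, respectively). I need help getting my GLM to run inside cross-validation! Despite recoding the variables several times, my code still does not work, and I am not sure what is going on.
Here is my code: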
`
#######################################################
# Cross-Validation of the Logistic Regression
#######################################################
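# Store the recoded predictors in the data frame so each fold's subset carries them;
# free-standing vectors keep all rows and no longer match trainData/testData
choicelife.data$gen <- as.numeric(choicelife.data$gender)
choicelife.data$lnc <- as.numeric(choicelife.data$lc)
procprol <- as.numeric(choicelife.data$views)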
# This code could be useful
nCV <- 50
MSE_1 <- numeric(nCV)
MSE_2 <- numeric(nCV)
n <- nrow(choicelife.data) # number of observations
folds <- cut(sample(n),breaks=nCV,labels=FALSE)
#Perform nCV-fold cross validation
i <- 1
for(i in 1:nCV){
#Segment your data by fold using the which() function
testIndexes <- which(folds==i,arr.ind=TRUE)
testData <- choicelife.data[testIndexes, ]
trainData <- choicelife.data[-testIndexes, ]
# Models
mod1<- glm(views ~ gen,
family=binomial(link=logit), data=trainData)
mod2<- glm(views ~ gen + lnc,
family=binomial(link=logit), data=trainData)
# Get predictions on the probability scale (the default is the log-odds scale)
pred_1 <- predict(mod1, newdata = testData, type = "response")
pred_2 <- predict(mod2, newdata = testData, type = "response")
# Calculate MSE
MSE_1[i] <- mean((testData$views - pred_1)^2)
MSE_2[i] <- mean((testData$views - pred_2)^2)
}
warnings()
# mean MSEs
mean(MSE_1)
mean(MSE_2)
# get differences
diffs <- MSE_1 - MSE_2
# get 95% CIs
meandiff <- mean(diffs)
sddiff <- sd(diffs)
c(meandiff-2*sddiff, meandiff+2*sddiff) # approximate 95% interval: mean +/- 2 SD of the fold differences
【Discussion】:
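For reference, here is a minimal self-contained sketch of the same k-fold comparison on simulated data. The data frame `sim`, its coefficients, and `nCV <- 10` are made up for illustration and are not the real `choicelife.data`. The sketch assumes the outcome is coded 0/1 and that every model variable is a column of the data frame passed to `glm()`, so each fold's subset carries its own rows.

```r
# Simulated stand-in for choicelife.data (hypothetical values, not the real survey)
set.seed(1)
n <- 200
sim <- data.frame(
  gen = rbinom(n, 1, 0.5),   # binary predictor, coded 0/1
  lnc = rnorm(n)             # continuous predictor
)
# Binary outcome coded 0/1, generated from a known logistic model
sim$views <- rbinom(n, 1, plogis(-0.3 + 0.5 * sim$gen + 0.8 * sim$lnc))

nCV <- 10
# Random fold labels 1..nCV of near-equal size
folds <- sample(rep(seq_len(nCV), length.out = n))

MSE_1 <- numeric(nCV)
MSE_2 <- numeric(nCV)

for (i in seq_len(nCV)) {
  test_idx  <- which(folds == i)
  testData  <- sim[test_idx, ]
  trainData <- sim[-test_idx, ]

  # Every variable in the formula is a column of trainData
  mod1 <- glm(views ~ gen,       family = binomial(link = "logit"), data = trainData)
  mod2 <- glm(views ~ gen + lnc, family = binomial(link = "logit"), data = trainData)

  # type = "response" returns probabilities; the default is the log-odds scale
  pred_1 <- predict(mod1, newdata = testData, type = "response")
  pred_2 <- predict(mod2, newdata = testData, type = "response")

  # Mean squared error between the 0/1 outcome and the predicted probability
  MSE_1[i] <- mean((testData$views - pred_1)^2)
  MSE_2[i] <- mean((testData$views - pred_2)^2)
}

diffs <- MSE_1 - MSE_2
c(mean_MSE_1 = mean(MSE_1), mean_MSE_2 = mean(MSE_2), mean_diff = mean(diffs))
```

If the outcome is stored as a factor rather than as 0/1 numbers, the subtraction in the MSE lines will not work, so it would need converting to 0/1 first. The number of folds should also stay well below the number of rows so that no test fold is empty.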
Tags: r logistic-regression cross-validation glm