【Question title】: Cross-validating a logistic regression in R
【Posted】: 2019-12-16 22:57:46
【Question】:

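I am running a logistic regression on a binary DV with two predictors (gender and political leaning; one binary, one continuous). I need help getting my GLM to run inside a cross-validation loop! My code does not work even though I have re-coded the variables several times, and I am not sure what is going on.

Here is my code: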


#######################################################
#     Cross-Validation of the Logistic Regression
#######################################################


gen <- as.numeric(choicelife.data$gender)
lnc <- as.numeric(choicelife.data$lc)
procprol <-as.numeric(choicelife.data$views)

# This code could be useful
nCV <- 50
MSE_1 <- numeric(nCV)
MSE_2 <- numeric(nCV)

folds <- cut(sample(n),breaks=nCV,labels=FALSE)

#Perform n.folds fold cross validation
i <- 1
for(i in 1:nCV){

  #Segment your data by fold using the which() function
  testIndexes <- which(folds==i,arr.ind=TRUE)
  testData <- choicelife.data[testIndexes, ]
  trainData <- choicelife.data[-testIndexes, ]

  # Models
  mod1<- glm(views ~ gen,
             family=binomial(link=logit), data=trainData)

  mod2<- glm(views ~ gen + lnc,
             family=binomial(link=logit), data=trainData)

  # Get predictions
  pred_1 <- predict(mod1, newdata = testData)
  pred_2 <- predict(mod2, newdata = testData)

  # Calculate MSE
  MSE_1[i] <- mean((testData$views - pred_1)^2)
  MSE_2[i] <- mean((testData$views - pred_2)^2)
}
warnings()

# mean MSEs
mean(MSE_1) 
mean(MSE_2) 

# get differences
diffs <- MSE_1 - MSE_2

# get 95% CIs
meandiff <- mean(diffs) 
sddiff <- sd(diffs) 
c(meandiff-2*sddiff, meandiff+2*sddiff) # 95% Confidence interval (n, n)

【Comments】:

    Tags: r logistic-regression cross-validation glm


    【Answer 1】:

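    You converted some of the variables to numeric, but never stored them in the data.frame. Inside the loop over nCV, the subsetted data frames (trainData and testData) therefore do not contain the numeric variables, so the models cannot be fitted and evaluated on them.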
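    To illustrate the point, here is a minimal sketch on made-up data. The names `d`, `g_num`, `train`, `res`, and `fit` are hypothetical and not taken from the question. A vector created outside the data frame keeps its full length, while columns taken from the subset do not, so `glm()` cannot line them up:

```r
set.seed(1)

# Toy data standing in for choicelife.data (hypothetical)
d <- data.frame(
  y = rbinom(20, 1, 0.5),
  g = factor(sample(c("M", "F"), 20, replace = TRUE))
)

# The converted copy lives outside the data frame, so it always has 20 elements
g_num <- as.numeric(d$g)

train <- d[1:15, ]  # 15-row training subset

# y is taken from train (15 rows), g_num from the enclosing environment (20 values)
res <- tryCatch(
  glm(y ~ g_num, family = binomial, data = train),
  error = function(e) conditionMessage(e)
)
print(res)  # an error message about differing variable lengths

# Storing the converted variable as a column makes it follow the subset
d$g_num <- as.numeric(d$g)
train <- d[1:15, ]
fit <- glm(y ~ g_num, family = binomial, data = train)
nobs(fit)
```

    Once the numeric version is a column of the data frame, every subset carries it along, and the model is fitted on the training rows only.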

    First, I simulate something that should look like your data frame choicelife.data:

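    # stringsAsFactors=TRUE keeps gender and views as factors (the default became FALSE in R 4.0.0),
    # so the as.numeric() conversions below yield level codes rather than NA
    choicelife.data = data.frame(
    lc=sample(1:10,100,replace=TRUE),
    gender=sample(c("M","F"),100,replace=TRUE),
    views = sample(c("Pro","Against"),100,replace=TRUE),
    stringsAsFactors=TRUE
    )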
    

    See the suggested edits below:

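    # Store the numeric versions as columns so they are subset together with the rest of the data;
    # factor() makes the conversion work whether the columns are character or already factors
    choicelife.data$gen <- as.numeric(factor(choicelife.data$gender))
    choicelife.data$lnc <- as.numeric(choicelife.data$lc)
    # make this 0 or 1 (with the simulated data: "Against" = 0, "Pro" = 1)
    choicelife.data$procprol <- as.numeric(factor(choicelife.data$views))-1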
    
    # This code could be useful
    nCV <- 5
    MSE_1 <- numeric(nCV)
    MSE_2 <- numeric(nCV)
    
    folds <- cut(sample(1:nrow(choicelife.data)),breaks=nCV,labels=FALSE)
    
    for(i in 1:nCV){
    
      testIndexes <- which(folds==i,arr.ind=TRUE)
      testData <- choicelife.data[testIndexes, ]
      trainData <- choicelife.data[-testIndexes, ]
    
      # Models
      mod1<- glm(procprol ~ gen,
                 family=binomial(link=logit), data=trainData)
    
      mod2<- glm(procprol ~ gen + lnc,
                 family=binomial(link=logit), data=trainData)
    
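      # Get predicted probabilities: type="response" is needed because
      # predict() on a glm returns log-odds (the link scale) by default
      pred_1 <- predict(mod1, newdata = testData,type="response")
      pred_2 <- predict(mod2, newdata = testData,type="response")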
    
      # Calculate MSE
      MSE_1[i] <- mean((testData$procprol - pred_1)^2)
      MSE_2[i] <- mean((testData$procprol - pred_2)^2)
    }
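
    For the model comparison the question was aiming at, the loop above can be wrapped up roughly as follows. This is only a sketch under stated assumptions: `dat` is simulated stand-in data rather than the real choicelife.data, and `cv_mse` is a hypothetical helper, not a function from any package. Both models are scored on the same folds, so the per-fold differences are paired.

```r
set.seed(42)

# Simulated stand-in for choicelife.data (assumed structure, not the real data)
dat <- data.frame(
  lnc = sample(1:10, 200, replace = TRUE),
  gen = sample(0:1, 200, replace = TRUE)
)
dat$procprol <- rbinom(200, 1, plogis(-1 + 0.2 * dat$lnc))

# Hypothetical helper: per-fold mean squared error of the predicted probabilities
cv_mse <- function(formula, data, folds) {
  response <- all.vars(formula)[1]  # name of the outcome column
  sapply(sort(unique(folds)), function(k) {
    test  <- data[folds == k, ]
    train <- data[folds != k, ]
    fit <- glm(formula, family = binomial(link = "logit"), data = train)
    # type = "response" returns probabilities; the default is the log-odds scale
    p <- predict(fit, newdata = test, type = "response")
    mean((test[[response]] - p)^2)
  })
}

nCV <- 5
# Random fold labels of near-equal size, shared by both models
folds <- sample(rep(seq_len(nCV), length.out = nrow(dat)))

MSE_1 <- cv_mse(procprol ~ gen, dat, folds)
MSE_2 <- cv_mse(procprol ~ gen + lnc, dat, folds)

# Paired per-fold differences and their averages
diffs <- MSE_1 - MSE_2
c(mean_MSE_1 = mean(MSE_1), mean_MSE_2 = mean(MSE_2), mean_diff = mean(diffs))
```

    The mean squared error of predicted probabilities against a 0/1 outcome is the Brier score, so lower values indicate better predictions. A positive mean difference here would favor the second model.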
    

    【Comments】:
