[Question title]: Error with matrix not matching size of training set in R
[Posted]: 2021-04-10 02:43:36
[Question]:

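I am building a ridge regression model for a project. My trainLed dataset has 2055 obs. of 21 variables, and Lifeexpectancy is the response I am studying. When I use the code below, my train.mat has only 1917 rows, and I get an error message when I try to run the ridge regression. What can I do to make the number of observations match?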

Code:

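library(glmnet)

# Design matrices for the training and test sets
train.mat = model.matrix(Lifeexpectancy~.,data=trainLed)
test.mat = model.matrix(Lifeexpectancy~.,data=testLed)
# Sequence of lambda values from 10^4 down to 10^-2
grid = 10^seq(4,-2,length.out = 120)
fit.ridge = glmnet(train.mat,trainLed$Lifeexpectancy,alpha=0,lambda=grid,thresh=1e-12)

Error when running the last line (the glmnet() call):
Error in glmnet(train.mat, trainLed$Lifeexpectancy, alpha = 1, lambda = grid, : number of observations in y (2055) not equal to the number of rows of x (1917)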

[Comments]:

    Tags: r regression


    [Solution 1]:

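    You get that error because some of your independent variables are NA, and model.matrix() drops those rows (the default NA handling, na.action = na.omit, removes every row that contains a missing value). train.mat therefore has fewer rows than trainLed$Lifeexpectancy has elements.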

    For example, if we build a similar dataset with 100 NAs spread across different columns, I get the same error:

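    library(glmnet)

    # Simulated data: 2055 rows, 10 predictors plus the response
    trainLed = data.frame(matrix(rnorm(2055*11),ncol=11))
    colnames(trainLed) = c(paste0("var",1:10),"Lifeexpectancy")
    # Put an NA into one predictor for each of 100 different rows
    trainLed$var1[1:50] = NA
    trainLed$var2[51:100] = NA

    train.mat = model.matrix(Lifeexpectancy~.,data=trainLed)  # only 1955 rows remain
    fit.ridge = glmnet(train.mat,trainLed$Lifeexpectancy,alpha=0)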
    
    Error in glmnet(train.mat, trainLed$Lifeexpectancy, alpha = 0) : 
      number of observations in y (2055) not equal to the number of rows of x (1955)
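To find out which columns are responsible before building the matrix, one could count the NAs per column and compare row counts. This is a minimal sketch in base R only; the toy data frame `df` and its columns are made up for illustration and stand in for `trainLed`.

```r
# Toy data standing in for trainLed (hypothetical, not the asker's data)
set.seed(1)
df <- data.frame(var1 = rnorm(20), var2 = rnorm(20), Lifeexpectancy = rnorm(20))
df$var1[1:3] <- NA
df$var2[4:5] <- NA

# Number of NAs in each column
na_per_column <- colSums(is.na(df))
print(na_per_column)

# Rows with at least one NA; these are the rows model.matrix() drops
# under the default na.action (na.omit)
n_incomplete <- sum(!complete.cases(df))

x <- model.matrix(Lifeexpectancy ~ ., data = df)
stopifnot(nrow(x) == nrow(df) - n_incomplete)
```

If the gap between `nrow(trainLed)` and `nrow(train.mat)` equals the number of incomplete rows, missing values account for the whole mismatch.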
    

    Subset your data to the complete observations, so that the response and the model matrix are built from the same rows:

    trainLed = trainLed[complete.cases(trainLed),]
    train.mat = model.matrix(Lifeexpectancy~.,data=trainLed)
    fit.ridge = glmnet(train.mat,trainLed$Lifeexpectancy,alpha=0)
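An alternative that leaves the data frame untouched is to build the model frame once and take both the predictors and the response from it, so the two always cover the same rows. The following is a minimal base R sketch with a made-up toy data frame `df` in place of `trainLed`. Dropping the intercept column is a choice made here (glmnet fits its own intercept by default), not something the answer above does.

```r
# Toy data standing in for trainLed (hypothetical, not the asker's data)
set.seed(1)
df <- data.frame(var1 = rnorm(20), var2 = rnorm(20), Lifeexpectancy = rnorm(20))
df$var1[1:3] <- NA

# model.frame() applies the NA handling once (na.omit by default)
mf <- model.frame(Lifeexpectancy ~ ., data = df)

# Predictors and response both come from the same filtered frame
x <- model.matrix(attr(mf, "terms"), data = mf)[, -1, drop = FALSE]  # drop "(Intercept)"
y <- model.response(mf)

stopifnot(nrow(x) == length(y))
```

With glmnet loaded, `glmnet(x, y, alpha = 0)` would then receive an `x` and `y` of matching size.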
    
