【Question title】:Using specified regressors in glm() in R
【Posted】:2016-04-30 23:30:56
【Question description】:

A question about the glm() function in R:

I have a dataset:

mydata <- read.csv("data.csv", header = TRUE) 

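It contains the variable "y" (binary, 0 or 1) and 60 regressors. Three of those regressors are "avg", "age", and "income" (all three are numeric).

I want to fit a logistic regression with the glm function, as follows: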

model <-glm(y~., data = mydata, family = binomial)

If I don't want to use three specified variables (avg, age, and income) in glm(), and want to use only the remaining 57 variables, how should I proceed?

【Comments】:

    Tags: r statistics glm


    【Solution 1】:

    You can simply exclude those three variables from mydata before running glm().

    Here I create some sample data:

    set.seed(1)
    mydata<-replicate(10,rnorm(100,300,50))
    mydata<-data.frame(dv=sample(c(0,1),100,replace = TRUE),mydata)
    
    > head(mydata)
      dv       X1       X2       X3       X4       X5       X6       X7       X8       X9      X10
    1  1 268.6773 268.9817 320.4701 344.6837 353.7220 303.8652 282.9467 264.6216 245.6546 222.9299
    2  1 309.1822 302.1058 384.4437 247.6351 394.7827 285.1566 375.1212 398.5786 208.6958 309.7161
    3  1 258.2186 254.4539 379.3294 398.5669 269.8501 240.8379 326.4154 295.5001 349.7641 313.2211
    4  0 379.7640 307.9014 283.4546 280.8184 280.4566 300.5646 327.1096 299.2991 299.4069 244.0632
    5  0 316.4754 267.2708 185.7382 382.7073 279.1889 349.5801 293.1663 243.8272 270.0186 332.5476
    6  0 258.9766 388.3644 424.8831 375.6106 281.2171 379.6984 243.1633 232.7935 291.1026 248.3550
    

    If I run the model you specified on this data, every variable other than dv is used on the right-hand side:

    model<-glm(data=mydata, dv~.,family=binomial(link = 'logit'))
    
    > summary(model)
    
    Call:
    glm(formula = dv ~ ., family = binomial(link = "logit"), data = mydata)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -1.8891  -1.0853  -0.5163   1.0237   1.8303  
    
    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)  
    (Intercept) -2.4330825  4.1437180  -0.587   0.5571  
    X1          -0.0020482  0.0049025  -0.418   0.6761  
    X2          -0.0059021  0.0046298  -1.275   0.2024  
    X3           0.0123246  0.0047991   2.568   0.0102 *
    X4           0.0024804  0.0046856   0.529   0.5966  
    X5           0.0025348  0.0039545   0.641   0.5215  
    X6          -0.0005905  0.0047417  -0.125   0.9009  
    X7          -0.0001758  0.0040737  -0.043   0.9656  
    X8           0.0042362  0.0041170   1.029   0.3035  
    X9          -0.0007664  0.0042471  -0.180   0.8568  
    X10         -0.0042089  0.0043094  -0.977   0.3287  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 138.59  on 99  degrees of freedom
    Residual deviance: 125.11  on 89  degrees of freedom
    AIC: 147.11
    
    Number of Fisher Scoring iterations: 4
    

    Now I exclude X1 and X2 from mydata and run the model again:

    mydata2<-mydata[,-match(c('X1','X2'), colnames(mydata))]
    
    model2<-glm(data=mydata2, dv~.,family=binomial(link = 'logit'))
    > summary(model2)
    
    Call:
    glm(formula = dv ~ ., family = binomial(link = "logit"), data = mydata2)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -1.8983  -1.0724  -0.4521   1.1132   1.7792  
    
    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)   
    (Intercept) -4.8725545  3.6357314  -1.340  0.18019   
    X3           0.0124982  0.0047930   2.608  0.00912 **
    X4           0.0031911  0.0045971   0.694  0.48758   
    X5           0.0015992  0.0038101   0.420  0.67467   
    X6          -0.0003295  0.0046554  -0.071  0.94357   
    X7           0.0003372  0.0039961   0.084  0.93275   
    X8           0.0038889  0.0040737   0.955  0.33977   
    X9          -0.0010014  0.0042078  -0.238  0.81189   
    X10         -0.0041691  0.0042232  -0.987  0.32356   
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 138.59  on 99  degrees of freedom
    Residual deviance: 126.93  on 91  degrees of freedom
    AIC: 144.93
    
    Number of Fisher Scoring iterations: 4
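
As a variation on the column-dropping step above, the same subset can be built by name with setdiff(), which simply ignores any name that is not present in the data. This is only a sketch on the same simulated data; the vector name drop_vars is introduced here for illustration and is not part of the original answer.

```r
# Recreate the simulated data used above: dv plus X1..X10
set.seed(1)
mydata <- data.frame(dv = sample(c(0, 1), 100, replace = TRUE),
                     replicate(10, rnorm(100, 300, 50)))

# Columns to leave out of the model (illustrative name, not from the original)
drop_vars <- c("X1", "X2")

# Keep every column whose name is not listed in drop_vars
mydata2 <- mydata[, setdiff(names(mydata), drop_vars)]

# Fit the logistic regression on the reduced data frame
model2 <- glm(dv ~ ., data = mydata2, family = binomial(link = "logit"))

# Inspect which terms ended up in the model
names(coef(model2))
```

For the original question, drop_vars would presumably be c("avg", "age", "income").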
    

    【Comments】:

      【Solution 2】:

      The . ("everything") on the right-hand side of the formula can be modified by subtracting terms:

      model <- glm(y~ . - avg - age - income, data = mydata, 
           family = binomial)
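
When the variables to exclude are held in a character vector, the same subtraction formula can be assembled as a string and converted with as.formula(). The sketch below assumes a small made-up data frame standing in for data.csv; the columns x1 and x2 and the name drop_vars are hypothetical and only serve the illustration.

```r
set.seed(42)
n <- 200

# Hypothetical stand-in for data.csv: y, the three named regressors, two others
mydata <- data.frame(
  y      = rbinom(n, 1, 0.5),
  avg    = rnorm(n),
  age    = rnorm(n, 40, 10),
  income = rnorm(n, 50000, 10000),
  x1     = rnorm(n),
  x2     = rnorm(n)
)

# Variables to exclude from the right-hand side
drop_vars <- c("avg", "age", "income")

# Builds the formula y ~ . - avg - age - income
f <- as.formula(paste("y ~ . -", paste(drop_vars, collapse = " - ")))

model <- glm(f, data = mydata, family = binomial)

# Only the intercept and the regressors that were not subtracted remain
names(coef(model))
```

Unlike subsetting the data frame, this leaves mydata untouched, so the excluded columns stay available for later models.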
      

      【Comments】:
