Interpreting Summary Statistics with Categorical Variables [duplicate]: Answers

【Question title】: Interpreting Summary Statistics with Categorical Variables [duplicate]
【Posted】: 2015-11-13 09:08:10
【Question】:

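With this output, I know that the intercept is the value when both factors are 0. I know that factor(V1)1 means V1=1 and factor(V2)1 means V2=1. To get the estimate for V1 = 1 alone, I would add 5.1122 + (-0.4044). However, I am wondering how to interpret the p-values in this output. If only V1 = 1, does that mean the p-value is 2.39e-12 + 0.376? If so, every model I have run is significant only when all factors = 0...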

> lm.comfortgender=lm(V13~factor(V1)+factor(V2),data=comfort.txt)
> summary(lm.comfortgender)

Call:
lm(formula = V13 ~ factor(V1) + factor(V2), data = comfort.txt)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5676 -1.0411  0.1701  1.4324  2.0590 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.1122     0.5244   9.748 2.39e-12 ***
factor(V1)1  -0.4044     0.4516  -0.895    0.376    
factor(V2)1   0.2332     0.5105   0.457    0.650    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.487 on 42 degrees of freedom
Multiple R-squared:  0.02793,   Adjusted R-squared:  -0.01836 
F-statistic: 0.6033 on 2 and 42 DF,  p-value: 0.5517

【Comments】:

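This is more of a Cross Validated question, and you should probably consult a statistics textbook here. The p-value you cite is the p-value for the intercept. It has nothing to do with the significance of the "model"; there is really no measure of that unless you want to consider R-squared (and yours is very low). Your F-stat is also low, which means your model is poorly specified (or there is a high probability that your coefficients are jointly zero).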

Tags: r lm dummy-variable

【Solution 1】:

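Each p-value given in the output of an R regression model tests the null hypothesis that the mean of the distribution of that particular coefficient is zero, assuming that distribution is normal and its standard deviation is the square root of its variance (the Std. Error column). Each row is therefore a separate test, and p-values from different rows are not added together. For further explanation, see this other answer.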
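To make this concrete, here is a minimal sketch that recomputes the reported p-values from the numbers printed in the question's summary, without the original data. The variable names (`est`, `se`, `df_resid`, and so on) are made up for illustration. It assumes the usual `summary.lm` convention of a two-sided test against a Student t distribution with the residual degrees of freedom.

```r
# Values copied from the coefficient table in the question
est <- c(intercept = 5.1122, V1 = -0.4044, V2 = 0.2332)
se  <- c(intercept = 0.5244, V1 = 0.4516, V2 = 0.5105)
df_resid <- 42  # residual degrees of freedom from the summary

# Each row is its own test of H0: coefficient = 0
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)  # two-sided t-test
print(round(t_val, 3))
print(signif(p_val, 3))

# Fitted mean for V1 = 1, V2 = 0: the coefficients add, the p-values do not
mean_v1_only <- unname(est["intercept"] + est["V1"])
print(mean_v1_only)

# Overall F-test p-value, from the F-statistic line of the summary
p_f <- pf(0.6033, df1 = 2, df2 = df_resid, lower.tail = FALSE)
print(round(p_f, 4))
```

The p-value on the factor(V1)1 row (about 0.376) only says that the difference between the V1 = 1 and V1 = 0 groups is not distinguishable from zero. The intercept's tiny p-value only says that the baseline group mean is not zero. The overall F-test p-value (about 0.55) is the one that speaks to both dummy coefficients jointly.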

【Comments】: