When should I use aov() and when anova()? Answers

[Question title]: When should I use aov() and when anova()?
[Posted]: 2017-04-10 22:28:09
[Question]:

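I have consulted a lot of online literature, but it has only added to my confusion. Most of the discussion is too technical, involving terms such as unbalanced designs and Type I, II or III factorial ANOVA.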

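All I know is that aov() uses lm() internally and is useful for data with factors, while anova() can be used to compare different models fitted to the same dataset. Is my understanding correct?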

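[Comments]:

You are referring to R functions, which makes this R-specific. You would do better to read the documentation for these functions and then ask a specific question about the underlying statistical concepts.
I did read the documentation first. With all due respect, it is the most cryptic I have come across. Asking here was my last resort.
To understand the necessary statistical concepts (unbalanced, type I SS, etc.), it may help you to read my answer here: @987654321 @
After you have read the suggested post that @gung linked to, why not come back here and ask a new question about which statistical bits you do not understand? Some of this stuff is quite tricky.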

Tags: r anova

[Answer 1]:

anova is substantially different from aov. Why not read R's documentation, ?aov and ?anova? In short:

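aov fits a model (as you know, it calls lm internally), so it produces regression coefficients, fitted values, residuals, etc. It returns an object whose primary class is "aov" and whose secondary class is "lm", so the result is an augmented "lm" object.
anova is a generic function. In your scenario you are referring to anova.lm or anova.lmlist (read ?anova.lm for more). The former analyses a single fitted model (produced by lm or aov), whereas the latter analyses several nested (increasingly larger) fitted models (produced by lm or aov). Both aim to produce a Type I (sequential) ANOVA table.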
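To make the two points above concrete, here is a minimal sketch on the built-in LifeCycleSavings data. The two-predictor formula and the object names (fit_aov, fit_lm, fit_small, tab_one, tab_many, tab_swapped) are chosen only for illustration and are not part of the original answer.

```r
# Fit the same model with aov() and with lm()
fit_aov <- aov(sr ~ pop15 + ddpi, data = LifeCycleSavings)
fit_lm  <- lm(sr ~ pop15 + ddpi, data = LifeCycleSavings)

class(fit_aov)           # "aov" "lm"
class(fit_lm)            # "lm"
inherits(fit_aov, "lm")  # TRUE: an "aov" object is also an "lm" object

# One fitted model: a sequential (Type I) table,
# one row per term plus a Residuals row
tab_one <- anova(fit_lm)

# Several nested models: each row is compared with the model before it
fit_small <- lm(sr ~ pop15, data = LifeCycleSavings)
tab_many  <- anova(fit_small, fit_lm)

# "Sequential" means term order matters when predictors are correlated:
# swapping the terms changes the sum of squares attributed to pop15
tab_swapped <- anova(lm(sr ~ ddpi + pop15, data = LifeCycleSavings))
tab_one["pop15", "Sum Sq"]
tab_swapped["pop15", "Sum Sq"]
```

Printing tab_one and tab_many side by side shows the different layouts: the single-model table lists terms, while the multi-model table lists models with their residual degrees of freedom and residual sums of squares.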

In practice, you first fit a model with lm / aov, then analyse the result with anova. Nothing beats trying a small example:

fit <- aov(sr ~ ., data = LifeCycleSavings)  ## can also use `lm`
z <- anova(fit)

Now look at their structure. aov returns a large object:

str(fit)

#List of 12
# $ coefficients : Named num [1:5] 28.566087 -0.461193 -1.691498 -0.000337 0.409695
#  ..- attr(*, "names")= chr [1:5] "(Intercept)" "pop15" "pop75" "dpi" ...
# $ residuals    : Named num [1:50] 0.864 0.616 2.219 -0.698 3.553 ...
#  ..- attr(*, "names")= chr [1:50] "Australia" "Austria" "Belgium" "Bolivia" ...
# $ effects      : Named num [1:50] -68.38 -14.29 7.3 -3.52 -7.94 ...
#  ..- attr(*, "names")= chr [1:50] "(Intercept)" "pop15" "pop75" "dpi" ...
# $ rank         : int 5
# $ fitted.values: Named num [1:50] 10.57 11.45 10.95 6.45 9.33 ...
#  ..- attr(*, "names")= chr [1:50] "Australia" "Austria" "Belgium" "Bolivia" ...
# $ assign       : int [1:5] 0 1 2 3 4
# $ qr           :List of 5
#  ..$ qr   : num [1:50, 1:5] -7.071 0.141 0.141 0.141 0.141 ...
#  .. ..- attr(*, "dimnames")=List of 2
#  .. .. ..$ : chr [1:50] "Australia" "Austria" "Belgium" "Bolivia" ...
#  .. .. ..$ : chr [1:5] "(Intercept)" "pop15" "pop75" "dpi" ...
#  .. ..- attr(*, "assign")= int [1:5] 0 1 2 3 4
#  ..$ qraux: num [1:5] 1.14 1.17 1.16 1.15 1.05
#  ..$ pivot: int [1:5] 1 2 3 4 5
#  ..$ tol  : num 1e-07
#  ..$ rank : int 5
#  ..- attr(*, "class")= chr "qr"
# $ df.residual  : int 45
# $ xlevels      : Named list()
# $ call         : language aov(formula = sr ~ ., data = LifeCycleSavings)
# $ terms        :Classes 'terms', 'formula'  language sr ~ pop15 + pop75 + dpi + ddpi
#  .. ..- attr(*, "variables")= language list(sr, pop15, pop75, dpi, ddpi)
#  .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
#  .. .. ..- attr(*, "dimnames")=List of 2
#  .. .. .. ..$ : chr [1:5] "sr" "pop15" "pop75" "dpi" ...
#  .. .. .. ..$ : chr [1:4] "pop15" "pop75" "dpi" "ddpi"
#  .. ..- attr(*, "term.labels")= chr [1:4] "pop15" "pop75" "dpi" "ddpi"
#  .. ..- attr(*, "order")= int [1:4] 1 1 1 1
#  .. ..- attr(*, "intercept")= int 1
#  .. ..- attr(*, "response")= int 1
#  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#  .. ..- attr(*, "predvars")= language list(sr, pop15, pop75, dpi, ddpi)
#  .. ..- attr(*, "dataClasses")= Named chr [1:5] "numeric" "numeric" "numeric" "numeric" ...
#  .. .. ..- attr(*, "names")= chr [1:5] "sr" "pop15" "pop75" "dpi" ...
# $ model        :'data.frame':	50 obs. of  5 variables:
#  ..$ sr   : num [1:50] 11.43 12.07 13.17 5.75 12.88 ...
#  ..$ pop15: num [1:50] 29.4 23.3 23.8 41.9 42.2 ...
#  ..$ pop75: num [1:50] 2.87 4.41 4.43 1.67 0.83 2.85 1.34 0.67 1.06 1.14 ...
#  ..$ dpi  : num [1:50] 2330 1508 2108 189 728 ...
#  ..$ ddpi : num [1:50] 2.87 3.93 3.82 0.22 4.56 2.43 2.67 6.51 3.08 2.8 ...
#  ..- attr(*, "terms")=Classes 'terms', 'formula'  language sr ~ pop15 + pop75 + dpi + ddpi
#  .. .. ..- attr(*, "variables")= language list(sr, pop15, pop75, dpi, ddpi)
#  .. .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
#  .. .. .. ..- attr(*, "dimnames")=List of 2
#  .. .. .. .. ..$ : chr [1:5] "sr" "pop15" "pop75" "dpi" ...
#  .. .. .. .. ..$ : chr [1:4] "pop15" "pop75" "dpi" "ddpi"
#  .. .. ..- attr(*, "term.labels")= chr [1:4] "pop15" "pop75" "dpi" "ddpi"
#  .. .. ..- attr(*, "order")= int [1:4] 1 1 1 1
#  .. .. ..- attr(*, "intercept")= int 1
#  .. .. ..- attr(*, "response")= int 1
#  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#  .. .. ..- attr(*, "predvars")= language list(sr, pop15, pop75, dpi, ddpi)
#  .. .. ..- attr(*, "dataClasses")= Named chr [1:5] "numeric" "numeric" "numeric" "numeric" ...
#  .. .. .. ..- attr(*, "names")= chr [1:5] "sr" "pop15" "pop75" "dpi" ...
# - attr(*, "class")= chr [1:2] "aov" "lm"

whereas anova returns:

str(z)

#Classes ‘anova’ and 'data.frame':  5 obs. of  5 variables:
# $ Df     : int  1 1 1 1 45
# $ Sum Sq : num  204.1 53.3 12.4 63.1 650.7
# $ Mean Sq: num  204.1 53.3 12.4 63.1 14.5
# $ F value: num  14.116 3.689 0.858 4.36 NA
# $ Pr(>F) : num  0.000492 0.061125 0.359355 0.042471 NA
# - attr(*, "heading")= chr  "Analysis of Variance Table\n" "Response: sr"
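Because the result of anova is a data frame with an extra "anova" class, its columns can be pulled out like those of any other data frame. The sketch below refits the model from the example so that it runs on its own; p_values and z_lm are names introduced here for illustration only.

```r
# Refit the example model and build its ANOVA table
fit <- aov(sr ~ ., data = LifeCycleSavings)
z   <- anova(fit)

# Column names contain spaces and symbols, so index with [[ ]] and a string
p_values <- z[["Pr(>F)"]]
names(p_values) <- rownames(z)   # pop15, pop75, dpi, ddpi, Residuals
p_values                         # the Residuals entry is NA

# Fitting with lm() instead of aov() leads to the same table entries
z_lm <- anova(lm(sr ~ ., data = LifeCycleSavings))
all.equal(z[["Sum Sq"]], z_lm[["Sum Sq"]])
```

This is handy when you want to use the F statistics or p-values in further computations rather than just reading them off the printed table.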

[Comments]: