Coefficients from power regression in R do not match Excel (answers)

【Question title】: Coefficients from power regression in R do not match Excel
【Posted】: 2019-08-19 15:21:10
【Question description】:

I created power regression equations of length versus dry mass in both R and Excel, but the coefficients do not match.

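I used Hong Ooi's answer from this link: Power regression in R similar to excel. In that code, they were able to reproduce Excel's power equation using R. When I tried it, however, I got some very strange coefficients. When tested with random lengths, the equation from Excel's power trendline was far more accurate.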

The code is as follows:

#sample dataset of Lengths and Dry Masses
test <- structure(list(
  Length = c(23, 17, 16, 25, 15, 25, 11, 22, 13, 21, 31), 
  DryMass = c(3.009, 1.6, 1, 4.177, 0.992, 6.166, 0.7, 1.73, 0.613, 3.429, 7.896)), 
  .Names = c("Length", "DryMass"), 
  row.names = c(NA, 11L), 
  class = "data.frame")

#log-log regression
lm(formula = log(Length) ~ log(DryMass), data = test)

Coefficients:
 (Intercept)  log(DryMass)  
      2.7048        0.3413

Once I back-transform the intercept (EXP(2.7048) = 14.9515), this should give me the equation "14.9515*x^0.3413". I tested it with some random lengths, but the predictions were poor.

However, the equation given by Excel is "0.0009*x^2.6291" which, when tested, was very accurate. I would just use the equation from Excel, but I need to produce 50 more of these and would like to automate the process in R.
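To make the mismatch concrete, both equations can be evaluated on the sample lengths and compared with the observed dry masses. This is only a rough check using the rounded coefficients quoted above; the names `pred_r`, `pred_excel`, `comparison`, and `mae` are illustrative and not part of the original post.

```r
# Sample data from the question
test <- data.frame(
  Length  = c(23, 17, 16, 25, 15, 25, 11, 22, 13, 21, 31),
  DryMass = c(3.009, 1.6, 1, 4.177, 0.992, 6.166, 0.7, 1.73, 0.613, 3.429, 7.896)
)

# Equation derived from the R fit above, applied with x = Length
pred_r <- 14.9515 * test$Length^0.3413
# Equation reported by Excel's power trendline, applied with x = Length
pred_excel <- 0.0009 * test$Length^2.6291

# Observed values next to both sets of predictions
comparison <- data.frame(test, pred_r = pred_r, pred_excel = pred_excel)
print(round(comparison, 3))

# Mean absolute error of each equation against the observed dry mass
mae <- c(r = mean(abs(pred_r - test$DryMass)),
         excel = mean(abs(pred_excel - test$DryMass)))
print(mae)
```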

【Discussion】:

Tags: r excel regression

【Solution 1】:

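Edit:

You switched x and y in R. Excel's trendline models DryMass as a function of Length, so DryMass must be on the left-hand side of the formula: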

mod_linearized <- lm(formula = log(DryMass) ~ log(Length), data = test)

exp(coef(mod_linearized)[1])
# (Intercept) 
#0.0008775079
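Since the question mentions repeating this for about 50 more datasets, the corrected fit could be wrapped in a small helper. The sketch below is one possible way to do that, not the answerer's own code: `fit_power` is a hypothetical function name, and the new length of 25 is an arbitrary value chosen for illustration. It assumes all x and y values are strictly positive, since both are log-transformed.

```r
# Sample data from the question
test <- data.frame(
  Length  = c(23, 17, 16, 25, 15, 25, 11, 22, 13, 21, 31),
  DryMass = c(3.009, 1.6, 1, 4.177, 0.992, 6.166, 0.7, 1.73, 0.613, 3.429, 7.896)
)

# Hypothetical helper: fit y = a * x^b by least squares on the log-log scale
fit_power <- function(x, y) {
  fit <- lm(log(y) ~ log(x))
  cf <- unname(coef(fit))
  # Back-transform the intercept; the slope is the exponent as-is
  c(a = exp(cf[1]), b = cf[2])
}

power_coefs <- fit_power(x = test$Length, y = test$DryMass)
print(power_coefs)

# Predict dry mass for a new length (25 is an arbitrary illustration value)
new_length <- 25
predicted_mass <- unname(power_coefs["a"] * new_length^power_coefs["b"])
print(predicted_mass)
```

The same function could then be applied to each dataset in turn, for example with `lapply` over a list of data frames.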

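Old answer (may still be useful):

Back-transforming a linearized model does not give the same result as fitting the nonlinear model, because the error terms differ:

Back-transforming the linearized model gives a multiplicative error: y = exp(a) * x ^ b * exp(epsilon)

The nonlinear model has an additive error: y = a * x ^ b + epsilon

Basically, linearization is equivalent to weighting the data points differently (larger values get less weight). That can actually be desirable, depending on your specific data-generating process. But sometimes you want equal weights, and then you should fit the nonlinear model.

You can do nonlinear regression in R (the code below still uses the x and y orientation from the question):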

mod_linearized <- lm(formula = log(Length) ~ log(DryMass), data = test)

exp(coef(mod_linearized)[1])
#(Intercept) 
#   14.95152 


mod_nonlinear <- nls(Length ~ a * DryMass ^ b, data = test, 
                     #use result from linearization as starting values:
                     start = list(a = exp(coef(mod_linearized)[1]), 
                                  b = coef(mod_linearized)[2]))

coef(mod_nonlinear)[1]
#      a 
#15.2588
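The code above keeps the orientation from the question (Length as a function of DryMass). A comparable sketch for the orientation Excel uses, DryMass as a function of Length, might look like the following. The names `mod_lin`, `mod_nls`, `start_vals`, `rss_lin`, and `rss_nls` are illustrative, and the example assumes `nls` converges from the linearized starting values on this small dataset.

```r
# Sample data from the question
test <- data.frame(
  Length  = c(23, 17, 16, 25, 15, 25, 11, 22, 13, 21, 31),
  DryMass = c(3.009, 1.6, 1, 4.177, 0.992, 6.166, 0.7, 1.73, 0.613, 3.429, 7.896)
)

# Linearized fit: DryMass as a function of Length on the log-log scale
mod_lin <- lm(log(DryMass) ~ log(Length), data = test)
start_vals <- list(a = unname(exp(coef(mod_lin)[1])),
                   b = unname(coef(mod_lin)[2]))

# Nonlinear least squares (additive error), started from the linearized estimates
mod_nls <- nls(DryMass ~ a * Length^b, data = test, start = start_vals)

# Coefficients of both fits side by side
print(rbind(linearized = unlist(start_vals), nonlinear = coef(mod_nls)))

# Residual sum of squares on the original (untransformed) scale
rss_lin <- sum((test$DryMass - start_vals$a * test$Length^start_vals$b)^2)
rss_nls <- sum(residuals(mod_nls)^2)
print(c(linearized = rss_lin, nonlinear = rss_nls))
```

Because `nls` minimizes the squared error on the original scale, its residual sum of squares there should be no larger than that of the back-transformed linearized fit. Which of the two is preferable still depends on whether a multiplicative or an additive error is more plausible for the data.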

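【Discussion】:

...this is the correct answer. At first I didn't know I had to take the exponential of the intercept, so I ignored the negative value and switched x and y. I then read some other material and learned how to actually use the coefficients, but forgot that I had switched x and y. Thanks!

【Solution 2】:

You are trying to fit the following model.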

library(ggplot2)
ggplot(test, aes(x = log(DryMass), y = log(Length))) +
  theme_bw() +
  geom_point() +
  scale_y_continuous(limits = c(0, 5)) +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE)

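The (Intercept) (the first coefficient) is where the line crosses the y-axis at x = 0, I believe. In the plot above this appears to be between 2.5 and 3, so say 2.8, which is pretty close to 2.7 if you ask me. It could be that Excel is wrong, in which case I'd suggest contacting its authors? Or perhaps you are doing something in Excel that isn't mentioned here, which says something about that tool's reproducibility.

【Discussion】: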