Singular gradient error during bootstrapped nls fit to bad data

【Question title】: Singular gradient error during bootstrapped nls fit to bad data
【Posted】: 2012-10-13 12:43:18
【Question】:

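I have a dataset containing one independent variable and a set of dependent variables. I'd like to fit a function to each dependent variable using a bootstrapped nonlinear least squares procedure. In some cases the dependent variables are "good quality", i.e. they fit the function reasonably well. In other cases they are noisy.

In all cases, I can use nls() to get an estimate of the parameters. However, when the data are noisy, the bootstrap throws the error Error in nls(...) : singular gradient. I can understand why an nls fit to noisy data would fail, e.g. by failing to converge after too many iterations, but I don't understand why it is a singular gradient error, and why I only get it with resampled datasets of poor quality.

Code: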

require(ggplot2)
require(plyr)
require(boot)

# Data are in long form: columns are 'enzyme', 'x', and 'y'
enz <- read.table("http://dl.dropbox.com/s/ts3ruh91kpr47sj/SE.txt", header=TRUE)

# Nonlinear formula to fit to data
mmFormula <- formula(y ~ (x*Vmax) / (x + Km))

nls is perfectly capable of fitting the data (even if in some cases, such as a, I doubt that the model fits the data).

# Use nls to fit mmFormula to the data - this works well enough
fitDf <- ddply(enz, .(enzyme), function(x) coefficients(nls(mmFormula, x, start=list(Km=100, Vmax=0.5))))

# Create points to plot for the simulated fits
xGrid <- 0:200
simFits <- dlply(fitDf, .(enzyme), function(x) data.frame(x=xGrid, y=(xGrid * x$Vmax)/(xGrid + x$Km)))
simFits <- ldply(simFits, identity) 

ggplot() + geom_point(data=enz, aes(x=x, y=y)) + geom_line(data=simFits, aes(x=x, y=y)) + 
  facet_wrap(~enzyme, scales="free_y") + aes(ymin=0)

Bootstrapping works fine for the good quality data:

# Function to pass to bootstrap; returns coefficients of nls fit to formula
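nlsCoef <- function(df, i) {
  # Data-driven starting guesses (computed here, but the fit below
  # uses the fixed start values Km=100, Vmax=0.5)
  KmGuess <- median(df$x)
  VmaxGuess <- max(df$y)
  dfSamp <- df[i,]  # rows selected for this bootstrap replicate
  # Return the coefficients of the nls fit to the resampled data
  coefficients(nls(mmFormula, dfSamp, start=list(Km=100, Vmax=0.5)))
}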

eBoot <- boot(subset(enz, enzyme=="e"), nlsCoef, R=1000) #No error

But not for the poor quality data:

dBoot <- boot(subset(enz, enzyme=="d"), nlsCoef, R=10)
> Error in nls(mmFormula, dfSamp, start = list(Km = KmGuess, Vmax = VmaxGuess)) : 
   singular gradient

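What is causing this error? And what should I do about it, given that I'd like to use plyr to perform lots of bootstrap simulations simultaneously?

【Comments】: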

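I would avoid fitting Michaelis-Menten with only three distinct concentration values. However, maybe you could improve your guesses for the starting values (in particular KmGuess) by first fitting Lineweaver-Burk with lm.
Yes, I realize the experimental design is not ideal. Live and learn. Using Lineweaver-Burk for the initial guesses is a good idea. However, I don't think the starting guesses are the problem, because a.) the nls fit (without bootstrapping) works fine with relatively poor starting guesses, e.g. Km=100, Vmax=0.5; b.) when I change the bootstrap function to the same starting guesses, I get the same error; and c.) I thought bad starting guesses usually lead to a failure-to-converge error rather than a singular gradient error.
Well, you have some data that don't follow the model at all. Sometimes I have been able to solve similar problems (even singular gradient errors) by using different starting values (nls2 can help with that). A different optimization algorithm might also help. But if the data strongly violate the model, it cannot be fitted, and that can happen during bootstrapping.
But that's what I'm not getting - all of the data can be fit by the model. It is only the resampled data that cannot be fit by the model.
Maybe you could bootstrap the residuals to better preserve the x distribution?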
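
The residual-bootstrap idea from the last comment could look roughly like the sketch below. It runs on simulated data; the data, the seed, and the helper name residCoef are assumptions for illustration and do not come from the original post. Because every replicate keeps the original x values, no resample can end up with fewer than three distinct concentrations.

```r
library(boot)

# Simulated Michaelis-Menten data (hypothetical; true Km = 40, Vmax = 1)
set.seed(1)
sim <- data.frame(x = rep(c(10, 50, 200), each = 3))
sim$y <- (sim$x * 1) / (sim$x + 40) + rnorm(nrow(sim), sd = 0.03)

mmFormula <- formula(y ~ (x*Vmax) / (x + Km))

# Fit once to the full data and keep fitted values and residuals
fit0 <- nls(mmFormula, sim, start = list(Km = 100, Vmax = 0.5))
sim$fitted <- as.numeric(fitted(fit0))
sim$res <- as.numeric(residuals(fit0))

# Statistic for boot(): resample the residuals, not the rows
residCoef <- function(d, i) {
  d$y <- d$fitted + d$res[i]
  fit <- try(nls(mmFormula, d, start = as.list(coef(fit0))), silent = TRUE)
  if (inherits(fit, "try-error")) c(Km = NA, Vmax = NA) else coef(fit)
}

rBoot <- boot(sim, residCoef, R = 200)
```

This treats the residuals as exchangeable across concentrations, which may not hold if the noise level depends on x.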

Tags: r nonlinear-functions statistics-bootstrap

【Solution 1】:

This lets you examine what is happening:

#modified function
#returns NAs if the fit is not successful
#uses global assignment to store the bootstrap resampling indices
nlsCoef <- function(df, i) {
  KmGuess <- median(df$x)
  VmaxGuess <- max(df$y)
  dfSamp <- df[i,]
  fit <- NULL
  try(fit <- nls(mmFormula, dfSamp, start=list(Km=100, Vmax=0.5)))
  if(!is.null(fit)){
    res <- coef(fit)
  } else{
    res <- c(Km=NA,Vmax=NA)
  }

  istore[k,] <<- i
  k <<- k+1
  res
}

n <- 100
istore <- matrix(nrow=n+1,ncol=9)
k <- 1

dat <- subset(enz, enzyme=="d")
dBoot <- boot(dat, nlsCoef, R=n) 

#resampling indices that create samples that cannot be fitted
nais <- istore[-1,][is.na(dBoot$t[,1]),]

#look at first bootstrap sample 
#that could not be fitted
samp <- dat[nais[1,],]
plot(y~x,data=samp)
fit <- nls(mmFormula, samp, start=list(Km=100, Vmax=0.5))
#error

You can also use a self-starting model:

try(fit <- nls(y ~ SSmicmen(x, Vmax, Km), data = dfSamp))

That way the error messages become more informative. For example, one error is

too few distinct input values to fit a Michaelis-Menten model

which means that some bootstrap samples contain fewer than three distinct concentrations. But there are also some other errors:
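
To get a feel for how often a case resample loses a concentration, one can count the distinct x values per resample. The sketch below assumes a balanced 9-row design with three concentrations (9 rows matches the ncol=9 index matrix above; the x values and the equal replication are assumptions):

```r
set.seed(42)
x <- rep(c(10, 50, 200), each = 3)  # assumed design: 3 concentrations x 3 replicates

# Number of distinct concentrations in each of 1000 case resamples
nDistinct <- replicate(1000, length(unique(sample(x, replace = TRUE))))
table(nDistinct)

# Exact probability that a resample misses at least one concentration,
# by inclusion-exclusion: 3*(6/9)^9 - 3*(3/9)^9
pMiss <- 3 * (2/3)^9 - 3 * (1/3)^9
pMiss
```

Under these assumptions roughly 8% of case resamples contain fewer than three distinct concentrations, so failures in a run of a few hundred replicates are to be expected.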

step factor 0.000488281 reduced below 'minFactor' of 0.000976562

You might be able to avoid this by decreasing minFactor.
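
minFactor is set through nls.control(). A hedged sketch on simulated data follows; the data and the particular control values are assumptions chosen for illustration:

```r
# Simulated Michaelis-Menten data (hypothetical; true Km = 40, Vmax = 1)
set.seed(2)
sim <- data.frame(x = rep(c(10, 50, 200), each = 3))
sim$y <- (sim$x * 1) / (sim$x + 40) + rnorm(nrow(sim), sd = 0.03)

# Allow smaller step factors and more iterations than the defaults
# (defaults are minFactor = 1/1024 and maxiter = 50)
ctrl <- nls.control(minFactor = 1/4096, maxiter = 200)

fit <- nls(y ~ SSmicmen(x, Vmax, Km), data = sim, control = ctrl)
coef(fit)
```

A smaller minFactor only lets the algorithm keep halving its step for longer; it does not guarantee convergence on a sample the model cannot describe.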

The following ones are nasty. You could try different fitting algorithms or starting values:

singular gradient matrix at initial parameter estimates

singular gradient
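
Both suggestions can be combined. The sketch below takes starting values from a Lineweaver-Burk fit with lm (the idea raised in the comments) and then fits with the "port" algorithm and lower bounds. The simulated data and the bounds are assumptions, and the Lineweaver-Burk step requires strictly positive x and y:

```r
# Simulated Michaelis-Menten data (hypothetical; true Km = 40, Vmax = 1)
set.seed(3)
sim <- data.frame(x = rep(c(10, 50, 200), each = 3))
sim$y <- (sim$x * 1) / (sim$x + 40) + rnorm(nrow(sim), sd = 0.03)

# Lineweaver-Burk: 1/y = 1/Vmax + (Km/Vmax) * (1/x)
# so intercept = 1/Vmax and slope = Km/Vmax
lb <- coef(lm(I(1/y) ~ I(1/x), data = sim))
start <- list(Km = unname(lb[2] / lb[1]), Vmax = unname(1 / lb[1]))

# "port" algorithm, with lower bounds keeping both parameters non-negative
fit <- nls(y ~ (x*Vmax) / (x + Km), data = sim, start = start,
           algorithm = "port", lower = c(Km = 0, Vmax = 0))
coef(fit)
```

Inside a bootstrap statistic the same two steps would be applied to each resampled data frame, still wrapped in try() as in the modified function above.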

【Discussion】:

PS: Avoid using subset inside functions. Its help page specifically warns against that. Use [ instead.
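
As a small illustration of that advice, the two forms select the same rows on a toy data frame (enzDemo and its values are made up for this example):

```r
# Toy data with the same column names as the question's data set
enzDemo <- data.frame(enzyme = rep(c("d", "e"), each = 3),
                      x = rep(c(10, 50, 200), 2),
                      y = c(0.2, 0.5, 0.8, 0.1, 0.4, 0.6))

viaSubset  <- subset(enzDemo, enzyme == "d")     # non-standard evaluation
viaBracket <- enzDemo[enzDemo$enzyme == "d", ]   # standard evaluation

all.equal(viaSubset, viaBracket)
```

One difference to keep in mind: subset drops rows where the condition is NA, whereas [ with a logical index returns an all-NA row for each of them.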