fmin_cg: Desired error not necessarily achieved due to precision loss

【Question title】: fmin_cg: Desired error not necessarily achieved due to precision loss
【Posted】: 2016-02-24 13:10:46
【Question description】:

I have the following code to minimize a cost function using its gradient.

def trainLinearReg( X, y, lamda ):
    # theta = zeros( shape(X)[1], 1 )
    theta = random.rand( shape(X)[1], 1 ) # random initialization of theta

    result = scipy.optimize.fmin_cg( computeCost, fprime = computeGradient, x0 = theta, 
                                     args = (X, y, lamda), maxiter = 200, disp = True, full_output = True )
    return result[1], result[0]

But I get this warning:

Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 8403387632289934651424768.000000
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3

My computeCost and computeGradient are defined as

def computeCost( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0
    grad  = zeros( shape(theta) )

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))

    return J[0]

def computeGradient( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0
    grad  = zeros( shape(theta) )

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    grad = (1.0 / m) * (X.T.dot(h - y)) + (lamda / m) * theta

    return grad.flatten()

I have already looked at these similar questions:

scipy.optimize.fmin_bfgs: “Desired error not necessarily achieved due to precision loss”

scipy.optimize.fmin_cg: "'Desired error not necessarily achieved due to precision loss.'

scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"

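But I still cannot solve my problem. How can I make the minimization converge instead of getting stuck at the very beginning?

ANSWER:

I solved this problem based on @lejlot's comments below. He was right: the values in dataset X were too large, because I did not assign the normalized values returned by featureNormalize to the correct variable. Although it is a small mistake, it does show where to look when you run into this kind of problem: a cost function value that is far too large suggests that something may be wrong with the dataset.

The previous, wrong version: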

X_poly            = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]

The correct version:

X_poly            = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]

where X_poly is what is actually used in the training below:

cost, theta = trainLinearReg(X_poly, y, lamda)
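
To illustrate how much the misassigned variable changes the scale of the problem, here is a minimal sketch. The bodies of polyFeatures and featureNormalize are assumptions (the post does not show them), and the data, the degree p = 8 and the test point theta = ones are made up for illustration. It evaluates the same squared-error cost on the unnormalized and the normalized design matrix.

```python
import numpy as np

def polyFeatures(X, p):
    # Hypothetical helper: columns are X, X**2, ..., X**p.
    return np.hstack([X ** k for k in range(1, p + 1)])

def featureNormalize(X):
    # Hypothetical helper: zero mean, unit standard deviation per column.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def cost(theta, X, y):
    # Unregularized squared-error cost, returned as a plain float.
    r = X.dot(theta) - y
    return float(r.dot(r)) / (2 * len(y))

X = np.linspace(-50, 40, 12).reshape(-1, 1)   # made-up inputs
y = np.linspace(0, 30, 12)                     # made-up targets
m, p = X.shape[0], 8

raw = polyFeatures(X, p)
scaled, mu, sigma = featureNormalize(raw)

X_wrong = np.c_[np.ones((m, 1)), raw]      # normalized result discarded
X_right = np.c_[np.ones((m, 1)), scaled]   # normalized result used

theta = np.ones(p + 1)
cost_wrong = cost(theta, X_wrong, y)
cost_right = cost(theta, X_right, y)
print(cost_wrong, cost_right)
```

With raw polynomial features the cost is many orders of magnitude larger than with normalized ones, which is the kind of value the warning in the question reports.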

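【Comments】:

Does this happen with lamda=0?
@lejlot No, I have tried both lamda=0.0 and lamda=1.0. In fact, the assignment requires lamda=0.
You should probably also attach your data. Since you are getting an extremely large J value, maybe your data is not preprocessed correctly? Do you have very large values in X or y?
It looks like it is not normalized well enough. Add the max, min, and mean of X and y (or maybe a histogram?) to your question.
You should post the "ANSWER" part of your question as an actual answer and then accept it, so the question does not stay open.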
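
The range check requested in the comments above could look roughly like this. The helper name summarize and the sample numbers are made up for illustration; the real X and y would be passed in instead.

```python
import numpy as np

def summarize(name, a):
    # Print and return basic range statistics for an array.
    a = np.asarray(a, dtype=float)
    stats = {"min": a.min(), "max": a.max(), "mean": a.mean()}
    print(name, stats)
    return stats

# Made-up stand-ins for the real X and y.
X = np.array([[1.0, -15.9, 252.8], [1.0, 36.2, 1310.4]])
y = np.array([2.1, 19.5])

x_stats = summarize("X", X)
y_stats = summarize("y", y)
```

If the maximum of X is several orders of magnitude larger than the values in y, the features probably still need to be normalized.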

Tags: python python-2.7 numpy machine-learning scipy

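【Solution 1】:

I ran into this problem today.

Then I noticed that my cost function was implemented incorrectly and was producing errors of a very large scale, which is why scipy was asking for more data. Hope this helps someone like me.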
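
One way to catch an incorrectly implemented cost or gradient before running the optimizer is scipy.optimize.check_grad, which compares the analytic gradient with a finite-difference estimate. The sketch below uses the regularized linear regression formulas from the question, written for 1-D theta and y; the random data is made up for illustration.

```python
import numpy as np
from scipy.optimize import check_grad

def computeCost(theta, X, y, lamda):
    m = y.shape[0]
    r = X.dot(theta) - y
    return (r.dot(r) + lamda * theta.dot(theta)) / (2.0 * m)

def computeGradient(theta, X, y, lamda):
    m = y.shape[0]
    return (X.T.dot(X.dot(theta) - y) + lamda * theta) / m

rng = np.random.RandomState(0)
X = np.c_[np.ones(20), rng.randn(20, 2)]   # made-up design matrix
y = rng.randn(20)                          # made-up targets
theta0 = rng.randn(3)

# 2-norm of (analytic gradient - finite-difference gradient) at theta0.
err = check_grad(computeCost, computeGradient, theta0, X, y, 1.0)
print(err)
```

A value close to zero suggests that the cost and gradient agree; a large value points to a bug in one of them.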

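【Discussion】:

【Solution 2】:

I also ran into this problem, and even after searching through many solutions nothing worked, because the solutions were not clearly defined.

Then I read the documentation of scipy.optimize.fmin_cg, which clearly states that the parameter x0 must be a 1-D array.

My approach was the same as yours: I passed a 2-D matrix as x0, but I always got some precision error or divide-by-zero error, along with the same warning as you.

Then I changed my approach: I passed theta as a 1-D array and converted it to a 2-D matrix inside the computeCost and computeGradient functions. That worked for me and I got the expected results.

My solution for logistic regression: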

import numpy as np
import scipy.optimize as opt

# X and Y are assumed to be pandas objects that are already loaded:
# X is an (m, features) DataFrame, Y an (m, 1) DataFrame of 0/1 labels.
m, features = X.shape

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

theta = np.zeros(features)  # x0 must be a 1-D array

def computeCost(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)  # 1-D array -> (1, features) matrix
    xtheta = np.matmul(x, theta.T)
    hx = sigmoid(xtheta)
    cost = np.multiply(y, np.log(hx)) + np.multiply(1 - y, np.log(1 - hx))
    return -np.sum(cost) / m

def computeGradient(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)
    grad = np.zeros(features)
    xtheta = np.matmul(x, theta.T)
    hx = sigmoid(xtheta)
    error = hx - y  # use the matrix y, not the pandas object Y
    for i in range(features):
        term = np.multiply(error, x[:, i])
        grad[i] = np.sum(term) / m
    return grad

result = opt.fmin_tnc(func=computeCost, x0=theta, fprime=computeGradient, args=(X, Y))

print(computeCost(result[0], X, Y))

Note again that theta must be a 1-D array.

So in your code, change theta in trainLinearReg to theta = random.randn(features)
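
Applied to the linear regression code from the question, the change could look like the sketch below. It is not the asker's final code: theta and y are kept 1-D throughout, the cost is returned as a plain scalar, the starting point is zeros rather than random, and the data is made up for illustration.

```python
import numpy as np
from scipy.optimize import fmin_cg

def computeCost(theta, X, y, lamda):
    m = y.shape[0]
    r = X.dot(theta) - y                      # theta and y are both 1-D
    return (r.dot(r) + lamda * theta.dot(theta)) / (2.0 * m)

def computeGradient(theta, X, y, lamda):
    m = y.shape[0]
    return (X.T.dot(X.dot(theta) - y) + lamda * theta) / m

def trainLinearReg(X, y, lamda):
    theta0 = np.zeros(X.shape[1])             # 1-D starting point
    result = fmin_cg(computeCost, x0=theta0, fprime=computeGradient,
                     args=(X, y, lamda), maxiter=200, disp=False,
                     full_output=True)
    return result[1], result[0]               # (cost, theta)

rng = np.random.RandomState(0)
X = np.c_[np.ones(30), rng.randn(30, 2)]      # made-up, well-scaled features
y = X.dot(np.array([1.0, 2.0, -3.0]))         # noise-free made-up targets
cost, theta = trainLinearReg(X, y, 0.0)
print(cost, theta)
```

Because the features here are well scaled and y is generated without noise, the optimizer should recover coefficients close to the ones used to build y.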

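【Discussion】:

【Solution 3】:

In my implementation, scipy.optimize.fmin_cg also failed with the above error for some initial guesses. I then switched to the BFGS method, and it converged.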

 scipy.optimize.minimize(fun, x0, args=(), method='BFGS', jac=None, tol=None, callback=None, options={'disp': False, 'gtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': None, 'norm': inf})

It seems this error in CG is still unavoidable, because CG can end up with a non-descent direction.
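
A minimal sketch of that switch, using a small quadratic objective chosen only for illustration (the names fun and jac, the matrix A and the vector b are not from the original post):

```python
import numpy as np
from scipy.optimize import minimize

A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])

def fun(x):
    # Quadratic objective 0.5 * x^T A x - b^T x.
    return 0.5 * x.dot(A).dot(x) - b.dot(x)

def jac(x):
    # Analytic gradient A x - b.
    return A.dot(x) - b

x0 = np.array([5.0, -5.0])   # 1-D initial guess
res = minimize(fun, x0, method='BFGS', jac=jac,
               options={'gtol': 1e-5, 'disp': False})
print(res.success, res.x)
```

Passing method='CG' instead runs the conjugate gradient solver through the same interface, which makes it easy to compare the two on the same problem.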

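【Discussion】:

【Solution 4】:

ANSWER:

I solved this problem based on @lejlot's comments below. He was right: the values in dataset X were too large, because I did not assign the normalized values returned by featureNormalize to the correct variable. Although it is a small mistake, it does show where to look when you run into this kind of problem: a cost function value that is far too large suggests that something may be wrong with the dataset.

The previous, wrong version: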

X_poly            = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]

The correct version:

X_poly            = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]

where X_poly is what is actually used in the training below:

cost, theta = trainLinearReg(X_poly, y, lamda)

【Discussion】: