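Linear regression loss value increasing after each gradient descent iteration: answers

【Question title】: Linear Regression loss value increasing after each iteration of gradient descent
【Posted at】: 2021-01-18 01:43:23
【Question description】:

I am trying to implement multivariate linear regression (gradient descent with an MSE cost function), but the loss value grows exponentially with every iteration of gradient descent, and I can't figure out why.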

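import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release.
from sklearn.datasets import load_boston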


class LinearRegression:

    def __init__(self):
        self.X = None  # The feature vectors [shape = (m, n)]
        self.y = None  # The regression outputs [shape = (m, 1)]
        self.W = None  # The parameter vector `W` [shape = (n, 1)]
        self.bias = None  # The bias value `b`
        self.lr = None  # Learning Rate `alpha`
        self.m = None
        self.n = None
        self.epochs = None

    def fit(self, X: np.ndarray, y: np.ndarray, epochs: int = 100, lr: float = 0.001):
        self.X = X  # shape (m, n)
        self.m, self.n = X.shape
        assert y.size == self.m and y.shape[0] == self.m
        self.y = np.reshape(y, (-1, 1))  # shape (m, ) or (m, 1)
        assert self.y.shape == (self.m, 1)
        self.W = np.random.random((self.n, 1)) * 1e-3  # shape (n, 1)
        self.bias = 0.0
        self.epochs = epochs
        self.lr = lr
        self.minimize()

    def minimize(self, verbose: bool = True):
        for num_epoch in range(self.epochs):
            predictions = np.dot(self.X, self.W)

            assert predictions.shape == (self.m, 1)
            grad_w = (1/self.m) * np.sum((predictions-self.y) * self.X, axis=0)[:, np.newaxis]
            self.W = self.W - self.lr * grad_w
            assert self.W.shape == grad_w.shape
            loss = (1 / 2 * self.m) * np.sum(np.square(predictions - self.y))

            if verbose:
                print(f'Epoch : {num_epoch+1}/{self.epochs} \t Loss : {loss.item()}')


linear_regression = LinearRegression()
x_train, y_train = load_boston(return_X_y=True)
linear_regression.fit(x_train, y_train, 10)

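I am using the Boston housing dataset from sklearn.

P.S. I would like to know what is causing this problem, how to fix it, and whether my implementation is correct.

Thanks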

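【Question discussion】:

This is called divergence. It means the model is not learning because the weight values are exploding.
Why aren't you learning the bias?
This is a rough implementation, so I decided to set the bias aside for now and look only at the weights.
@filtertips... I understand the problem with the implementation... could you suggest a fix that might solve it? [What about something like regularization or normalization of the data? Would those help?]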
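On the normalization question raised in the comments, one way to check is to run the same update loop on raw and on standardized features. The sketch below is only an illustration under stated assumptions: it uses synthetic features with values in the hundreds as a stand-in for unscaled columns (not the Boston data), a hypothetical helper `gradient_descent_losses`, and the question's defaults of `lr=0.001` and 10 epochs. It also learns the bias, and writes the loss scale as `1 / (2 * m)` with explicit parentheses, since `1 / 2 * m` evaluates to `m / 2` in Python.

```python
import numpy as np

def gradient_descent_losses(X, y, lr=0.001, epochs=10):
    """Run plain gradient descent on the MSE loss; return the loss at each epoch."""
    m, n = X.shape
    W = np.zeros((n, 1))
    bias = 0.0
    losses = []
    for _ in range(epochs):
        residual = X @ W + bias - y                       # shape (m, 1)
        losses.append(float(np.sum(residual ** 2) / (2 * m)))
        W = W - lr * (X.T @ residual) / m                 # shape (n, 1)
        bias = bias - lr * float(np.mean(residual))
    return losses

# Synthetic stand-in data (an assumption): three features with values in [0, 400]
rng = np.random.default_rng(0)
m = 200
X_raw = rng.uniform(0.0, 400.0, size=(m, 3))
y = X_raw @ np.array([[0.05], [-0.02], [0.01]]) + 20.0 + rng.normal(size=(m, 1))

# Standardize each column to zero mean and unit variance
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

raw_losses = gradient_descent_losses(X_raw, y)
std_losses = gradient_descent_losses(X_std, y)
print("raw features:         ", raw_losses[0], "->", raw_losses[-1])
print("standardized features:", std_losses[0], "->", std_losses[-1])
```

With the same learning rate, the loss on the raw features blows up while the loss on the standardized features goes down at every epoch, which suggests that feature scale relative to the step size is worth checking before anything else.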

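Tags: python numpy machine-learning scikit-learn linear-regression

【Solution 1】:

The problem looks to be in the gradient. With an iterative shrinkage-thresholding algorithm (ISTA) solver you should not see this kind of divergence. In your gradient computation, X has shape (m, n) and W has shape (n, 1), so (predictions - y) has shape (m, 1), and then you multiply it elementwise by X? (m, 1) times (m, n)? I'm not sure what NumPy computes in that case, and it is not obviously what you want to compute: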

grad_w = (1/self.m) * np.sum((predictions-self.y) * self.X, axis=0)[:, np.newaxis]

The code here should be a little different: multiply (n, m) by (m, 1) to get the same shape as W, (n, 1).

grad_w = (1/self.m) * np.dot(self.X.T, predictions - self.y)

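That way the derivation comes out right.

I'm also not sure why you use dot (which is a good idea) for the predictions but not for the gradient.

You also don't need that many reshapes: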
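A quick way to settle what NumPy computes for the two ways of writing the gradient is to evaluate both on small random arrays. The sketch below uses made-up sizes (`m = 5`, `n = 3`) rather than the Boston data, and the names `grad_broadcast` and `grad_dot` are hypothetical labels for the elementwise form from the question and the matrix-product form.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3  # small made-up sizes, for illustration only
X = rng.normal(size=(m, n))
W = rng.normal(size=(n, 1))
y = rng.normal(size=(m, 1))

residual = X @ W - y  # shape (m, 1)

# Elementwise form from the question: (m, 1) * (m, n) broadcasts to (m, n),
# then the sum over axis 0 collapses the rows to shape (n,)
grad_broadcast = (1 / m) * np.sum(residual * X, axis=0)[:, np.newaxis]

# Matrix-product form: (n, m) @ (m, 1) -> (n, 1)
grad_dot = (1 / m) * (X.T @ residual)

print(grad_broadcast.shape, grad_dot.shape)
print(np.allclose(grad_broadcast, grad_dot))
```

On these shapes the residual is broadcast across the columns of `X`, so both expressions give the same `(n, 1)` array. The matrix product is still the easier form to read and to check against the derivation, but this suggests step size and feature scale are also worth a look when the loss diverges.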

import numpy as np
from sklearn.datasets import load_boston

A,b = load_boston(return_X_y=True)
n_samples = A.shape[0]
n_features = A.shape[1]

def grad_linreg(x):
    """Least-squares gradient"""
    grad = (1. / n_samples) * np.dot(A.T, np.dot(A, x) - b)
    return grad

def loss_linreg(x):
    """Least-squares loss"""
    f = (1. / (2. * n_samples)) * np.sum((b - np.dot(A, x)) ** 2)
    return f

Then check that your gradient is correct:

from scipy.optimize import check_grad
from numpy.random import randn

# check_grad returns the 2-norm of the difference between grad_linreg and a
# finite-difference estimate of the gradient of loss_linreg at the given point.
# Try a few random points; each value should be small relative to the gradient norm.
for _ in range(4):
    print(check_grad(loss_linreg, grad_linreg, randn(n_features)))

Then you can build your model on top of this. If you want to test with ISTA/FISTA, logistic/linear regression, and LASSO/ridge, here is a Jupyter notebook with the theory and a working example

【Discussion】:
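As a minimal sketch of building on the gradient and loss functions from this answer, the loop below runs plain gradient descent with step size `1 / L`, where `L` is the largest eigenvalue of `A.T @ A / n_samples` (the Lipschitz constant of this gradient). The synthetic `A` and `b`, the column scales, the iteration count, and the names `L`, `step`, and `losses` are assumptions for illustration. The Boston data is not used here because `load_boston` is not available in recent scikit-learn releases.

```python
import numpy as np

# Synthetic stand-in data (an assumption): columns on very different scales
rng = np.random.default_rng(0)
n_samples, n_features = 100, 4
A = rng.normal(size=(n_samples, n_features)) * np.array([1.0, 10.0, 100.0, 400.0])
x_true = rng.normal(size=n_features)
b = A @ x_true + 0.1 * rng.normal(size=n_samples)

def grad_linreg(x):
    """Least-squares gradient"""
    return A.T @ (A @ x - b) / n_samples

def loss_linreg(x):
    """Least-squares loss"""
    return np.sum((b - A @ x) ** 2) / (2.0 * n_samples)

# Largest singular value of A, squared and divided by n_samples,
# is the largest eigenvalue of A.T @ A / n_samples
L = np.linalg.norm(A, ord=2) ** 2 / n_samples
step = 1.0 / L

x = np.zeros(n_features)
losses = [loss_linreg(x)]
for _ in range(50):
    x = x - step * grad_linreg(x)
    losses.append(loss_linreg(x))

print("first loss:", losses[0])
print("last loss: ", losses[-1])
```

Tying the step size to `L` instead of a fixed value such as 0.001 keeps each update from overshooting regardless of how the columns are scaled, although convergence can still be slow when the scales differ this much.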