Linear Regression with Theano - Dimension Mismatch

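【Question title】: Linear Regression with Theano - Dimension Mis-match
【Posted】: 2016-03-24 13:12:42
【Question description】:

I am getting familiar with Theano and machine learning. To that end, I want to compute a linear regression. My code is inspired by the logistic regression example from the Theano introduction.

I wrote the following code: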

    import numpy
    import theano
    import theano.tensor as T

    class LinearRegression(object):
        """ Calculate Linear Regression """

        def __init__(self, input):
            """ Initialize the parameters of the linear regression

            Parameters:
            -----------

            :type input: theano.tensor.TensorType
            :param input: symbolic variable that describes the input of the
                          architecture (one minibatch)
            """
            self.W = theano.shared(
                value=numpy.zeros(1, dtype=theano.config.floatX),
                name='W', borrow=True
            )

            self.b = theano.shared(
                value=numpy.zeros(1, dtype=theano.config.floatX),
                name='b', borrow=True
            )

            self.y_pred = T.dot(input, self.W) + self.b

        def errors(self, y):
            """ The squared distance

            Parameters:
            ----------

            :type y: theano.tensor.TensorType
            :param y: the sample data (target values)

            """
            errors = y- self.y_pred
            return T.sum(T.pow(errors, 2))


    def sgd_optimization(learning_rate=0.0013, n_epochs=100):
        """
        Demonstrate stochastic gradient descent optimization of a linear model

        Parameters:
        -----
        :type learning_rate: float
        :param learning_rate: learning rate used (factor for the stochastic
                              gradient)

        :type n_epochs: int
        :param n_epochs: maximal number of epochs to run the optimizer
        """
        x_train = numpy.random.uniform(low=-2, high = 2, size=(50,1))
        epsilon =  numpy.random.normal(scale=0.01, size=50)
        y_train = numpy.squeeze(2*x_train) + epsilon

        costs = []
        eta0, x, y = T.scalar('eta0'), T.matrix(name='x'), T.vector(name='y')

        classifier = LinearRegression(input = x)
        cost = classifier.errors(y)
        g_W = T.grad(cost=cost, wrt=classifier.W)
        g_b = T.grad(cost=cost, wrt=classifier.b)
        update = [(classifier.W, classifier.W - eta0 * g_W),
                   (classifier.b, classifier.b - eta0 * g_b)]

        train = theano.function(inputs = [eta0],
                                outputs = cost,
                                updates = update,
                                givens = {x: x_train, y: y_train})

        for _ in range(n_epochs):
            costs.append(train(learning_rate))

        return costs, classifier

    SSE, regressor = sgd_optimization()

Unfortunately, when I run the code, Python returns the following error message:

ValueError: Input dimension mis-match. (input[0].shape[0] = 1, input[1].shape[0] = 50)
Apply node that caused the error: Elemwise{Composite{((-i0) + i1)}}[(0, 1)](b, CGemv{no_inplace}.0)
Inputs types: [TensorType(float64, vector), TensorType(float64, vector)]
Inputs shapes: [(1,), (50,)]
Inputs strides: [(8,), (8,)]
Inputs values: [array([ 0.]), 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

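I suspect the error is related to the sample data having dimension (50,1) while the regressor only has dimension (1,1). Even so, I have not been able to fix the error in my code for some time now. Can someone give me a hint on how to correct it? Thanks for your help!

【Question comments】:

Tags: python machine-learning linear-regression theano

【Solution 1】:

You need to broadcast b: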

    self.y_pred = T.dot(input, self.W) + self.b[:, None]

I would have expected Theano to do this automatically, but it does not appear to.
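One likely explanation, sketched here as a simplified model rather than Theano's actual implementation: Theano decides when the graph is built which axes may broadcast, using each variable's broadcastable pattern, and a shared vector is not flagged broadcastable by default even if it holds a single element. The helper `elemwise_shape` below is hypothetical (it is not part of Theano); it mimics that rule in plain Python to show why shapes (1,) and (50,) clash while `b[:, None]` goes through.

```python
def elemwise_shape(shapes, patterns):
    """Simplified model of the shape check for an elementwise op.

    shapes:   runtime shapes of the operands, e.g. [(1,), (50,)]
    patterns: matching broadcastable patterns, e.g. [(False,), (False,)]
    """
    ndim = max(len(s) for s in shapes)
    # Operands with fewer dimensions are left-padded with broadcastable axes.
    shapes = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    patterns = [(True,) * (ndim - len(p)) + tuple(p) for p in patterns]

    result = []
    for axis in range(ndim):
        # Only axes that are NOT flagged broadcastable constrain the output size.
        fixed = {s[axis] for s, p in zip(shapes, patterns) if not p[axis]}
        if len(fixed) > 1:
            raise ValueError("Input dimension mis-match on axis %d: %s"
                             % (axis, sorted(fixed)))
        result.append(fixed.pop() if fixed else 1)
    return tuple(result)


# b has shape (1,), dot(input, W) has shape (50,), neither is broadcastable.
try:
    elemwise_shape([(1,), (50,)], [(False,), (False,)])
except ValueError as exc:
    print(exc)

# b[:, None] has shape (1, 1) with broadcastable pattern (False, True).
print(elemwise_shape([(1, 1), (50,)], [(False, True), (False,)]))  # → (1, 50)
```

Under this model the corrected `y_pred` has shape (1, 50) rather than (50,), which is harmless here because `errors` only sums the squared differences.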

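To locate the problem, follow the suggestion in the error message and run Theano with high exception verbosity:

    $ THEANO_FLAGS='exception_verbosity=high' python path/to/script.py

This produces quite a lot of output, including the offending node and its operands: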

Debugprint of the apply node:
Elemwise{Composite{((-i0) + i1)}}[(0, 1)] [@A] <TensorType(float64, vector)> ''
 |b [@B] <TensorType(float64, vector)>
 |CGemv{no_inplace} [@C] <TensorType(float64, vector)> ''
   |<TensorType(float64, vector)> [@D] <TensorType(float64, vector)>
   |TensorConstant{-1.0} [@E] <TensorType(float64, scalar)>
   |<TensorType(float64, matrix)> [@F] <TensorType(float64, matrix)>
   |W [@G] <TensorType(float64, vector)>
   |TensorConstant{1.0} [@H] <TensorType(float64, scalar)>

This node corresponds to subtracting b from the temporary node CGemv{no_inplace}. The only line of code that involves b is

    self.y_pred = T.dot(input, self.W) + self.b
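If changing the command line is inconvenient, the same flag can be set from inside the script, provided this happens before Theano is imported, since `THEANO_FLAGS` is read at import time. The sketch below assumes nothing beyond the standard library; `add_theano_flag` is a hypothetical helper name, not a Theano function.

```python
import os


def add_theano_flag(flag, environ=os.environ):
    """Append one 'name=value' flag to THEANO_FLAGS, keeping existing flags."""
    existing = environ.get("THEANO_FLAGS", "")
    # THEANO_FLAGS is a comma-separated list of name=value pairs.
    flags = [f for f in existing.split(",") if f]
    if flag not in flags:
        flags.append(flag)
    environ["THEANO_FLAGS"] = ",".join(flags)
    return environ["THEANO_FLAGS"]


add_theano_flag("exception_verbosity=high")
# import theano  # must come after the flag has been set
```

The `optimizer=fast_compile` flag mentioned in the first hint of the error message can be added the same way.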

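【Discussion】:

Thank you very much for your reply. Your suggestion solved the problem. Could you elaborate on your answer? How did you work out from the error message that I was not broadcasting the bias correctly?
I have updated the answer with details. Note that the original error message already contains the same information, including the mismatched shapes: (1,) and (50,).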
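As a small illustration of that last point, the shapes can be read straight off the `Inputs shapes:` line of the message. The snippet below is only a sketch using the standard library; `input_shapes` is a hypothetical helper, and the message text is abbreviated from the traceback in the question.

```python
import ast
import re


def input_shapes(message):
    """Return the operand shapes listed on the 'Inputs shapes:' line."""
    match = re.search(r"^Inputs shapes: (.*)$", message, flags=re.MULTILINE)
    if match is None:
        return []
    # The line holds a Python literal such as "[(1,), (50,)]".
    return ast.literal_eval(match.group(1))


message = """ValueError: Input dimension mis-match. (input[0].shape[0] = 1, input[1].shape[0] = 50)
Inputs types: [TensorType(float64, vector), TensorType(float64, vector)]
Inputs shapes: [(1,), (50,)]
Inputs strides: [(8,), (8,)]"""

print(input_shapes(message))  # → [(1,), (50,)]
```

The first shape belongs to `b` and the second to the result of the dot product, matching the operand order in the apply node.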