【Question title】: Are weights/biases only updated once per mini-batch?
【Posted】: 2019-12-04 05:43:43
【Question】:

I'm working through a neural network tutorial and have a question about the function that updates the weights.

def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases by applying
    gradient descent using backpropagation to a single mini batch.
    The "mini_batch" is a list of tuples "(x, y)", and "eta"
    is the learning rate."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]                #Initialize bias matrix with 0's
    nabla_w = [np.zeros(w.shape) for w in self.weights]               #Initialize weights matrix with 0's
    for x, y in mini_batch:                                           #For tuples in one mini_batch
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)            #Calculate partial derivatives of bias/weights with backpropagation, set them to delta_nabla_b
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)] #Generate a list with partial derivatives of bias of every neuron
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)] #Generate a list with partial derivatives of weights for every neuron
    self.weights = [w-(eta/len(mini_batch))*nw                        #Update weights according to update rule
                    for w, nw in zip(self.weights, nabla_w)]          #What author does is he zips 2 lists with values he needs (Current weights and partial derivatives), then do computations with them.
    self.biases = [b-(eta/len(mini_batch))*nb                         #Update biases according to update rule
                   for b, nb in zip(self.biases, nabla_b)]

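What I don't understand here is the for loop used to compute nabla_b and nabla_w (the partial derivatives with respect to the biases/weights): backpropagation is run for every training example in the mini-batch, yet the weights/biases are updated only once.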

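As I see it, suppose we have a mini-batch of size 10: we compute nabla_b and nabla_w 10 times, and then update the weights and biases once the for loop has finished. But doesn't the for loop reset the nabla_b and nabla_w lists each time? Why don't we update self.weights and self.biases inside the for loop?

The neural network works fine, so I assume I'm misunderstanding something small somewhere.

FYI: the relevant part of the tutorial I'm following can be found here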

【Question discussion】:

    Tags: python neural-network gradient-descent stochastic-gradient


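    【Answer 1】:

    The key to understanding how this loop accumulates the bias and weight gradients over every training example is the evaluation order in Python. Specifically, everything on the right-hand side of the = sign is evaluated before the result is assigned to the variable on the left-hand side.

    Here is a simpler example that may be easier to follow: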

    nabla_b = [0, 0, 0, 0, 0]
    for x in range(10):
        delta_nabla_b = [-1, 2, -3, 4, -5]
        nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
    

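    In this example we have just five scalar biases and a constant gradient. What is nabla_b at the end of this loop? Expand the comprehension using the definition of zip, and remember that everything to the right of the = sign is evaluated before it is written to the variable name on the left: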

    nabla_b = [0, 0, 0, 0, 0]
    for x in range(10):
        # nabla_b is defined outside of this loop
        delta_nabla_b = [-1, 2, -3, 4, -5]
    
        # expand the comprehension and the zip() function
        temp = []
        for i in range(len(nabla_b)):
            temp.append(nabla_b[i] + delta_nabla_b[i])
    
        # now that the RHS is calculated, set it to the LHS
        nabla_b = temp
    

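    At this point it should be clear that the comprehension adds each element of nabla_b to the corresponding element of delta_nabla_b, and the result overwrites nabla_b for the next iteration of the loop.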
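Running both versions of the simple example answers the question posed above: after 10 iterations each element of nabla_b is ten times the constant gradient, and the comprehension and the explicit expansion agree. This only re-runs the snippets from this answer; the name `expanded` is introduced here for the comparison.

```python
# Comprehension version from the answer.
nabla_b = [0, 0, 0, 0, 0]
for x in range(10):
    delta_nabla_b = [-1, 2, -3, 4, -5]
    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]

# Expanded version: build the new list from the old one, then rebind the name.
expanded = [0, 0, 0, 0, 0]
for x in range(10):
    delta_nabla_b = [-1, 2, -3, 4, -5]
    temp = []
    for i in range(len(expanded)):
        temp.append(expanded[i] + delta_nabla_b[i])
    expanded = temp

print(nabla_b)              # [-10, 20, -30, 40, -50]
print(nabla_b == expanded)  # True
```

Nothing is lost when nabla_b is rebound: the new list is computed from the old one before the assignment happens.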

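    So in the tutorial's example, nabla_b and nabla_w are sums of partial derivatives, with one gradient added for each training example in the mini-batch. Technically they are reset for every training example, but they are reset to their previous value plus the new gradient, which is exactly what you want. A clearer (but less concise) way to write it might be: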

    def update_mini_batch(self, mini_batch, eta):
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            # expanding the comprehensions
            for i in range(len(nabla_b)):
                nabla_b[i] += delta_nabla_b[i]      # set the value of each element directly
            for i in range(len(nabla_w)):
                nabla_w[i] += delta_nabla_w[i]
        self.weights = [w-(eta/len(mini_batch))*nw  # note that this comprehension uses the same trick
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]
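Written as a formula, both versions of update_mini_batch take a single step per mini-batch. The notation is ours, not the tutorial's: $B$ is the mini-batch, $m$ = len(mini_batch), $\eta$ is eta, and $\nabla C_{x,y}$ is the per-example gradient returned by self.backprop(x, y).

```latex
w \;\leftarrow\; w - \frac{\eta}{m}\sum_{(x,y)\in B} \nabla_w C_{x,y},
\qquad
b \;\leftarrow\; b - \frac{\eta}{m}\sum_{(x,y)\in B} \nabla_b C_{x,y},
\qquad
m = |B|
```

The two sums are exactly what nabla_w and nabla_b hold when the loop ends, so multiplying by eta/len(mini_batch) steps along the average gradient over the mini-batch.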
    

    【Discussion】:

    • Perfect answer! It makes complete sense now. Tyvm =)
    【Answer 2】:

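    No, nothing is lost: the update happens once, after the whole batch has been processed. The standard description is that we average the gradients over the mini-batch and adjust the parameters by that average; equivalently, you could adjust by each example's gradient (scaled by eta/len(mini_batch)) in turn, as long as every gradient is computed with the weights from before the update. Moving the update inside the loop, so that each gradient is computed at already-updated weights, would be a different algorithm (per-example stochastic gradient descent).

    First, initialize the bias and weight gradient arrays with zeros.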
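The equivalence claim can be checked on a toy problem. This sketch assumes a hypothetical one-parameter loss C_x(w) = 0.5 * (w - x)**2, whose gradient is (w - x); the data and learning rate are made up for illustration. It compares (a) the tutorial's scheme, (b) applying each scaled gradient in turn with all gradients taken at the starting weight, and (c) updating inside the loop, as the question suggests.

```python
# Hypothetical per-example loss C_x(w) = 0.5 * (w - x)**2; its gradient is (w - x).
def grad(w, x):
    return w - x

batch = [1.0, 2.0, 3.0, 4.0]
eta, w0 = 0.5, 0.0
m = len(batch)

# (a) Tutorial's scheme: all gradients at w0, summed, applied once.
w_batch = w0 - (eta / m) * sum(grad(w0, x) for x in batch)

# (b) Apply each scaled gradient in turn, but every gradient is still taken at w0.
w_seq = w0
for x in batch:
    w_seq -= (eta / m) * grad(w0, x)

# (c) Update inside the loop: each gradient is taken at the *current* w,
#     i.e. per-example SGD with step size eta/m. This is a different algorithm.
w_sgd = w0
for x in batch:
    w_sgd -= (eta / m) * grad(w_sgd, x)

print(w_batch, w_seq, w_sgd)  # 1.25 1.25 1.103271484375
```

(a) and (b) agree exactly, while (c) lands elsewhere, because from the second example on its gradients are evaluated at weights that have already moved.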

    nabla_b = [np.zeros(b.shape) for b in self.biases]                #Initialize bias matrix with 0's
    nabla_w = [np.zeros(w.shape) for w in self.weights]               #Initialize weights matrix with 0's
    

    For each observation in the mini-batch, add its backpropagated gradients into those bias and weight gradient arrays:

    for x, y in mini_batch:                                           #For tuples in one mini_batch
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)            #Calculate partial derivatives of bias/weights with backpropagation, set them to delta_nabla_b
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)] #Generate a list with partial derivatives of bias of every neuron
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)] #Generate a list with partial derivatives of weights for every neuron
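The same accumulation works layer by layer when each gradient is a list of NumPy arrays with different shapes, as in the tutorial's network. A minimal sketch, assuming hypothetical layer sizes [2, 3, 1] and random stand-in gradients instead of real backprop output; it checks that the zip-and-add loop gives the same result as summing each layer's gradients over the examples.

```python
import numpy as np

# Hypothetical [input, hidden, output] sizes; weight shapes are (n_out, n_in) per layer.
sizes = [2, 3, 1]
weights = [np.zeros((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
# Stand-in per-example gradients (random, NOT real backprop output):
# one list of per-layer arrays for each of 10 examples.
per_example = [[rng.standard_normal(w.shape) for w in weights] for _ in range(10)]

# Accumulate exactly as in update_mini_batch.
nabla_w = [np.zeros(w.shape) for w in weights]
for delta_nabla_w in per_example:
    nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]

# Same as stacking each layer's gradients along a new batch axis and summing it away.
for layer, nw in enumerate(nabla_w):
    stacked = np.stack([g[layer] for g in per_example])
    assert np.allclose(nw, stacked.sum(axis=0))

print([nw.shape for nw in nabla_w])  # [(3, 2), (1, 3)]
```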
    

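    Finally, adjust each weight and bias once, using the accumulated sum of every training example's gradient scaled by eta/len(mini_batch).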

    self.weights = [w-(eta/len(mini_batch))*nw                        #Update weights according to update rule
                    for w, nw in zip(self.weights, nabla_w)]          #What author does is he zips 2 lists with values he needs (Current weights and partial derivatives), then do computations with them.
    self.biases = [b-(eta/len(mini_batch))*nb                         #Update biases according to update rule
                   for b, nb in zip(self.biases, nabla_b)]
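To see the behaviour concretely, here is a minimal, self-contained sketch that plugs the question's update_mini_batch logic into a hypothetical one-layer linear network. TinyLinearNet, its quadratic-cost backprop, and the two counters are illustrative assumptions, not the tutorial's Network class. A property setter counts how often self.weights is reassigned.

```python
import numpy as np

class TinyLinearNet:
    """Hypothetical one-layer linear net with quadratic cost, used only to
    exercise update_mini_batch; it is not the tutorial's Network class."""

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.biases = [rng.standard_normal((n_out, 1))]
        self._weights = [rng.standard_normal((n_out, n_in))]  # bypasses the counter
        self.backprop_calls = 0
        self.weight_updates = 0

    @property
    def weights(self):
        return self._weights

    @weights.setter
    def weights(self, value):
        self.weight_updates += 1  # count every reassignment of self.weights
        self._weights = value

    def backprop(self, x, y):
        # Per-example gradient of C = 0.5 * ||W x + b - y||^2.
        self.backprop_calls += 1
        err = self.weights[0] @ x + self.biases[0] - y
        return [err], [err @ x.T]

    def update_mini_batch(self, mini_batch, eta):
        # Same logic as the code in the question.
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w - (eta / len(mini_batch)) * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / len(mini_batch)) * nb
                       for b, nb in zip(self.biases, nabla_b)]

rng = np.random.default_rng(1)
mini_batch = [(rng.standard_normal((2, 1)), rng.standard_normal((1, 1)))
              for _ in range(10)]

net = TinyLinearNet(n_in=2, n_out=1)
net.update_mini_batch(mini_batch, eta=0.1)
print(net.backprop_calls, net.weight_updates)  # 10 1
```

Backprop runs once per example, but the weights are reassigned exactly once per mini-batch, which is the behaviour the question describes.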
    

    【Discussion】:
