【Question Title】: Gradient descent with multiple variables without matrices
【Posted】: 2016-01-16 09:03:50
【Question Description】:

I'm new to Matlab and machine learning, and I'm trying to write a gradient descent function without using matrix operations.

  • m is the number of examples in my training set
  • n is the number of features per example

The function gradientDescentMulti takes 5 arguments:

  • X: an m×n matrix
  • y: an m-dimensional vector
  • theta: an n-dimensional vector
  • alpha: a real number (the learning rate)
  • num_iters: an integer (the number of iterations)
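To make these shapes concrete, here is a small NumPy sketch of the inputs; the data values are made up purely for illustration:

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, n = 3 features
# (the first column of X is the usual all-ones bias column).
m, n = 4, 3
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 3.0],
              [1.0, 1416.0, 2.0]])        # m x n design matrix
y = np.array([399.9, 329.9, 369.0, 232.0])  # m-vector of targets
theta = np.zeros(n)                       # n-vector of parameters
alpha = 0.01                              # learning rate (a real number)
num_iters = 400                           # number of iterations (an integer)

print(X.shape, y.shape, theta.shape)      # (4, 3) (4,) (3,)
```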

I already have a solution that uses matrix multiplication:

function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y); % number of training examples
  for iter = 1:num_iters
    gradJ = 1/m * (X'*X*theta - X'*y);
    theta = theta - alpha * gradJ;
  end
end

The result after the iterations:

theta =
   1.0e+05 *

    3.3430
    1.0009
    0.0367
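For comparison, the vectorized update gradJ = (1/m)*(X'*X*theta - X'*y) translates directly to NumPy. This is a sketch, not the original code, run on a tiny made-up dataset whose exact solution is theta = [1, 2]:

```python
import numpy as np

def gradient_descent_multi(X, y, theta, alpha, num_iters):
    """Vectorized batch gradient descent, mirroring the MATLAB version."""
    m = len(y)
    for _ in range(num_iters):
        grad_j = (X.T @ X @ theta - X.T @ y) / m  # (1/m) * (X'*X*theta - X'*y)
        theta = theta - alpha * grad_j
    return theta

# Tiny example: y = 1 + 2*x, so the exact solution is theta = [1, 2].
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])
theta = gradient_descent_multi(X, y, np.zeros(2), alpha=0.1, num_iters=5000)
print(np.round(theta, 4))  # → [1. 2.]
```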

But now I'm trying to do the same thing without matrix multiplication. Here is the function:

function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y); % number of training examples
  n = size(X, 2); % number of features

  for iter = 1:num_iters
    new_theta = zeros(1, n);
    %// for each feature, find the new theta
    for t = 1:n
      S = 0;
      for example = 1:m
        h = 0;
        for example_feature = 1:n
          h = h + (theta(example_feature) * X(example, example_feature));
        end
        S = S + ((h - y(example)) * X(example, n)); %// Sum each feature for this example
      end
      new_theta(t) = theta(t) - alpha * (1/m) * S; %// Calculate new theta for this feature
    end 
    %// only at the end of the function, update all theta simultaneously
    theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)
  end
end

As a result, all the thetas come out the same :/

theta =
   1.0e+04 *

    3.5374
    3.5374
    3.5374
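The symptom is reproducible in a few lines: because the inner sum indexes X(example, n), the last feature column, for every value of t, the accumulated S does not depend on t, so from an all-zero initialization every component of theta receives the same update. A minimal NumPy illustration of one such iteration, on made-up data:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([1.0, 2.0, 3.0])
m, n = X.shape
theta = np.zeros(n)
alpha = 0.01

# One iteration of the buggy update: X[:, n-1] is used for EVERY feature t,
# so S is the same number regardless of t.
h = X @ theta                       # hypotheses for all examples (all zero here)
S = np.sum((h - y) * X[:, n - 1])   # note: always the LAST column, never column t
new_theta = theta - alpha / m * S   # the same correction lands in every entry

print(new_theta)  # every component is identical
```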

【Question Comments】:

    Tags: matlab matrix machine-learning gradient-descent


    【Solution 1】:

    If you look at the gradient update rule, it can be more efficient to first compute the hypothesis for all training examples, then subtract the ground-truth value of each example and store the differences in an array or vector. Once you have done that, computing the update rule is straightforward. It doesn't look like your code does this.

    So I rewrote the code, keeping a separate array that stores, for each training example, the difference between the hypothesis and the ground truth. Once that is done, I compute the update rule for each feature separately:

    for iter = 1 : num_iters
    
        %// Compute hypothesis differences with ground truth first
        h = zeros(1, m);
        for t = 1 : m
            %// Compute hypothesis
            for tt = 1 : n
                h(t) = h(t) + theta(tt)*X(t,tt);
            end
            %// Compute difference between hypothesis and ground truth
            h(t) = h(t) - y(t);
        end
    
        %// Now update parameters
        new_theta = zeros(1, n);    
        %// for each feature, find the new theta
        for tt = 1 : n
            S = 0;
            %// For each sample, compute products of hypothesis difference
            %// and the right feature of the sample and accumulate
            for t = 1 : m
                S = S + h(t)*X(t,tt);
            end
    
            %// Compute gradient descent step
            new_theta(tt) = theta(tt) - (alpha/m)*S;
        end
    
        theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)    
    
    end
    

    When I do this, I get the same answer as with the matrix formulation.
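    The equivalence can be checked numerically: in the corrected loop version, S = Σ_t h(t)·X(t,tt) is exactly the tt-th component of X'(Xθ − y), so both versions compute the same update. A NumPy sketch comparing the two on made-up data:

```python
import numpy as np

def gd_loops(X, y, theta, alpha, num_iters):
    """Loop-based version following the answer: hypothesis differences first,
    then a per-feature update that indexes the correct column tt."""
    m, n = X.shape
    theta = theta.copy()
    for _ in range(num_iters):
        h = np.zeros(m)
        for t in range(m):                # hypothesis minus ground truth, per example
            for tt in range(n):
                h[t] += theta[tt] * X[t, tt]
            h[t] -= y[t]
        new_theta = np.zeros(n)
        for tt in range(n):               # per-feature update
            S = 0.0
            for t in range(m):
                S += h[t] * X[t, tt]
            new_theta[tt] = theta[tt] - (alpha / m) * S
        theta = new_theta
    return theta

def gd_matrix(X, y, theta, alpha, num_iters):
    """Vectorized version from the question."""
    m = len(y)
    for _ in range(num_iters):
        theta = theta - alpha / m * (X.T @ X @ theta - X.T @ y)
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])
t1 = gd_loops(X, y, np.zeros(2), 0.1, 200)
t2 = gd_matrix(X, y, np.zeros(2), 0.1, 200)
print(np.allclose(t1, t2))  # True
```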

    【Discussion】:
