【Question Title】: Multivariable gradient descent Matlab - what is the difference between the two codes?
【Posted】: 2019-01-21 17:23:51
【Question Description】:

The following function uses gradient descent to find the best "thetas" for a regression line. The inputs (X, y) are attached below. My question is: what is the difference between code 1 and code 2? Why does code 2 work while code 1 does not?

Thanks in advance!

GRADIENTDESCENTMULTI performs gradient descent to learn theta; it updates theta by taking num_iters gradient steps with learning rate alpha.

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta);
J_history = zeros(num_iters, 1);
costs = zeros(n,1);

for iter = 1:num_iters
    % code 1 - doesn't work 
    for c = 1:n
        for i = 1:m
            costs(c) = costs(c)+(X(i,:)*theta - y(i))*X(i,c);
        end  
    end

    % code 2 - does work
    E = X * theta - y;
    for c = 1:n
        costs(c) = sum(E.*X(:,c));
    end

    % update each theta
    for c = 1:n
        theta(c) = theta(c) - alpha*costs(c)/m;
    end
    J_history(iter) = computeCostMulti(X, y, theta);    
end
end

function J = computeCostMulti(X, y, theta)

for i=1:m
    J = J+(X(i,:)*theta - y(i))^2;
end
J = J/(2*m);

Code used to run it:

alpha = 0.01;
num_iters = 200; 

% Init Theta and Run Gradient Descent 
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');

% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');

X is

1.0000    0.1300   -0.2237
1.0000   -0.5042   -0.2237
1.0000    0.5025   -0.2237
1.0000   -0.7357   -1.5378
1.0000    1.2575    1.0904
1.0000   -0.0197    1.0904
1.0000   -0.5872   -0.2237
1.0000   -0.7219   -0.2237
1.0000   -0.7810   -0.2237
1.0000   -0.6376   -0.2237
1.0000   -0.0764    1.0904
1.0000   -0.0009   -0.2237
1.0000   -0.1393   -0.2237
1.0000    3.1173    2.4045
1.0000   -0.9220   -0.2237
1.0000    0.3766    1.0904
1.0000   -0.8565   -1.5378
1.0000   -0.9622   -0.2237
1.0000    0.7655    1.0904
1.0000    1.2965    1.0904
1.0000   -0.2940   -0.2237
1.0000   -0.1418   -1.5378
1.0000   -0.4992   -0.2237
1.0000   -0.0487    1.0904
1.0000    2.3774   -0.2237
1.0000   -1.1334   -0.2237
1.0000   -0.6829   -0.2237
1.0000    0.6610   -0.2237
1.0000    0.2508   -0.2237
1.0000    0.8007   -0.2237
1.0000   -0.2034   -1.5378
1.0000   -1.2592   -2.8519
1.0000    0.0495    1.0904
1.0000    1.4299   -0.2237
1.0000   -0.2387    1.0904
1.0000   -0.7093   -0.2237
1.0000   -0.9584   -0.2237
1.0000    0.1652    1.0904
1.0000    2.7864    1.0904
1.0000    0.2030    1.0904
1.0000   -0.4237   -1.5378
1.0000    0.2986   -0.2237
1.0000    0.7126    1.0904
1.0000   -1.0075   -0.2237
1.0000   -1.4454   -1.5378
1.0000   -0.1871    1.0904
1.0000   -1.0037   -0.2237

y is

  399900
  329900
  369000
  232000
  539900
  299900
  314900
  198999
  212000
  242500
  239999
  347000
  329999
  699900
  259900
  449900
  299900
  199900
  499998
  599000
  252900
  255000
  242900
  259900
  573900
  249900
  464500
  469000
  475000
  299900
  349900
  169900
  314900
  579900
  285900
  249900
  229900
  345000
  549000
  287000
  368500
  329900
  314000
  299000
  179900
  299900
  239500

【Question Comments】:

  • Could you provide a minimal working example? I've looked at your code, but I'm not entirely sure what is going on, so it's hard to help when I'm just guessing at what gets fed into the function.
  • @MarcinKonowalczyk I just edited the question (added the running code). The plot shows the value of the cost function against the iterations.

Tags: matlab machine-learning octave gradient-descent


【Solution 1】:

I think it works fine for me. The main thing is that in code 1 you keep adding to costs(c) but never reset it to zero before the next iteration. The only thing you really need to change is to add something like costs(c) = 0; after for c = 1:n and before for i = 1:m. I did have to change your code slightly to get it to work for me (mainly computeCostMulti), and I changed the plot to show that the two methods give the same result. All in all, here is a working demo code with those changes:

close all; clear; clc;

%% Data
X = [1.0000  0.1300 -0.2237; 1.0000 -0.5042 -0.2237; 1.0000  0.5025 -0.2237; 1.0000 -0.7357 -1.5378;
    1.0000  1.2575  1.0904; 1.0000 -0.0197  1.0904; 1.0000 -0.5872 -0.2237; 1.0000 -0.7219 -0.2237;
    1.0000 -0.7810 -0.2237; 1.0000 -0.6376 -0.2237; 1.0000 -0.0764  1.0904; 1.0000 -0.0009 -0.2237;
    1.0000 -0.1393 -0.2237; 1.0000  3.1173  2.4045; 1.0000 -0.9220 -0.2237; 1.0000  0.3766  1.0904;
    1.0000 -0.8565 -1.5378; 1.0000 -0.9622 -0.2237; 1.0000  0.7655  1.0904; 1.0000  1.2965  1.0904;
    1.0000 -0.2940 -0.2237; 1.0000 -0.1418 -1.5378; 1.0000 -0.4992 -0.2237; 1.0000 -0.0487  1.0904;
    1.0000  2.3774 -0.2237; 1.0000 -1.1334 -0.2237; 1.0000 -0.6829 -0.2237; 1.0000  0.6610 -0.2237;
    1.0000  0.2508 -0.2237; 1.0000  0.8007 -0.2237; 1.0000 -0.2034 -1.5378; 1.0000 -1.2592 -2.8519;
    1.0000  0.0495  1.0904; 1.0000  1.4299 -0.2237; 1.0000 -0.2387  1.0904; 1.0000 -0.7093 -0.2237;
    1.0000 -0.9584 -0.2237; 1.0000  0.1652  1.0904; 1.0000  2.7864  1.0904; 1.0000  0.2030  1.0904;
    1.0000 -0.4237 -1.5378; 1.0000  0.2986 -0.2237; 1.0000  0.7126  1.0904; 1.0000 -1.0075 -0.2237;
    1.0000 -1.4454 -1.5378; 1.0000 -0.1871  1.0904; 1.0000 -1.0037 -0.2237];
y = [399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999,...
    699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900,...
    464500 469000 475000 299900 349900 169900 314900 579900 285900 249900 229900 345000 549000,...
    287000 368500 329900 314000 299000 179900 299900 239500]';

alpha = 0.01;
num_iters = 200;

% Init Theta and Run Gradient Descent
theta0 = zeros(3, 1);
[theta_result_1, J_history_1] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 1);
[theta_result_2, J_history_2] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 2);

% Plot the convergence graph for both methods
figure;
x = 1:numel(J_history_1);
subplot(5,1,1:4);
plot(x,J_history_1,x,J_history_2);
xlim([min(x) max(x)]);
set(gca,'XTickLabel','');
ylabel('Cost J');
grid on;

subplot(5,1,5);
stem(x,(J_history_1-J_history_2)./J_history_1,'ko');
xlim([min(x) max(x)]);
xlabel('Number of iterations');
ylabel('frac. \DeltaJ');
grid on;

% Display gradient descent's result
fprintf('Theta computed from gradient descent with method 1: \n');
fprintf(' %f \n', theta_result_1);
fprintf('Theta computed from gradient descent with method 2: \n');
fprintf(' %f \n', theta_result_2);
fprintf('\n');

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters, METHOD)
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta);
J_history = zeros(num_iters, 1);

costs = zeros(n,1);
for iter = 1:num_iters

    if METHOD == 1 % code 1 - does work
        for c = 1:n
            costs(c) = 0;
            for i = 1:m
                costs(c) = costs(c) + (X(i,:)*theta - y(i)) *X(i,c);
            end
        end
    elseif METHOD == 2 % code 2 - does work
        E = X * theta - y;
        for c = 1:n
            costs(c) = sum(E.*X(:,c));
        end
    else
        error('unknown method');
    end

    % update each theta
    for c = 1:n
        theta(c) = theta(c) - alpha*costs(c)/m;
    end
    J_history(iter) = computeCostMulti(X, y, theta);
end
end

function J = computeCostMulti(X, y, theta)
m = length(y); J = 0;
for mi = 1:m
    J = J + (X(mi,:)*theta - y(mi))^2;
end
J = J/(2*m);
end

But really, all you need to add is the costs(c) = 0; line.
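
In other words, the minimal fix to the original code 1 is just to reset the accumulator at the start of each column pass:

for c = 1:n
    costs(c) = 0;                     % reset the accumulator for this column
    for i = 1:m
        costs(c) = costs(c) + (X(i,:)*theta - y(i))*X(i,c);
    end
end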

Also: I'd recommend always adding a close all; clear; clc; line at the beginning of your scripts, to make sure they work when copied and pasted into Stack Overflow.
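
As a side note, both methods compute the same gradient, X'*(X*theta - y)/m, so code 2 can be vectorized one step further and the loop over c dropped entirely. A minimal sketch of one iteration:

E = X * theta - y;                    % residuals, an m-by-1 vector
costs = X' * E;                       % all n gradient sums at once (replaces the loop over c)
theta = theta - alpha * costs / m;    % same theta update as above, vectorized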

【Comments】:

  • Thank you! I knew it would be a small mistake, but I couldn't find it myself. Now I understand where the error was.