Gradient descent consistently underfits: answer

【Question title】: Gradient descent underfits consistently
【Posted】: 2020-08-28 18:02:21
【Question】:

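I'm trying to fit a degree-6 polynomial to my training set, but it keeps failing: the curve does not fit the data. I used the following code: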

% X is feature scaled 
% Y is feature scaled too
function [J,grad,h]=costFunction(theta,X,Y,lambda)
  % theta is an (n+1)*1 vector
  % X is an m*(n+1) matrix
  % Y is an m*1 vector
  h = X*theta;
  % h is an m*1 vector
  theta_r=[0;theta(2:end,:)];
  J=sum([h-Y].^2)+(lambda/(2*length(Y)))*theta_r'*theta_r;
  grad=zeros(length(theta),1);
  for j=1:length(theta)
    grad(j)=(1/length(Y))*sum((h-Y).*X(:,j));
  endfor
endfunction

function [cost_history,theta]=gradientDescent(theta,X,Y,alpha,num_iter,lambda)
  cost_history=zeros(num_iter,1);
  for i=1:num_iter
    [cost,grad,hyp]=costFunction(theta,X,Y,lambda);
    theta=theta*(1-((alpha*lambda)/length(Y)))-(alpha*grad);
    cost_history(i)=cost;
  endfor
endfunction

initial_theta=ones(size(X)(1,2),1);

[c_history,theta]=gradientDescent(initial_theta,X,Y,1,10000,0);

plot(X*theta,cx);  % cx is non-scaled feature X
hold on;
plot(cx,cy);       % cx,cy contain datasets

Can you spot any problems?

【Comments】:

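Have you tried tuning the regularization parameter and the learning rate?
OK, let me try a few more times and I'll get back to you, sir.
@TathagataDey, you wrote plot(X*theta,cx);. I think it should be plot(cx,X*theta);.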
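Expanding on this comment: plot(x, y) puts its first argument on the horizontal axis, and a line plot joins the points in the order they are given, so an unsorted cx gives a zig-zag. Also, since the question says Y was scaled, X*theta comes out in scaled units and has to be mapped back before it is drawn next to the unscaled cy. A minimal sketch, assuming z-score scaling (the question does not say which scaling was used); the data, mu_y, sigma_y and theta_demo are hypothetical, and the least-squares solution X \ Y merely stands in for the gradient-descent theta:

```octave
% Hypothetical unsorted data standing in for the question's cx and cy.
cx = linspace(-2, 2, 9)';
cx = cx([5 1 9 3 7 2 8 4 6]);           % deliberately out of order
cy = cx.^3 - cx + 1;

% Degree-6 polynomial features, each column z-score scaled (an assumption).
P = cx .^ (1:6);                        % 9 x 6 via broadcasting
X = [ones(numel(cx), 1), (P - mean(P)) ./ std(P)];

% Y scaled the same way; keep mu_y and sigma_y to undo it later.
mu_y = mean(cy);
sigma_y = std(cy);
Y = (cy - mu_y) / sigma_y;

% Stand-in for the gradient-descent result: the least-squares solution.
theta_demo = X \ Y;

% Predictions are in scaled units; map them back to the units of cy.
pred = X * theta_demo * sigma_y + mu_y;

% Sort by cx so a line plot is drawn left to right instead of zig-zagging.
[cx_sorted, idx] = sort(cx);

% Horizontal axis first, vertical axis second (needs a graphics display):
% plot(cx_sorted, cy(idx), 'o', cx_sorted, pred(idx), '-');
```

Forgetting the back-transformation would draw the fitted curve near zero with unit spread while cy sits at its original level, which can look like a badly underfitting model even when theta is fine.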

Tags: machine-learning regression octave gradient-descent

【Solution 1】:

I found a few problems:

In costFunction, you forgot to divide the squared-error term of J by 2*length(Y) (the regularization term already has this factor).

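In gradientDescent, you regularize the intercept term, theta(1) in Octave's 1-based indexing (θ₀ in the usual notation), which should not be regularized.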
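For reference, here are the quantities the corrected code works with, in Octave's 1-based indexing: m = length(Y), x^(i) is the i-th row of X, g_j = grad(j) as returned by costFunction, and θ₁ = theta(1) is the intercept, which the penalty skips just as theta_r = [0; theta(2:end)] does.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\bigl(x^{(i)}\theta - y^{(i)}\bigr)^2
           + \frac{\lambda}{2m}\sum_{j=2}^{n+1}\theta_j^2 \\
g_j &= \frac{1}{m}\sum_{i=1}^{m}\bigl(x^{(i)}\theta - y^{(i)}\bigr)x_j^{(i)}, \qquad j = 1,\dots,n+1 \\
\frac{\partial J}{\partial \theta_1} &= g_1, \qquad
\frac{\partial J}{\partial \theta_j} = g_j + \frac{\lambda}{m}\theta_j \quad (j \ge 2) \\
\theta_j\Bigl(1 - \frac{\alpha\lambda}{m}\Bigr) - \alpha g_j &= \theta_j - \alpha\Bigl(g_j + \frac{\lambda}{m}\theta_j\Bigr)
\end{aligned}
```

The last line is why decaying theta(2:end) by the factor 1 - alpha*lambda/length(Y) is an ordinary gradient step on J. Two side notes on the question's code: grad is computed independently of J, so the missing 1/(2m) only inflates the recorded cost_history (its data term comes out 2m times too large) and does not change the updates; and with lambda = 0, as in the call shown, the decay factor is exactly 1, so the intercept fix only matters once lambda > 0.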

Here is the corrected code:

function [J,grad,h]=costFunction(theta,X,Y,lambda)
  % theta is an (n+1)*1 vector
  % X is an m*(n+1) matrix
  % Y is an m*1 vector
  h = X*theta;
  % h is an m*1 vector
  theta_r=[0;theta(2:end)];
  J=sum([h-Y].^2)/(2*length(Y))+(lambda/(2*length(Y)))*theta_r'*theta_r;
  grad=zeros(length(theta),1);
  for j=1:length(theta)
    grad(j)=(1/length(Y))*sum((h-Y).*X(:,j));
  endfor
endfunction

function [cost_history,theta]=gradientDescent(theta,X,Y,alpha,num_iter,lambda)
  cost_history=zeros(num_iter,1);
  for i=1:num_iter
    [cost,grad,hyp]=costFunction(theta,X,Y,lambda);
    theta(1)=theta(1)-(alpha*grad(1));
    theta(2:end)=theta(2:end)*(1-((alpha*lambda)/length(Y)))-(alpha*grad(2:end));
    cost_history(i)=cost;
  endfor
endfunction

initial_theta=ones(size(X, 2),1);

[c_history,theta]=gradientDescent(initial_theta,X,Y,1,10000,0);

plot(cx, X*theta);  % cx is non-scaled feature X
hold on;
plot(cx,cy);       % cx,cy contain datasets
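The per-parameter for-loop in costFunction can be replaced by a single matrix product X'*(h - Y). Below is a vectorized sketch of both corrected functions; the names costFunctionVec and gradientDescentVec are hypothetical, and the lambda term is added to the gradient explicitly, which is algebraically the same as the decay factor on theta(2:end) used above. The check at the end uses a three-point data set worked out by hand.

```octave
1;  % tells Octave this file is a script, so functions can be defined in it

% Vectorized version of the corrected costFunction (hypothetical name).
% As in the answer, grad is the unregularized part only.
function [J, grad, h] = costFunctionVec(theta, X, Y, lambda)
  m = length(Y);
  h = X * theta;                        % m x 1 predictions
  r = h - Y;                            % m x 1 residuals
  theta_r = [0; theta(2:end)];          % intercept excluded from the penalty
  J = (r' * r) / (2*m) + (lambda / (2*m)) * (theta_r' * theta_r);
  grad = (X' * r) / m;                  % same values as the for-loop
endfunction

% Vectorized version of the corrected gradientDescent (hypothetical name).
function [cost_history, theta] = gradientDescentVec(theta, X, Y, alpha, num_iter, lambda)
  m = length(Y);
  cost_history = zeros(num_iter, 1);
  for i = 1:num_iter
    [cost, grad] = costFunctionVec(theta, X, Y, lambda);
    cost_history(i) = cost;             % cost before this iteration's update
    grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % no penalty on theta(1)
    theta = theta - alpha * grad;
  endfor
endfunction

% Check against values worked out by hand for the points (0,1), (1,2), (2,2).
X = [1 0; 1 1; 1 2];
Y = [1; 2; 2];
[J0, g0] = costFunctionVec([1; 1], X, Y, 0);   % residuals are [0; 0; 1]
[J3, g3] = costFunctionVec([1; 1], X, Y, 3);   % adds 3/(2*3) * 1^2 = 0.5
[ch, th] = gradientDescentVec([1; 1], X, Y, 0.1, 500, 0);
% The least-squares line here is y = 7/6 + 0.5*x, with minimum cost 1/36.
```

Vectorizing does not change what gradient descent converges to, only how fast each iteration runs, which makes long runs of many thousands of iterations cheaper.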

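I also suggest plotting c_history against the iteration number to check whether gradient descent converges. If it has not converged but is still decreasing, you need more iterations. If it increases from one iteration to the next, you need to reduce alpha. And if it converges but the cost is still high, lambda is too large or the polynomial degree is too low (in which case your model underfits the data).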

To be concrete, the plot call is simply plot(c_history);.
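The rules of thumb above can also be checked numerically rather than by eye. A rough sketch: the function name diagnoseCostHistory and its default tolerance tol = 1e-6 are hypothetical choices, and it flags any noticeable rise in cost as "increasing" rather than waiting for a rise at every iteration.

```octave
1;  % script marker so the function below can be defined inline

% Classify a cost history (e.g. c_history) following the rules of thumb above.
% tol is a hypothetical relative threshold, not a standard value.
function status = diagnoseCostHistory(c, tol)
  if nargin < 2
    tol = 1e-6;
  endif
  c = c(:);                                  % accept row or column input
  if numel(c) < 2
    error('diagnoseCostHistory: need at least two cost values');
  endif
  d = diff(c);                               % change per iteration
  if any(d > tol * abs(c(1:end-1)))          % cost went up: step too large
    status = 'increasing: reduce alpha';
  elseif abs(d(end)) <= tol * abs(c(end-1))  % last step barely moved
    status = 'converged: if the cost is still high, lower lambda or raise the degree';
  else
    status = 'still decreasing: increase num_iter';
  endif
endfunction

% Typical use after running gradientDescent:
% plot(c_history); xlabel('iteration'); ylabel('cost J');
% disp(diagnoseCostHistory(c_history));
```

A cost that settles on a plateau is reported as converged, so whether that plateau is "high" still has to be judged against the data.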

Hope this solves your problem!
