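Nonnegative Matrix Factorization: The Alternating Least Squares Method (Answers)

【Question title】: Nonnegative Matrix Factorization: The Alternating Least Squares Method
【Posted】: 2015-12-13 06:29:28
【Question description】:

I am trying to implement NMF using the alternating least squares method. I am just curious about the following basic implementation of the problem:

If I understand correctly, we can solve each of the matrix equations stated in this pseudocode without the nonnegativity constraint, using the closed-form solution and setting the negative entries to 0, in a brute-force way. Is this understanding correct? Is this a basic alternative to the more complicated, constrained optimization problem in which we use, for example, projected gradient descent? More importantly, if implemented in this basic way, does the algorithm have any practical value? I want to use NMF for variable reduction, and it is important that I use NMF because my data is nonnegative by definition. I am looking for opinions on this.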

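【Question comments】:

Not sure about the C# tag on this one...
I am about to implement it in C#; it is indeed mentioned in my question.
Of course, you could also use nonnegative least squares, or use glmnet with positivity constraints to get sparse regularization. The book by Cichocki et al. on nonnegative matrix and tensor factorizations provides many different algorithms, including better ones than this simple ALS...

Tags: matlab matrix linear-algebra matrix-factorization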

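【Solution 1】:

"If I understand correctly, we can solve each of the matrix equations stated in this pseudocode without the nonnegativity constraint, take the closed-form solution, and set the negative entries to 0, in a brute-force way. Is this understanding correct?" Yes.
"Is this a basic alternative to the more complicated, constrained optimization problem in which we use, for example, projected gradient descent?" In a sense, yes. It is indeed a fast way to obtain a nonnegative factorization. However, articles on NMF point out that, although this approach is fast, it does not guarantee convergence of the nonnegative factors. A better implementation is hierarchical alternating least squares for NMF (HALS-NMF). See this paper, which compares several popular NMF algorithms: http://www.cc.gatech.edu/~hpark/papers/jgo.pdf
"More importantly, if implemented in this basic way, does the algorithm have any practical value?" In my experience, the results are not as good as those of HALS or BPP (block principal pivoting).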
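
For reference, the clipping scheme discussed above can be sketched in a few lines. This is a minimal illustration in Python with NumPy (an assumption; the thread itself uses MATLAB and the asker targets C#), and the function name `clipped_als_nmf`, the random initialization, and the fixed iteration count are all hypothetical choices, not taken from the question's pseudocode.

```python
import numpy as np

def clipped_als_nmf(M, r, n_iter=200, seed=0):
    """Naive ALS for NMF: unconstrained least squares, then clip negatives to 0."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((m, r))  # random nonnegative start (an arbitrary choice)
    for _ in range(n_iter):
        # Solve min_V ||M - U V||_F without constraints, then project onto V >= 0.
        V = np.linalg.lstsq(U, M, rcond=None)[0]
        V = np.maximum(V, 0.0)
        # Solve min_U ||M - U V||_F via the transposed system, then clip again.
        U = np.linalg.lstsq(V.T, M.T, rcond=None)[0].T
        U = np.maximum(U, 0.0)
    return U, V
```

Because the clipping step is not part of the least squares solve, nothing forces the error `np.linalg.norm(M - U @ V)` to decrease from one iteration to the next, so it is worth recording it per iteration when experimenting with this scheme.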

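【Discussion】:

【Solution 2】:

Using nonnegative least squares in this algorithm instead of clipping the negative values would obviously be better, but in general I would not recommend this basic ALS/ANNLS approach, because its convergence properties are poor (it often oscillates and may even diverge). A minimal Matlab implementation of a better method, the accelerated hierarchical alternating least squares method for NMF (of Cichocki et al.), which is currently one of the fastest methods, is given below (code by Nicolas Gillis):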

% Accelerated hierarchical alternating least squares (HALS) algorithm of
% Cichocki et al. 
%
% See N. Gillis and F. Glineur, "Accelerated Multiplicative Updates and 
% Hierarchical ALS Algorithms for Nonnegative Matrix Factorization", 
% Neural Computation 24 (4), pp. 1085-1105, 2012. 
% See http://sites.google.com/site/nicolasgillis/ 
%
% [U,V,e,t] = HALSacc(M,U,V,alpha,delta,maxiter,timelimit)
%
% Input.
%   M              : (m x n) matrix to factorize
%   (U,V)          : initial matrices of dimensions (m x r) and (r x n)
%   alpha          : nonnegative parameter of the accelerated method
%                    (alpha=0.5 seems to work well)
%   delta          : parameter to stop inner iterations when they become
%                    ineffective (delta=0.1 seems to work well). 
%   maxiter        : maximum number of iterations
%   timelimit      : maximum time allotted to the algorithm
%
% Output.
%   (U,V)    : nonnegative matrices s.t. UV approximate M
%   (e,t)    : error and time after each iteration, 
%               can be displayed with plot(t,e)
%
% Remark. With alpha = 0, it reduces to the original HALS algorithm.  

function [U,V,e,t] = HALSacc(M,U,V,alpha,delta,maxiter,timelimit)

% Initialization
etime = cputime; nM = norm(M,'fro')^2; 
[m,n] = size(M); [m,r] = size(U);
a = 0; e = []; t = []; iter = 0; 

if nargin <= 3, alpha = 0.5; end
if nargin <= 4, delta = 0.1; end
if nargin <= 5, maxiter = 100; end
if nargin <= 6, timelimit = 60; end

% Scaling, p. 72 of the thesis
eit1 = cputime; A = M*V'; B = V*V'; eit1 = cputime-eit1; j = 0;
scaling = sum(sum(A.*U))/sum(sum( B.*(U'*U) )); U = U*scaling; 
% Main loop
while iter <= maxiter && cputime-etime <= timelimit
    % Update of U
    if j == 1, % Do not recompute A and B at first pass
        % Use actual computational time instead of estimates rhoU
        eit1 = cputime; A = M*V'; B = V*V'; eit1 = cputime-eit1; 
    end
    j = 1; eit2 = cputime; eps = 1; eps0 = 1;
    U = HALSupdt(U',B',A',eit1,alpha,delta); U = U';
    % Update of V
    eit1 = cputime; A = (U'*M); B = (U'*U); eit1 = cputime-eit1;
    eit2 = cputime; eps = 1; eps0 = 1; 
    V = HALSupdt(V,B,A,eit1,alpha,delta); 
    % Evaluation of the error e at time t
    if nargout >= 3
        cnT = cputime;
        e = [e sqrt( (nM-2*sum(sum(V.*A))+ sum(sum(B.*(V*V')))) )]; 
        etime = etime+(cputime-cnT);
        t = [t cputime-etime];
    end
    iter = iter + 1; j = 1; 
end

% Update of V <- HALS(M,U,V)
% i.e., optimizing min_{V >= 0} ||M-UV||_F^2 
% with an exact block-coordinate descent scheme
function V = HALSupdt(V,UtU,UtM,eit1,alpha,delta)
[r,n] = size(V); 
eit2 = cputime; % Use actual computational time instead of estimates rhoU
cnt = 1; % Enter the loop at least once
eps = 1; eps0 = 1; eit3 = 0;
while cnt == 1 || (cputime-eit2 < (eit1+eit3)*alpha && eps >= (delta)^2*eps0)
    nodelta = 0; if cnt == 1, eit3 = cputime; end
        for k = 1 : r
            deltaV = max((UtM(k,:)-UtU(k,:)*V)/UtU(k,k),-V(k,:));
            V(k,:) = V(k,:) + deltaV;
            nodelta = nodelta + deltaV*deltaV'; % used to compute norm(V0-V,'fro')^2;
            if V(k,:) == 0, V(k,:) = 1e-16*max(V(:)); end % safety procedure
        end
    if cnt == 1
        eps0 = nodelta; 
        eit3 = cputime-eit3; 
    end
    eps = nodelta; cnt = 0; 
end

For the full code and a comparison with other methods, see https://sites.google.com/site/nicolasgillis/code (section on accelerated MU and HALS algorithms for NMF) and N. Gillis and F. Glineur, "Accelerated Multiplicative Updates and Hierarchical ALS Algorithms for Nonnegative Matrix Factorization", Neural Computation 24 (4), pp. 1085-1105, 2012.
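
For readers who do not use MATLAB, the core of the listing above (the row-by-row update in `HALSupdt`) can be sketched in Python with NumPy. This is a simplified illustration only: it performs a single sweep per outer iteration, which corresponds roughly to the plain HALS case (alpha = 0), and it leaves out the timing-based acceleration, the initial scaling, and the time limit. The names `hals_update` and `hals_nmf`, the random initialization, and the iteration count are hypothetical.

```python
import numpy as np

def hals_update(V, UtU, UtM, tiny=1e-16):
    """One sweep of exact coordinate descent over the rows of V
    for min_{V >= 0} ||M - U V||_F^2, given UtU = U'U and UtM = U'M."""
    for k in range(V.shape[0]):
        # Optimal unconstrained step for row k, truncated so V[k] stays >= 0.
        step = np.maximum((UtM[k] - UtU[k] @ V) / UtU[k, k], -V[k])
        V[k] += step
        if not V[k].any():  # safety procedure: avoid an all-zero row
            V[k] = tiny * V.max()
    return V

def hals_nmf(M, r, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((m, r))
    V = rng.random((r, n))
    errors = []
    for _ in range(n_iter):
        # Update U by applying the same row update to the transposed problem.
        U = hals_update(U.T.copy(), V @ V.T, V @ M.T).T
        # Update V with U fixed.
        V = hals_update(V, U.T @ U, U.T @ M)
        errors.append(np.linalg.norm(M - U @ V))
    return U, V, errors
```

Each row update is an exact minimization over that row with everything else held fixed, which is the main structural difference from solving an unconstrained system and clipping afterwards.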

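【Discussion】:

【Solution 3】:

Yes, you can do this, but no, you should not.

The bottleneck in NMF is not the nonnegative least squares computation, but the computation of the right-hand side of the least squares equations and the loss computation (if it is used to determine convergence). In my experience, with a fast NNLS solver, NNLS adds less than 1% to the relative runtime compared with a basic least squares solve. Nowadays (perhaps not when you asked this question) there are very fast methods, such as TNT-NN and sequential coordinate descent, that make things very fast.

I tried this approach, and the model quality was really poor. It hardly comes close to HALS or multiplicative updates.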
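
To illustrate why the NNLS step itself can be cheap, here is a minimal sketch of a cyclic coordinate descent solver that works only on the small Gram matrix G = A'A and the right-hand side h = A'b. It is written in Python with NumPy as an assumption, the name `nnls_scd` and its parameters are hypothetical, and it is a generic textbook-style scheme, not the TNT-NN algorithm or the implementation of any particular library.

```python
import numpy as np

def nnls_scd(G, h, n_sweeps=100, tol=1e-10):
    """Approximately solve min_{x >= 0} 0.5 x'Gx - h'x by cyclic coordinate
    descent, where G = A.T @ A and h = A.T @ b for the problem min ||Ax - b||."""
    x = np.zeros_like(h, dtype=float)
    grad = -h.astype(float)  # gradient G x - h evaluated at x = 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for k in range(len(x)):
            if G[k, k] <= 0:
                continue  # skip a degenerate coordinate
            # Exact minimizer along coordinate k, clipped at zero.
            new = max(0.0, x[k] - grad[k] / G[k, k])
            d = new - x[k]
            if d != 0.0:
                grad += d * G[:, k]  # keep the gradient in sync with x
                x[k] = new
                max_change = max(max_change, abs(d))
        if max_change < tol:
            break
    return x
```

In an NMF setting one would call this once per column of the factor being updated, reusing the same r x r matrix G. Forming G and the right-hand sides involves the full data matrix, while each solve only touches r x r quantities, which is consistent with the observation above about where the time goes.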

【Discussion】: