【Question title】: Making a match-and-append code more efficient without 'for' loop
【Posted】: 2025-12-28 21:50:10
【Question】:

I am trying to match the 1st column of A against the 1st to 3rd columns of B, and append the corresponding 4th column of B (B(:,4:end) in the example below) to A.

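For example,

I compare A(:,1) with B(:,1:3).

1 and 3 are in A(:,1).

1 appears in the 1st, 2nd, and 3rd rows of B(:,1:3), so B([1 2 3], 4:end)' is appended to the 1st row of A. 3 appears in the 2nd and 4th rows of B(:,1:3), so B([2 4], 4:end)' is appended to the 2nd row of A.

So the result becomes: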

1 2 5 4 5 3 1 2
3 4 5 3 6 5 0 0
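
The rule described above can be checked for a single row with a minimal sketch like the following. It assumes the example A and B from this question; `rows`, `extra`, and `row1` are illustrative names that do not appear in the original post.

```matlab
A = [1 2; 3 4];
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5];

% Logical column: rows of B whose first three columns contain A(1,1)
rows = any(B(:,1:3) == A(1,1), 2);

% Columns 4:end of the matching rows, flattened row by row into one row vector
extra = reshape(B(rows,4:end).', 1, []);

% First row of the desired result
row1 = [A(1,:) extra];
```

For this data `row1` equals the first row of the expected output shown above.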

I was only able to write this using for and if.

clearvars AA A B mem mem2 mem3

A = [1 2 ; 3 4]
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5]

for n=1:1:size(A,1)
    mem  = ismember(B(:,[1:3]), A(n,1));
    mem2 = mem(:,1) + mem(:,2) + mem(:,3);
    mem3 = find(mem2>0);

    AA{n,:} = horzcat( A(n,:), reshape(B(mem3,[4,5])',1,[]) );  %'
end

maxLength = max(cellfun(@(x)numel(x),AA));
out = cell2mat(cellfun(@(x)cat(2,x,zeros(1,maxLength-length(x))),AA,'UniformOutput',false))

I tried to make this code more efficient by avoiding for and if, but could not find a way.

【Discussion】:

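Can there be zeros in A or B?
In the definition of AA (the last line inside the loop) you should use 4:end instead of [4,5]. Apart from that, your code already runs fast and efficiently. Unless someone finds a faster solution, I would suggest keeping it: there is no reason to avoid the loop if no loop-free solution is faster.
@TheMinion: the problem is the ismember call in the loop body, which means the JIT cannot accelerate this loop effectively. For larger inputs this will become an issue.
@RodyOldenhuis True. So the problem is not the for loop but the ismember() inside it. Nonetheless, when I run his code and the code from Nishant, even with 10.000x100 entries, his still has the shortest runtime. So I am not sure the "problem" with ismember() really causes such a runtime issue. BTW nice solution, +1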
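
To illustrate the point about ismember() in the loop body, here is a hedged sketch of the same loop with the ismember call, the column sum, and the find replaced by a plain comparison plus any(). It assumes the example A and B from the question; `parts`, `rows`, and `maxLen` are illustrative names. Whether this is measurably faster is not established in this thread, so it would need to be timed on real data (for example with tic/toc or timeit).

```matlab
A = [1 2; 3 4];
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5];

nA = size(A,1);
parts = cell(nA,1);
for n = 1:nA
    % Scalar comparison instead of ismember + mem(:,1)+mem(:,2)+mem(:,3) + find
    rows = any(B(:,1:3) == A(n,1), 2);
    parts{n} = [A(n,:) reshape(B(rows,4:end).', 1, [])];
end

% Pad every row with trailing zeros up to the longest row
maxLen = max(cellfun(@numel, parts));
out = zeros(nA, maxLen);
for n = 1:nA
    out(n,1:numel(parts{n})) = parts{n};
end
```

The result has the same layout as the question's `out`: each row of A followed by its appended values, then zero padding.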

Tags: matlab

【Solution 1】:

Try this:

a = A(:,1);
b = B(:,1:3);
z = size(b);
b = repmat(b,[1,1,numel(a)]);
ab = repmat(permute(a,[2,3,1]),z);
row2 = mat2cell(permute(sum(ab==b,2),[3,1,2]),ones(1,numel(a)));
AA = cellfun(@(x)(reshape(B(x>0,4:end)',1,numel(B(x>0,4:end)))),row2,'UniformOutput',0);
maxLength = max(cellfun(@(x)numel(x),AA));
out = cat(2,A,cell2mat(cellfun(@(x)cat(2,x,zeros(1,maxLength-length(x))),AA,'UniformOutput',false)))

Update: the code below runs in almost the same time as the iterative code.

a = A(:,1);
b = B(:,1:3);
z = size(b);
b = repmat(b,[1,1,numel(a)]);
ab = repmat(permute(a,[2,3,1]),z);
df = permute(sum(ab==b,2),[3,1,2])';
AA = arrayfun(@(x)(B(df(:,x)>0,4:end)),1:size(df,2),'UniformOutput',0);
AA = arrayfun(@(x)(reshape(AA{1,x}',1,numel(AA{1,x}))),1:size(AA,2),'UniformOutput',0);    
maxLength = max(arrayfun(@(x)(numel(AA{1,x})),1:size(AA,2)));
% Pass the indices as a column so the cell array is N-by-1 and cell2mat stacks the rows vertically
out2 = cell2mat(arrayfun(@(x)cat(2,A(x,:),AA{1,x},zeros(1,maxLength-numel(AA{1,x}))),(1:numel(AA)).','UniformOutput',0));

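【Discussion】:

Nice job finding a solution, but when I check the runtime (running the code 1000 times) it takes about 2 seconds, whereas the OP's for/if solution takes only 1.5 seconds.
@Nishant Thank you! Although, as Minion said, it does not improve the speed, I learned a lot about repmat from it.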
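
As a side note on the repmat approach, the sketch below compares the explicitly replicated arrays from this answer with a bsxfun call that produces the same match mask without building the copies. It assumes the example A and B from the question; `aRep`, `bRep`, `maskRep`, and `maskBsx` are illustrative names, and no claim is made here about which form is faster.

```matlab
A = [1 2; 3 4];
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5];

a = A(:,1);
b = B(:,1:3);

% Explicit replication, as in the answer: both arrays become 4-by-3-by-numel(a)
bRep = repmat(b, [1 1 numel(a)]);
aRep = repmat(permute(a, [2 3 1]), size(b));
maskRep = sum(aRep == bRep, 2) > 0;

% Same mask via singleton expansion, without materialising the copies
maskBsx = any(bsxfun(@eq, b, permute(a, [2 3 1])), 2);

% Both are size(b,1)-by-1-by-numel(a) logical arrays
assert(isequal(maskRep, maskBsx));
```

Page k of the mask marks the rows of B(:,1:3) that contain a(k).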

【Solution 2】:

How about this:

%# example data
A = [1 2
     3 4];

B = [1 2 4 5 4
     1 2 3 5 3
     1 1 1 1 2
     3 4 5 6 5];

%# rename for clarity & reshape for algorithm's convenience
needle   = permute(A(:,1), [2 3 1]);
haystack = B(:,1:3);
data     = B(:,4:end).';

%# Get the relevant rows of 'haystack' for each entry in 'needle'
inds = any(bsxfun(@eq, haystack, needle), 2);

%# Create data that should be appended to A
%# All data and functionality in this loop is local and static, so speed 
%# should be optimal.
append = zeros( size(A,1), numel(data) );
for ii = 1:size(inds,3)    
    newrow = data(:,inds(:,:,ii));
    append(ii,1:numel(newrow)) = newrow(:);    
end

%# Now append to A, stripping unneeded zeros
A = [A append(:, ~all(append==0,1))]
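
One caveat: the last line strips every all-zero column, which would also drop a genuine column of zeros if B(:,4:end) can contain zeros (the question raised in the comments). A possible variant, offered only as a sketch, sizes the padding from the number of matched rows instead. `counts`, `width`, and `out` are illustrative names not used in the answer.

```matlab
A = [1 2; 3 4];
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5];

needle   = permute(A(:,1), [2 3 1]);
haystack = B(:,1:3);
data     = B(:,4:end).';

inds = any(bsxfun(@eq, haystack, needle), 2);

% Values appended per row of A: matched rows of B times number of data columns
counts = squeeze(sum(inds, 1)) * size(data, 1);
width  = max(counts);

% Allocate exactly the needed width, so no columns have to be stripped later
append = zeros(size(A,1), width);
for ii = 1:size(inds,3)
    newrow = data(:, inds(:,:,ii));
    append(ii, 1:numel(newrow)) = newrow(:);
end

out = [A append];
```

With the example data this gives the same matrix as the answer's code; the two differ only when the appended values themselves include zeros.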

【Discussion】: