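Calculate most common values: answers

【Question title】: Calculate most common values
【Posted】: 2009-12-04 12:24:14
【Question】:

If I have a matrix A with n values spanning 65:90, how do I get the 10 most common values in A? I want the result to be a 10x2 matrix B with the 10 most common values in the first column and the number of times each occurs in the second.

【Comments on the question】:

【Solution 1】: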

A = [65 82 65 90; 90 70 72 82]; % Your data
range = 65:90;
res = [range; histc(A(:)', range)]'; % res has values in first column, counts in second.

Now all you have to do is sort the res array by its second column and take the first 10 rows.

sortedres = sortrows(res, -2); % sort by second column, descending
first10 = sortedres(1:10, :)
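
As a variation on the same idea, here is a minimal sketch that uses histcounts (available from R2014b) in place of histc. It also drops values that never occur and guards against having fewer than 10 distinct values. Both of those are my own assumptions about what is wanted, not part of the answer above, and the names vals, edges, counts, k and topk are illustrative.

```matlab
A = [65 82 65 90; 90 70 72 82];      % same sample data as above
vals = 65:90;                        % possible values
edges = [vals, vals(end) + 1];       % histcounts takes bin edges; the last bin [90,91] includes both ends
counts = histcounts(A(:), edges);    % 1x26 row vector, one count per value
res = [vals; counts]';               % values in column 1, counts in column 2
res = res(res(:, 2) > 0, :);         % assumption: rows with a zero count are not wanted
sortedres = sortrows(res, -2);       % sort by second column, descending
k = min(10, size(sortedres, 1));     % guard: there may be fewer than 10 distinct values
topk = sortedres(1:k, :);
```

With the sample data only five distinct values occur, so topk has five rows instead of ten.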

【Comments】:

You'll want to use values(:) because A is a matrix, not a vector. Also, sortrows can sort in descending order directly: sortrows(res, -2).

【Solution 2】:

This is easily solved using arrayfun()

A = [...]; % Your target matrix with values 65:90
labels = 65:90; % Possible values to look for
nTimesOccured = arrayfun(@(x) sum(A(:) == x), labels);
[sorted sortidx] = sort(nTimesOccured, 'descend');

B = [labels(sortidx(1:10))' sorted(1:10)'];

【Comments】:

【Solution 3】:

We can add a fourth option using tabulate from the Statistics Toolbox:

A = randi([65 90], [1000 1]);   %# thousand random integers in the range 65:90
t = sortrows(tabulate(A), -2);  %# compute sorted frequency table
B = t(1:10, 1:2);               %# take the top 10

【Comments】:

【Solution 4】:

Heck, here's another solution, using only simple built-in commands

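[V, I] = unique(sort(A(:)), 'last');   % I: index of the last occurrence of each distinct value
M = sortrows([V, diff([0; I])], -2);   % counts are the gaps between successive indices
Top10 = M(1:10, :);

Line 1: sort all the values, then find the offset in the sorted list at which each distinct value ends. The 'last' flag is given explicitly because, since MATLAB R2013a, unique returns first-occurrence indices by default, and with those the diff would give wrong counts. Line 2: take the difference between the offsets of successive unique values, which is the count of each value, and sort on those counts.

By the way, I would only recommend this method if the range of possible numbers is very large, e.g. [0,1E8]. In that case some of the other methods may run out of memory.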
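
To make the two steps concrete, here is a small worked sketch on the sample matrix from Solution 1. It assumes last-occurrence indices are requested explicitly with the 'last' flag, and the intermediate names s and counts are only for illustration.

```matlab
A = [65 82 65 90; 90 70 72 82];   % sample data
s = sort(A(:));                   % [65 65 70 72 82 82 90 90]'
[V, I] = unique(s, 'last');       % V = [65 70 72 82 90]', I = [2 3 4 6 8]'
counts = diff([0; I]);            % [2 1 1 2 2]': gap between successive last positions
M = sortrows([V, counts], -2);    % most frequent values first
```

Memory use here scales with the number of elements in A rather than with the size of the value range, which is why the approach suits very wide ranges.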

【Comments】:

【Solution 5】:

This can also be solved with accumarray

ncounts = accumarray(A(:),1);  %ncounts should now be a 90 x 1 vector of counts
[vals,sidx] = sort(ncounts,'descend');   %vals has the counts, sidx has the number
B = [sidx(1:10),vals(1:10)];

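accumarray isn't as fast as it should be, but it is often faster than other operations of its kind. It took me many passes through its help page to understand exactly what it does. For your purposes it is probably slower than the histc solution, but a bit more direct.

--edit: forgot the "1" in the accumarray call.

【Comments】:

This is not the right way to use accumarray! Have a look at this video by Doug Hull, which shows typical usage of the function: blogs.mathworks.com/videos/2009/10/02/basics-using-accumarray
Yes, I forgot the 1. However, this is the essence of accumarray. I think of it as a fast, well-defined way of doing output(idx) += vals. Despite your comments, this is a correct way to use accumarray.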
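
The "output(idx) += vals" reading can be sketched as an explicit loop. This is only an illustration of what accumarray(A(:), 1) computes for positive integer data, not a description of how it is implemented; idx and loopcounts are illustrative names.

```matlab
A = [65 82 65 90; 90 70 72 82];   % sample data
idx = A(:);                       % each value is used as a row subscript
ncounts = accumarray(idx, 1);     % 90x1 vector: ncounts(v) is how often v occurs

% Equivalent loop: add 1 to the slot named by each element
loopcounts = zeros(max(idx), 1);
for k = 1:numel(idx)
    loopcounts(idx(k)) = loopcounts(idx(k)) + 1;
end

same = isequal(ncounts, loopcounts);   % true for this data
```

Because the values are used directly as subscripts, the output has max(A(:)) rows, so rows 1 to 64 are present but hold zero counts.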