numpy: Computing mean and std over multiple non-consecutive axes (2nd attempt) - Answers

【Question title】: numpy: Computing mean and std over multiple non-consecutive axes (2nd attempt)
【Posted】: 2012-01-04 17:51:34
【Question】:

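[An earlier version of this post got no responses at all, so, in case that was due to a lack of clarity, I have reworked it with additional explanations and code comments.]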

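I want to compute the mean and standard deviation of subsets of the elements of an n-dimensional numpy array that do not correspond to a single axis (but rather to k > 1 non-consecutive axes), and collect the results in a new (n - k + 1)-dimensional array.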

Does numpy include standard constructs to perform this operation efficiently?

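The function mu_sigma copied below is my best attempt at this problem, but it has two glaring inefficiencies: 1) it requires copying the original data; 2) it computes the mean twice (since computing the standard deviation requires computing the mean).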
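The copy in point 1) appears to come from the reshape step rather than from transpose. A minimal sketch that checks this with np.shares_memory, using a small made-up array (not the one from the question):

```python
import numpy as np

a = np.arange(24.0).reshape(2, 3, 4)

# transpose only rearranges strides, so the result is a view of a
permuted = a.transpose(0, 2, 1)
is_view_after_transpose = np.shares_memory(a, permuted)

# merging the last two (now out-of-order) axes cannot be expressed with
# strides alone, so reshape has to copy the data
collapsed = permuted.reshape(2, -1)
is_view_after_reshape = np.shares_memory(a, collapsed)

print(is_view_after_transpose, is_view_after_reshape)   # True False
```

Whether reshape can return a view depends on the memory layout, so a different choice of axes may not trigger a copy.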

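The mu_sigma function takes two arguments: box and axes. box is an n-dimensional numpy array (aka "ndarray"), and axes is a k-tuple of integers representing (not necessarily consecutive) dimensions of box. The function returns a new (n - k + 1)-dimensional ndarray containing the mean and standard deviation computed over the k specified axes.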

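The code below also includes an example of mu_sigma in action. In this example, the box argument is a 4 x 2 x 4 x 3 x 4 ndarray of floats, and the axes argument is the tuple (1, 3). (Hence, we have n == len(box.shape) == 5 and k == len(axes) == 2.) The result (which here I'll call outbox) returned for this example input is a 4 x 4 x 4 x 2 ndarray of floats. For each triple of indices i, j, k (where each index ranges over the set {0, 1, 2, 3}), the element outbox[i, j, k, 0] is the mean of the 6 elements specified by the numpy expression box[i, 0:2, j, 0:3, k]. Similarly, outbox[i, j, k, 1] is the standard deviation of the same 6 elements. This means that the first n - k == 3 dimensions of the result range over the same indices as the n - k non-axes dimensions of the input ndarray box, which in this case are dimensions 0, 2, and 4.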
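To make the intended semantics concrete, here is a brute-force reference sketch that fills the result one (i, j, k) triple at a time. It is slow and only meant for checking a faster implementation; the name expected is made up for this illustration, and the input values are simply 0, 1, 2, ... as in the example further down.

```python
import numpy as np

box = np.arange(4 * 2 * 4 * 3 * 4, dtype=float).reshape(4, 2, 4, 3, 4)

# brute-force reference: loop over the non-axes dimensions (0, 2 and 4)
# and reduce over dimensions 1 and 3 with plain slicing
expected = np.empty((4, 4, 4, 2))
for i in range(4):
    for j in range(4):
        for k in range(4):
            block = box[i, 0:2, j, 0:3, k]   # 2 x 3 block, 6 elements
            expected[i, j, k, 0] = block.mean()
            expected[i, j, k, 1] = block.std()

print(expected.shape)   # (4, 4, 4, 2)
```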

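The strategy used in mu_sigma is to

1. permute the dimensions (using the transpose method) so that the axes specified in the function's second argument are all put at the end; the remaining (non-axes) dimensions are left at the beginning (in their original order);
2. collapse the axes-dimensions into one (by using the reshape method); the new "collapsed" dimension is now the last dimension of the reshaped ndarray;
3. compute an ndarray of means using the last "collapsed" dimension as axis;
4. compute an ndarray of standard deviations using the last "collapsed" dimension as axis;
5. return an ndarray obtained by concatenating the ndarrays produced in (3) and (4).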

import numpy as np

def mu_sigma(box, axes):
    inshape = box.shape

    # determine the permutation needed to put all the dimensions given in axes
    # at the end (otherwise preserving the relative ordering of the dimensions)
    nonaxes = tuple([i for i in range(len(inshape)) if i not in set(axes)])

    # permute the dimensions
    permuted = box.transpose(nonaxes + axes)

    # determine the shape of the ndarray after permuting the dimensions and
    # collapsing the axes-dimensions; thanks to Bago for the "+ (-1,)"
    newshape = tuple(inshape[i] for i in nonaxes) + (-1,)

    # collapse the axes-dimensions
    # NB: the next line results in copying the input array
    reshaped = permuted.reshape(newshape)

    # determine the shape for the mean and std ndarrays, as required by
    # the subsequent call to np.concatenate (this reshaping is not necessary
    # if the available mean and std methods support the keepdims keyword;
    # instead, just set keepdims to True in both calls).
    outshape = newshape[:-1] + (1,)

    # compute the means and standard deviations
    mean = reshaped.mean(axis=-1).reshape(outshape)
    std = reshaped.std(axis=-1).reshape(outshape)

    # collect the results in a single ndarray, and return it
    return np.concatenate((mean, std), axis=-1)

inshape = 4, 2, 4, 3, 4
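# np.arange replaces the Python 2 idiom np.array(map(float, range(...))),
# which no longer produces a numeric array under Python 3
inbuf = np.arange(np.prod(inshape), dtype=float)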
inbox = np.ndarray(inshape, buffer=inbuf)
outbox = mu_sigma(inbox, tuple(range(len(inshape))[1::2]))

# "inline tests"
assert all(outbox[..., 1].ravel() ==
           [inbox[0, :, 0, :, 0].std()] * outbox[..., 1].size)
assert all(outbox[..., 0].ravel() == [float(4*(v + 3*w) + x)
                                      for v in [8*y - 1
                                                for y in [3*z + 1
                                                          for z in range(4)]]
                                      for w in range(4)
                                      for x in range(4)])
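On the second inefficiency listed above, one possible workaround is to compute the mean once and derive the standard deviation from it. This is a minimal sketch, assuming a NumPy version whose mean method accepts keepdims; mean_and_std_once is a hypothetical helper name, not part of the original post or of numpy.

```python
import numpy as np

def mean_and_std_once(reshaped):
    # hypothetical helper: expects the collapsed dimension to be the last one
    mean = reshaped.mean(axis=-1, keepdims=True)
    # population standard deviation (ddof=0), reusing the mean computed above
    std = np.sqrt(((reshaped - mean) ** 2).mean(axis=-1, keepdims=True))
    return np.concatenate((mean, std), axis=-1)

data = np.arange(24.0).reshape(2, 3, 4)
result = mean_and_std_once(data)
print(result.shape)   # (2, 3, 2)
```

Note that reshaped - mean allocates a temporary as large as the input, so this trades the second mean pass for extra memory; whether it is actually faster would need to be measured.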

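【Comments】:

This approach seems right to me. mean is much faster than std, so I wouldn't worry about computing the mean twice. Making temporary copies of data is quite common when using numpy/matlab-style vectorization. It's the price we pay for numpy's usability and speed; unless you're hitting some kind of memory limit, I wouldn't worry about it. One small note about newshape: try newshape = tuple(inshape[i] for i in nonaxes) + (-1,)
@Bago: The trick of having -1 as the last element of the shape tuple is great! Thanks!
@kjo: By the way, you don't have to repost your question. According to this Meta question, you can just edit it (with useful information about your progress, or a better explanation) and it will get bumped.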
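The -1 trick from Bago's comment relies on reshape inferring one dimension from the total number of elements. A small illustration, with a made-up zero-filled array standing in for the permuted box:

```python
import numpy as np

permuted = np.zeros((4, 4, 4, 2, 3))   # non-axes dimensions first, axes last
nonaxes_shape = permuted.shape[:3]

# -1 asks reshape to infer the remaining length (here 2 * 3 == 6)
reshaped = permuted.reshape(nonaxes_shape + (-1,))
print(reshaped.shape)   # (4, 4, 4, 6)
```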

Tags: python numpy

【Solution 1】:

This appears to have been made easier from numpy 2.0 onwards.

http://projects.scipy.org/numpy/ticket/1234
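For illustration, here is a sketch of what this looks like in NumPy versions where the axis argument of mean and std accepts a tuple of ints (documented, as far as I can tell, since NumPy 1.7). The function name mu_sigma_tuple_axis is made up for this example; it is not the asker's code and not a numpy function.

```python
import numpy as np

def mu_sigma_tuple_axis(box, axes):
    # reduce over all requested axes at once; no transpose/reshape needed
    mean = box.mean(axis=axes)
    std = box.std(axis=axes)
    # stack the two results along a new trailing dimension of length 2
    return np.stack((mean, std), axis=-1)

box = np.arange(4 * 2 * 4 * 3 * 4, dtype=float).reshape(4, 2, 4, 3, 4)
outbox = mu_sigma_tuple_axis(box, (1, 3))
print(outbox.shape)   # (4, 4, 4, 2)
```

This avoids the explicit transpose and reshape, although std still computes its own mean internally, so the mean is still effectively computed twice.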

【Comments】:

That link is broken.