【Question title】: Custom function to compute mean absolute deviation
【Posted】: 2020-09-21 15:26:25
【Question】:

I have a 4D numpy array like this:

>>> import numpy as np
>>> from functools import partial

>>> X = np.random.rand(20, 1, 10, 4)

>>> X.shape
(20, 1, 10, 4)

I compute the following statistics along axis 2: mean, std, median, p25 and p75 (the 25th and 75th percentiles):

>>> percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
>>> stat_functions = (np.mean, np.std, np.median) + percentiles

>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

so that:

>>> stats.shape
(20, 1, 5, 4)

>>> stats[0]
array([[[0.55187202, 0.55892688, 0.45816177, 0.6378181 ],
        [0.31028278, 0.32109677, 0.17319351, 0.13341651],
        [0.57112019, 0.60587194, 0.45490572, 0.59787335],
        [0.30857011, 0.30367621, 0.28899686, 0.55742753],
        [0.80678815, 0.82014851, 0.61295181, 0.70529412]]])

I am also interested in the mean absolute deviation (mad), so I defined this function myself, since NumPy does not provide it:

def mad(data):
    mean = np.mean(data)
    f = lambda x: abs(x - mean)
    vf = np.vectorize(f)
    return (np.add.reduce(vf(data))) / len(data)

But I ran into problems getting this function to work. First I tried:

>>> stat_functions = (np.mean, np.std, np.median, mad) + percentiles
>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-33-fa6d972f0fce> in <module>()
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

<ipython-input-33-fa6d972f0fce> in <listcomp>(.0)
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

TypeError: mad() got an unexpected keyword argument 'axis'

Then I changed the definition of mad to:

def mad(data, axis=None):
    ...

which led to this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-c74d9e3d057b> in <module>()
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

<ipython-input-35-c74d9e3d057b> in <listcomp>(.0)
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

TypeError: mad() got an unexpected keyword argument 'keepdims'

So I added keepdims as well:

def mad(data, axis=None, keepdims=None):
    ...

which got me to:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-c74d9e3d057b> in <module>()
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 3 has 3 dimension(s)

I know this is a dimension problem, but I am not sure how to solve it in this case.
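
The dimension mismatch in the last traceback can be reproduced with NumPy's own reductions. The sketch below only illustrates how keepdims affects the result shape; it is not the asker's code. A reduction called without keepdims=True drops the reduced axis, which leaves a 3-D array that cannot be concatenated with the 4-D ones.

```python
import numpy as np

X = np.random.rand(20, 1, 10, 4)

# Without keepdims the reduced axis disappears: 3 dimensions remain.
reduced = np.mean(X, axis=2)

# With keepdims=True the reduced axis is kept with length 1: still 4 dimensions.
kept = np.mean(X, axis=2, keepdims=True)

print(reduced.shape)  # (20, 1, 4)
print(kept.shape)     # (20, 1, 1, 4)

# np.expand_dims can restore the dropped axis after the fact.
restored = np.expand_dims(reduced, axis=2)
print(restored.shape)  # (20, 1, 1, 4)
```

A custom statistic therefore has to either forward keepdims to its final reduction or re-insert the axis itself before np.concatenate is called.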

EDIT:

Following the answer given, I got a strange result after using the answer's mad function, like this:

stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

stats.shape
(20, 1, 15, 4)

The expected output should have shape (20, 1, 6, 4), since I added one statistic along the third dimension: (np.mean, np.std, np.median, mad) + percentiles
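
One plausible explanation for the 15, assuming the mad in use at that point computed the absolute deviations but never reduced them along axis 2: such a function returns all 10 entries along that axis, and 5 + 10 = 15. The function mad_unreduced and the array five_stats below are hypothetical stand-ins introduced only to show the arithmetic.

```python
import numpy as np

X = np.random.rand(20, 1, 10, 4)

def mad_unreduced(data, axis=-1, keepdims=True):
    # Hypothetical stand-in: the deviations are scaled but never summed
    # or averaged along `axis`, so the output keeps the input's shape.
    return np.abs(data - data.mean(axis, keepdims=True)) / data.shape[axis]

out = mad_unreduced(X, axis=2, keepdims=True)
print(out.shape)  # (20, 1, 10, 4)

# Placeholder for the five properly reduced statistics, concatenated first.
five_stats = np.random.rand(20, 1, 5, 4)
combined = np.concatenate([five_stats, out], axis=2)
print(combined.shape)  # (20, 1, 15, 4)
```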

EDIT-2

Using this function from the answer:

def mad(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis)

and then:

stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

I run into this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-fa6d972f0fce> in <module>()
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 3 has 3 dimension(s)

【Question discussion】:

    Tags: python numpy multidimensional-array numpy-ndarray


    【Solution 1】:

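    I noticed that vf in your code is not really a vectorized function (see the notes in NumPy's documentation for np.vectorize). You can simply use np.abs instead of abs, and your function will be vectorized.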
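
As a small illustration of that point (the sample array is made up for the example): np.vectorize is documented as a convenience wrapper that is essentially a Python-level loop, while np.abs works on the whole array at once. Both give the same numbers.

```python
import numpy as np

data = np.array([1.0, 2.0, 4.0, 9.0])
mean = data.mean()  # 4.0

# np.vectorize calls the Python lambda once per element.
looped = np.vectorize(lambda x: abs(x - mean))(data)

# np.abs is a ufunc and operates on the whole array in one call.
vectorized = np.abs(data - mean)

print(np.allclose(looped, vectorized))  # True
print(vectorized.mean())                # 2.5
```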


    That said, your function can be written as:

    def mad(data):
        # sum the absolute deviations along the first axis, then divide by its length
        return np.abs(data - data.mean(0)).sum(0) / len(data)
    

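    Now, note that this mad function, like your original one, accepts only one positional argument and no optional arguments. The error you get is because you are trying to pass axis=2 to mad in: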

    [func(X, axis=2, keepdims=True) for func in stat_functions]
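
Every function in stat_functions is called with the same axis and keepdims keywords, so each one has to accept both. As an alternative sketch (not the approach taken in this answer), a statistic written for 1-D input can be adapted with np.apply_along_axis. The names mad_1d and with_axis are hypothetical helpers introduced for illustration.

```python
import numpy as np

def mad_1d(values):
    # Mean absolute deviation of a 1-D array around its mean.
    return np.abs(values - values.mean()).mean()

def with_axis(func_1d):
    # Hypothetical adapter: gives a 1-D statistic the (axis, keepdims) keywords.
    def wrapped(data, axis=-1, keepdims=False):
        # Apply the 1-D function to every slice along `axis`; that axis is dropped.
        out = np.apply_along_axis(func_1d, axis, data)
        # Re-insert the reduced axis with length 1 when requested.
        return np.expand_dims(out, axis) if keepdims else out
    return wrapped

mad = with_axis(mad_1d)

X = np.random.rand(20, 1, 10, 4)
print(mad(X, axis=2, keepdims=True).shape)  # (20, 1, 1, 4)
```

np.apply_along_axis loops over slices in Python, so it is likely slower than a direct array expression; it is shown only to make the required call signature explicit.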
    

    To fix this, build the function with optional arguments:

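    def mad(data, axis=-1, keepdims=True):
        # the inner mean always keeps dims so that it broadcasts against data;
        # keepdims is forwarded to the final reduction, and the divisor is the
        # length of the reduced axis rather than len(data)
        return np.abs(data - data.mean(axis, keepdims=True)).sum(axis, keepdims=keepdims) / data.shape[axis]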
    

    Or, using mean(axis) probably makes more sense than sum(axis)/len(data):

    def mad(data, axis=-1, keepdims=True):
        # forward keepdims to the outer mean so the reduced axis can be retained
        return np.abs(data - data.mean(axis, keepdims=True)).mean(axis, keepdims=keepdims)
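
For reference, a minimal end-to-end sketch of the question's pipeline, assuming keepdims is forwarded to the final reduction. The default keepdims=False here is a choice made for this sketch to mirror np.mean; it is not taken from the answer.

```python
import numpy as np
from functools import partial

def mad(data, axis=-1, keepdims=False):
    data = np.asarray(data)
    # The inner mean keeps the axis so it broadcasts against `data`.
    center = data.mean(axis=axis, keepdims=True)
    # The outer mean honours the caller's keepdims.
    return np.abs(data - center).mean(axis=axis, keepdims=keepdims)

X = np.random.rand(20, 1, 10, 4)

percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate(
    [func(X, axis=2, keepdims=True) for func in stat_functions], axis=2
)
print(stats.shape)  # (20, 1, 6, 4)
```

With this definition each of the six functions returns an array of shape (20, 1, 1, 4), so the concatenation along axis 2 has length 6, with mad at index 3.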
    

    【Discussion】:

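    • Thanks, but I expected the third dimension of stats to grow from (20,1,5,4) to (20,1,6,4), since I added one statistic, mad. stats.shape is now (20,1,15,4), strange!
    • Also, the function does not look like it returns the ABSOLUTE value!!!
    • @arilwan See the update, which adds np.abs and sum(axis).
    • Thanks for your time, but then I get the error for the keepdims argument again: TypeError: mad() got an unexpected keyword argument 'keepdims'
    • This happens when I add mad to my stat functions: stat_functions = (np.mean, np.std, np.median, mad) + percentiles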