ufunc (min, max, mean, etc.) on structured (record) arrays with different dtypes

【Question title】: ufunc (min, max, mean, etc.) on structured (record) arrays with different dtypes
【Posted】: 2026-02-14 08:50:01
【Question】:

I am working in Python (3.8) with numpy (1.20.3) and trying to apply simple functions to a structured array whose fields have different dtypes.

import numpy

def test_large_record():
    # numpy.float / numpy.int are deprecated aliases; use explicit sized types
    x = numpy.array([0.0, 0.2, 0.3], dtype=numpy.float64)
    x_2 = numpy.array([0.01, 0.12, 0.82], dtype=numpy.float64)
    y = numpy.array([1, 5, 7], dtype=numpy.int64)
    rec_array = numpy.rec.fromarrays([x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

    print(rec_array.min())  # raises the TypeError described below

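This raises "TypeError: cannot perform reduce with flexible type".

I tried writing a generator that walks a generic structured array and, for each dtype, yields a view of the fields that share that dtype... but this does not seem to work.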
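The error comes from asking for a reduction over the record dtype as a whole, for which numpy has no `min` defined. A minimal sketch reproducing it on a smaller, two-field array (the names `rec` and `raised` are illustrative only, not from the question):

```python
import numpy as np

# A small record array with one float field and one int field.
rec = np.rec.fromarrays(
    [np.array([0.0, 0.2, 0.3]), np.array([1, 5, 7], dtype=np.int64)],
    dtype=[("x", "<f8"), ("y", "<i8")],
)

raised = False
try:
    rec.min()  # reduction over the whole structured dtype
except TypeError:
    raised = True

print(raised)
# Reducing one field at a time works, because each field has a plain dtype.
print(rec["x"].min(), rec["y"].min())
```

The exact wording of the exception message may vary between numpy versions, so the sketch only checks the exception type.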

def rec_homogeneous_generator(rec_array):
    dtype = {}

    for name, dt in rec_array.dtype.descr:
        if dt not in dtype.keys():
            dtype[dt] = []

        dtype[dt].append(name)

    for dt, cols in dtype.items():
        r = rec_array[cols]
        v = r.view(dt)
        yield v


def test_large_record():
    # numpy.float / numpy.int are deprecated aliases; use explicit sized types
    x = numpy.array([0.0, 0.2, 0.3], dtype=numpy.float64)
    x_2 = numpy.array([0.01, 0.12, 0.82], dtype=numpy.float64)
    y = numpy.array([1, 5, 7], dtype=numpy.int64)
    rec_array = numpy.rec.fromarrays([x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

    for h_array in rec_homogeneous_generator(rec_array):
        print(h_array.min(axis=0))

This prints 0.0 and 0, which is not what I expected. I should get [0, 0.01] and 1.

Does anyone have a good idea?

【Comments】:

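Did you inspect h_array? Why not just compute per field instead of grouping by dtype?
Be careful with view on a multi-field index. In recent numpy versions, multi-field indexing produces a view in which all fields are still present in memory, even the ones that were "dropped".
My data is very large, so iterating over each field is probably the slower option. I would rather let numpy do the heavy lifting. I also see now that this kind of view is hard to get right... it seems like an odd result for the function.
If the number of fields is small compared with the number of records, iterating over the fields is not bad. Most of the recfunctions do exactly that.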

Tags: python python-3.x numpy numpy-ufunc recarray

【Answer 1】:

Operate on one field at a time:

In [21]: [rec_array[field].min() for field in rec_array.dtype.fields]
Out[21]: [0.0, 0.01, 1]
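The same per-field loop can be wrapped so that the results stay labelled by field name. This is only a sketch; `reduce_fields` is a hypothetical helper name, not a numpy function:

```python
import numpy as np

def reduce_fields(arr, func=np.min):
    """Apply a reduction to each field of a structured array separately."""
    # arr[name] is a plain (non-structured) array, so ordinary reductions work.
    return {name: func(arr[name]) for name in arr.dtype.names}

rec_array = np.rec.fromarrays(
    [
        np.array([0.0, 0.2, 0.3]),
        np.array([0.01, 0.12, 0.82]),
        np.array([1, 5, 7], dtype=np.int64),
    ],
    dtype=[("x", "<f8"), ("x_2", "<f8"), ("y", "<i8")],
)

mins = reduce_fields(rec_array)
maxs = reduce_fields(rec_array, np.max)
print(mins)
print(maxs)
```

The Python-level loop runs once per field, not once per record, so its cost is small when there are few fields and many records.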

With multi-field indexing in recent numpy versions, your generator produces:

In [23]: list(rec_homogeneous_generator(rec_array))
Out[23]: 
[rec.array([0.0e+000, 1.0e-002, 4.9e-324, 2.0e-001, 1.2e-001, 2.5e-323,
            3.0e-001, 8.2e-001, 3.5e-323],
           dtype=float64),
 rec.array([                  0, 4576918229304087675,                   1,
            4596373779694328218, 4593311331947716280,                   5,
            4599075939470750515, 4605561122934164029,                   7],
           dtype=int64)]

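Multi-field indexing on its own (note that the itemsize is still 24, so the bytes of y are still part of each record):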

In [25]: rec_array[['x','x_2']]
Out[25]: 
rec.array([(0. , 0.01), (0.2, 0.12), (0.3, 0.82)],
          dtype={'names':['x','x_2'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':24})
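The dtype printed above explains the strange generator output. A small sketch of the check, assuming a numpy version in which multi-field indexing returns a view (as in the session shown here):

```python
import numpy as np

rec_array = np.rec.fromarrays(
    [
        np.array([0.0, 0.2, 0.3]),
        np.array([0.01, 0.12, 0.82]),
        np.array([1, 5, 7], dtype=np.int64),
    ],
    dtype=[("x", "<f8"), ("x_2", "<f8"), ("y", "<i8")],
)

sub = rec_array[["x", "x_2"]]

# The selection still spans whole records: three 8-byte slots each.
print(sub.dtype.itemsize)
print(np.shares_memory(sub, rec_array))

# Viewing as float64 therefore reinterprets every slot, including y's bytes.
as_float = sub.view(np.float64)
print(as_float.size)
```

With three records this gives nine values instead of the six one might expect, which is why the integers 1, 5, 7 show up as tiny denormal floats in the output above.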

A better way to handle a multi-field index is repack_fields:

In [26]: import numpy.lib.recfunctions as rf
In [28]: rf.repack_fields(rec_array[['x','x_2']])
Out[28]: 
rec.array([(0. , 0.01), (0.2, 0.12), (0.3, 0.82)],
          dtype=[('x', '<f8'), ('x_2', '<f8')])

Now we can view it as float:

In [29]: rf.repack_fields(rec_array[['x','x_2']]).view(float)
Out[29]: 
rec.array([0.  , 0.01, 0.2 , 0.12, 0.3 , 0.82],
          dtype=float64)

This view is 1d.

Or better:
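Because the packed view is 1d, it needs a reshape before a per-column reduction makes sense. A sketch of that step (the names `packed`, `flat`, `table` and `col_min` are illustrative):

```python
import numpy as np
import numpy.lib.recfunctions as rf

rec_array = np.rec.fromarrays(
    [
        np.array([0.0, 0.2, 0.3]),
        np.array([0.01, 0.12, 0.82]),
        np.array([1, 5, 7], dtype=np.int64),
    ],
    dtype=[("x", "<f8"), ("x_2", "<f8"), ("y", "<i8")],
)

# repack_fields drops the unused bytes, so each record is exactly two floats.
packed = rf.repack_fields(rec_array[["x", "x_2"]])
flat = packed.view(np.float64)             # one long run of floats
table = flat.reshape(len(rec_array), -1)   # one row per record
col_min = table.min(axis=0)                # minimum of x and of x_2
print(table.shape, col_min)
```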

In [30]: rf.structured_to_unstructured(rec_array[['x','x_2']])
Out[30]: 
rec.array([[0.  , 0.01],
           [0.2 , 0.12],
           [0.3 , 0.82]],
          dtype=float64)
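Putting this together, the grouping idea from the question can be sketched with structured_to_unstructured in place of view. `homogeneous_blocks` is a hypothetical name, and this is one possible arrangement, not the answerer's code:

```python
import numpy as np
import numpy.lib.recfunctions as rf

def homogeneous_blocks(arr):
    """Yield (field_names, 2-D array) for each group of same-dtype fields."""
    groups = {}
    for name in arr.dtype.names:
        # Group field names by the dtype of the individual field.
        groups.setdefault(arr.dtype[name], []).append(name)
    for names in groups.values():
        # One column per field, one row per record.
        yield names, rf.structured_to_unstructured(arr[names])

rec_array = np.rec.fromarrays(
    [
        np.array([0.0, 0.2, 0.3]),
        np.array([0.01, 0.12, 0.82]),
        np.array([1, 5, 7], dtype=np.int64),
    ],
    dtype=[("x", "<f8"), ("x_2", "<f8"), ("y", "<i8")],
)

results = {tuple(names): block.min(axis=0) for names, block in homogeneous_blocks(rec_array)}
for names, mins in results.items():
    print(names, mins)
```

For the example data this yields the per-column minima the question expected: one pair of values for the float fields and a single value for the integer field.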

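These functions are documented on the structured arrays page of the numpy docs.

【Comments】:

structured_to_unstructured is exactly what I needed. Another requirement I forgot to mention is avoiding copies, and it looks like this does that. Thanks!
Only single-field indexing avoids the copy.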
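Whether a result shares memory with the original can be checked directly with np.shares_memory. A sketch; the outcome of the last check is deliberately not stated, since it may differ between numpy releases:

```python
import numpy as np
import numpy.lib.recfunctions as rf

rec_array = np.rec.fromarrays(
    [
        np.array([0.0, 0.2, 0.3]),
        np.array([0.01, 0.12, 0.82]),
        np.array([1, 5, 7], dtype=np.int64),
    ],
    dtype=[("x", "<f8"), ("x_2", "<f8"), ("y", "<i8")],
)

# Single-field indexing returns a view onto the record buffer.
x_view = rec_array["x"]
single_shares = np.shares_memory(x_view, rec_array)
x_view[0] = -1.0                 # writing through the view changes the original
print(single_shares, rec_array["x"][0])

# For the unstructured result, check instead of assuming.
unstructured = rf.structured_to_unstructured(rec_array[["x", "x_2"]])
print(np.shares_memory(unstructured, rec_array))
```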