Why did Hypothesis give a falsifying example, when manually reproducing with numpy arrays does not fail? (Answers)

[Question title]: Why did Hypothesis give a falsifying example, when manually reproducing with numpy arrays does not fail?
[Posted]: 2025-11-29 14:40:01
[Question]:

I'm trying to write my first super-simple numpy test case, but the very first thing I came up with seems to have hit a snag.

So I did this:

import numpy as np
from hypothesis import given
import hypothesis.strategies as hs
import hypothesis.extra.numpy as hxn

def tstind(a, i):
    i = max(i, 0)
    i = min(i, len(a)-1)
    return a[i]

@given(a=hxn.arrays(dtype=hxn.scalar_dtypes(),
       shape=hxn.array_shapes(max_dims=1)),
       i=hs.integers())
def test_tstind_typeconserve(a, i):
    assert tstind(a, i).dtype == a.dtype

test_tstind_typeconserve()

Falsifying example:

test_tstind_typeconserve(
    a=array([0.], dtype=float16), i=0,
)

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in test_tstind_typeconserve
  File "/tmp/persistent/miniconda3/envs/hypo/lib/python3.7/site-packages/hypothesis/core.py", line 1169, in wrapped_test
    raise the_error_hypothesis_found
  File "<stdin>", line 5, in test_tstind_typeconserve
AssertionError

But:

a=np.array([0.], dtype=np.float16)
i=0
assert tstind(a, i).dtype == a.dtype

(i.e. this is OK and does not fail)

By the way, the odd case I was expecting to find was this one:

>>> a=np.ma.masked_array([0.], mask=[1], dtype=np.float16)
>>> a.dtype
dtype('float16')
>>> a[0].dtype
dtype('float64')
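
The masked-array case above can be checked with a short sketch. It assumes the usual NumPy behaviour that indexing a masked element returns the shared `np.ma.masked` constant, which carries its own float64 dtype, while an unmasked element keeps the array's dtype. The two-element array is made up for illustration.

```python
import numpy as np

# Hypothetical two-element array: first element masked, second not.
a = np.ma.masked_array([0.0, 1.0], mask=[True, False], dtype=np.float16)

masked_elem = a[0]   # masked entry: comes back as the np.ma.masked constant
plain_elem = a[1]    # unmasked entry: an ordinary float16 scalar

print(a.dtype)            # dtype of the array itself
print(masked_elem.dtype)  # dtype of the masked constant, not of the array
print(plain_elem.dtype)   # dtype of the unmasked element
```

So the dtype change in this case comes from the masked constant rather than from the array's own element type.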

[Question comments]:

Tags: python numpy python-hypothesis

[Solution 1]:

Hypothesis is showing you that NumPy dtypes have distinct byte orders. Extending your test,

    got = tstind(a, i).dtype
    assert got == a.dtype, (a.dtype.byteorder, got.byteorder)

fails with AssertionError: ('>', '=') for me. Unfortunately the repr of the array object doesn't include the dtype's byte order, but here we are.
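
The mismatch can be reproduced by hand once the array is built with a non-native byte order. This is a minimal sketch, not the exact array Hypothesis generated: it uses `dtype.newbyteorder()` to get whichever order is foreign to the current machine, so it behaves the same on little-endian and big-endian hosts.

```python
import numpy as np

native = np.dtype(np.float16)
swapped = native.newbyteorder()   # byte order opposite to this machine's

a = np.array([0.0], dtype=swapped)
elem = a[0]                       # indexing yields a native-order scalar

# '>' on a little-endian machine, '<' on a big-endian one
print(a.dtype.byteorder)
# '=' means native byte order
print(elem.dtype.byteorder)
# dtype.str always spells the byte order out, unlike the short dtype name
print(a.dtype.str, elem.dtype.str)

dtypes_differ = elem.dtype != a.dtype
print(dtypes_differ)
```

Building the array with plain `dtype=np.float16`, as in the manual reproduction in the question, gives a native-order array, which is why that version does not fail.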

(I've reported this as issue 19059, for what it's worth)
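
If byte order is not something the test is meant to check, one possible workaround is to normalize both dtypes to native order before comparing. The helper name `native` below is hypothetical and not part of NumPy or Hypothesis; `tstind` is copied from the question.

```python
import numpy as np

def tstind(a, i):
    i = max(i, 0)
    i = min(i, len(a) - 1)
    return a[i]

def native(dt):
    """Hypothetical helper: return dt converted to native byte order."""
    return np.dtype(dt).newbyteorder("=")

# An array whose dtype has the non-native byte order for this machine.
swapped = np.dtype(np.float16).newbyteorder()
a = np.array([0.0, 1.5], dtype=swapped)

got = tstind(a, 5).dtype                         # index is clamped to the last element
strict_equal = got == a.dtype                    # byte orders differ
relaxed_equal = native(got) == native(a.dtype)   # same type once normalized

print(strict_equal, relaxed_equal)
```

Another option, assuming only floating-point dtypes are of interest, would be to restrict generation with `hxn.floating_dtypes(endianness='=')` instead of relaxing the assertion.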

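[Comments]:

Fantastic catch, thanks!! It strikes me that if Hypothesis's "falsifying example" output relies on the repr, which I suppose it has to for the sake of generality, then there may be plenty of other distinctions it misses (and not just for arrays). Is there a more general way to report the inputs Hypothesis actually used in the failing case?
Hypothesis doesn't rely on the repr - we distinguish inputs by the sequence of choices used to generate them, and distinguish errors by the exception type and location (recursively, through chained exceptions).
To see all the examples Hypothesis generates, use verbosity=Verbosity.verbose or pass pytest -s --hypothesis-verbosity=verbose.
"Hypothesis doesn't rely on the repr" - yes, I'm sure it doesn't. I only meant the "falsifying example" output, i.e. you can't "specialize" the code printed for array examples so that it includes every option that might be relevant. It would be interesting if there were a way to specialize it, although I don't know that I could actually produce anything better than what numpy provides in array.__repr__.
Using verbose mode is instructive. But you still can't see the thing that is crucial for this test case. The message output in the assertion works nicely, though.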
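
The verbosity setting mentioned in the comments can be applied per test with the `settings` decorator. The sketch below is illustrative only: the test name, the `seen_byteorders` set, and the choice of `floating_dtypes()` are assumptions made for the example. It also records each generated dtype's byte order separately, since the printed array reprs may not show it.

```python
from hypothesis import given, settings, Verbosity
import hypothesis.extra.numpy as hxn

seen_byteorders = set()   # hypothetical side channel for what the repr may hide

# deadline=None disables the per-example time limit so a slow first run is harmless
@settings(verbosity=Verbosity.verbose, max_examples=25, deadline=None)
@given(a=hxn.arrays(dtype=hxn.floating_dtypes(),
                    shape=hxn.array_shapes(max_dims=1)))
def test_record_byteorder(a):
    # Verbose mode prints each generated example as the test runs.
    seen_byteorders.add(a.dtype.byteorder)
    assert a.ndim == 1

test_record_byteorder()
print(sorted(seen_byteorders))
```

As the last comment notes, putting the relevant detail into the assertion message remains the most direct way to surface it in the failure report.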