Accessing array in Python/Numba gives weird result: answers

【Question title】: Accessing array in Python/Numba gives weird result
【Posted】: 2014-02-23 14:37:33
【Question】:

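I am trying to use numpy together with numba, but I get weird results when I access or set values in a float array using a float index converted to int. Consider this basic function.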

import numba
import numpy as np

@numba.jit("void(f8[:,::1],f8[:,::1])")
def test(table, index):
    # Convert the float coordinates to integer indices
    x, y = int(index[0,0]), int(index[1,0])
    table[y,x] = 1.0
    print index[0,0], index[1,0], x,y
    print table
    print table[y,x]

table = np.zeros((5,5), dtype = np.float32)
index = np.random.ranf(((2,2)))*5
test(table, index)

Result:

index[0,0] = 1.34129550525 index[1,0] = 0.0656177324359 x = 1 y = 0    
table[0,1] = 1.0 
table [[ 0.     0.     1.875  0.     0.   ]
       [ 0.     0.     0.     0.     0.   ]
       [ 0.     0.     0.     0.     0.   ]
       [ 0.     0.     0.     0.     0.   ]
       [ 0.     0.     0.     0.     0.   ]]

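Why do I get 1.875 in my table instead of 1.0? This is a basic example, but I am working with large arrays and it gives me a lot of errors. I know I can convert index to np.int32 and change @numba.jit("void(f8[:,::1],f8[:,::1])") to @numba.jit("void(f8[:,::1],i4[:,::1])"), which works fine, but I would like to understand why this does not work. Is there a problem when the types are translated from Python to C++?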

Thanks for your help.

【Comments】:

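Isn't there a discrepancy between the f8 declaration in the jit signature and the np.float32 used in the initialization? The 1.875 is not at x=1. By the way, why is this tagged C++?
@Joky The C++ tag was a mistake, sorry. Yes, it works with np.float64. But for a number like 1.0, should float32 versus float64 make any difference?
Since you tagged C++ ;) : the problem is that table is a pointer to double that points at an array of float. Writing a double (twice as large as a float) at position 1 overlaps positions 3/4, and a double encoded on top of two floats makes no sense. By the way, this is a guess, because I don't know exactly what code numba generates. Edit: unutbu explains it clearly below.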

Tags: python numpy casting numba

【Solution 1】:

In [198]: np.float64(1.0).view((np.float32,2))
Out[198]: array([ 0.   ,  1.875], dtype=float32)

So when

table[y,x] = 1.0

writes np.float64(1.0) into table, table views the data as np.float32 and interprets it as 0 and 1.875.

Note that the 0 appears at index location [0,1] and the 1.875 appears at index location [0,2], whereas the assignment was made at [y,x] = [0,1].
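The overlap described above can be reproduced without Numba. The following is a minimal sketch using only the standard-library `struct` module. It assumes a little-endian layout, and it assumes the compiled code computed the write offset from the array's real 4-byte stride while storing 8 bytes; that assumption matches the printed table but is not confirmed from Numba's generated code. The names `row`, `offset` and `as_float32` are illustrative only.

```python
import struct

# Five float32 zeros laid out contiguously: 5 elements * 4 bytes = 20 bytes.
row = bytearray(struct.pack("<5f", 0.0, 0.0, 0.0, 0.0, 0.0))

x = 1                      # column index from the question
float32_size = 4           # actual element stride of the float32 row
offset = x * float32_size  # assumption: offset derived from the real stride

# Store 1.0 as an 8-byte float64, as a function typed f8 would do.
row[offset:offset + 8] = struct.pack("<d", 1.0)

# Read the same 20 bytes back as five float32 values.
as_float32 = struct.unpack("<5f", row)
print(as_float32)  # (0.0, 0.0, 1.875, 0.0, 0.0)
```

The lower four bytes of the double land on element 1 and decode as 0.0, while the upper four bytes land on element 2 and decode as 1.875, which is the first row of the table shown in the question.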

You can fix the dtype mismatch by changing

@numba.jit("void(f8[:,::1],f8[:,::1])")

to

@numba.jit("void(f4[:,::1],f8[:,::1])")

These are the 8 bytes in np.float64(1.0):

In [201]: np.float64(1.0).tostring()
Out[201]: '\x00\x00\x00\x00\x00\x00\xf0?'

When the 4 bytes '\x00\x00\xf0?' are interpreted as an np.float32, you get 1.875:

In [205]: np.fromstring('\x00\x00\xf0?', dtype='float32')
Out[205]: array([ 1.875], dtype=float32)
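The session above uses `tostring` and `np.fromstring`, which were deprecated in later NumPy releases. As a hedged sketch of the same byte-level check on a current NumPy under Python 3, the snippet below uses `tobytes` and `np.frombuffer` instead. The explicit `"<f8"` and `"<f4"` dtypes force little-endian byte order so the result does not depend on the machine; the variable names are illustrative.

```python
import numpy as np

# The 8 bytes of float64 1.0, forced to little-endian order.
raw = np.array(1.0, dtype="<f8").tobytes()
print(raw)  # b'\x00\x00\x00\x00\x00\x00\xf0?'

# The same 8 bytes reinterpreted as two float32 values.
halves = np.frombuffer(raw, dtype="<f4")
print(halves)  # [0.    1.875]

# Only the upper 4 bytes, reinterpreted as one float32.
upper = np.frombuffer(raw[4:], dtype="<f4")
print(upper)  # [1.875]
```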

【Discussion】:

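Thank you very much. Maybe this is a silly question, but why is there this difference between np.float64 and np.float32?
np.float64 denotes a 64-bit (8-byte) float. np.float32 denotes a 32-bit (4-byte) float. np.float64(1.0) occupies 8 bytes. If you write it into table, 8 bytes are overwritten. When table is printed, it interprets its underlying data as 4-byte floats, so what you wrote affects 2 values (floats) in table.
OK, I knew that float64 is 8 bytes and float32 is 4 bytes. But I thought the number of bytes only changed the largest number you can write, not the way the number is written. For example, np.int32(1).tostring() gives '\x01\x00\x00\x00' and np.int64(1).tostring() gives '\x01\x00\x00\x00\x00\x00\x00\x00', so even though floats and ints are different, I thought np.float64 would give '\x00\x00\x00\x00\x00\x00\x80?'
If you enter 3FF0000000000000 in the form here, it will show both the 32-bit and the 64-bit representation. The exponent fields have different lengths, so a different number of bits must be set to 1 to fill the exponent field. For reference, here are the definitions of the IEEE754 64bit and 32bit floating-point formats.
Thanks, this is really useful!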
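The point about exponent fields of different lengths can be checked directly. Below is a minimal sketch, using the standard-library `struct` module, that splits 1.0 into its IEEE 754 sign, biased exponent and fraction fields for both widths. The helper name `fields` and its parameters are hypothetical, introduced only for this illustration.

```python
import struct

def fields(value, fmt, exp_bits, frac_bits):
    """Split an IEEE 754 value into (sign, biased exponent, fraction) integers."""
    # Reinterpret the float's bytes as an unsigned integer of the same width.
    int_fmt = {"<f": "<I", "<d": "<Q"}[fmt]
    (bits,) = struct.unpack(int_fmt, struct.pack(fmt, value))
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

# float32: 1 sign bit, 8 exponent bits, 23 fraction bits
f32 = fields(1.0, "<f", exp_bits=8, frac_bits=23)
# float64: 1 sign bit, 11 exponent bits, 52 fraction bits
f64 = fields(1.0, "<d", exp_bits=11, frac_bits=52)

print(f32)  # (0, 127, 0)
print(f64)  # (0, 1023, 0)

# Big-endian hex makes the differing bit patterns easy to compare.
print(struct.pack(">f", 1.0).hex())  # 3f800000
print(struct.pack(">d", 1.0).hex())  # 3ff0000000000000
```

Both encodings store a true exponent of 0, but the bias is 127 for the 8-bit field and 1023 for the 11-bit field, so the bit patterns differ and the float64 bytes are not simply the float32 bytes padded with zeros.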