Integer overflow in numpy arrays: answers

【Question title】: Integer overflow in numpy arrays
【Posted】: 2010-12-30 13:14:36
【Question】:

import numpy as np
a = np.arange(1000000).reshape(1000,1000)
print(a**2)

With this code I get the output below. Why am I getting negative values?

[[         0          1          4 ...,     994009     996004     998001]
 [   1000000    1002001    1004004 ...,    3988009    3992004    3996001]
 [   4000000    4004001    4008004 ...,    8982009    8988004    8994001]
 ..., 
 [1871554624 1873548625 1875542628 ..., -434400663 -432404668 -430408671]
 [-428412672 -426416671 -424420668 ..., 1562593337 1564591332 1566589329]
 [1568587328 1570585329 1572583332 ..., -733379959 -731379964 -729379967]]

【Question discussion】:

Tags: python numpy

【Solution 1】:

On your platform, np.arange returns an array of dtype 'int32' (the default integer dtype is platform-dependent):

In [1]: np.arange(1000000).dtype
Out[1]: dtype('int32')

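Each element of the array is a 32-bit integer, and squaring produces results that do not fit in 32 bits. Each result is truncated to its low 32 bits (it wraps around modulo 2**32) and is still interpreted as a signed 32-bit integer, which is why you see negative numbers.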
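The wraparound can be reproduced for the last element of the output in the question, 999999**2. The sketch below does the reduction by hand with Python's arbitrary-precision ints (keep the low 32 bits, then reinterpret them as a signed two's-complement value) and compares that with what numpy computes when the array is explicitly int32. The helper name `wrap_int32` is purely illustrative, not a numpy function.

```python
import numpy as np

def wrap_int32(value):
    """Reduce a Python int to what a signed 32-bit integer would hold (illustrative helper)."""
    value %= 2**32            # keep only the low 32 bits
    if value >= 2**31:        # top bit set: the value is negative in two's complement
        value -= 2**32
    return value

exact = 999999 ** 2                                   # 999998000001, needs 40 bits
by_hand = wrap_int32(exact)
by_numpy = int((np.array([999999], dtype=np.int32) ** 2)[0])

print(exact, by_hand, by_numpy)   # 999998000001 -729379967 -729379967
```

Both numbers match the -729379967 in the bottom-right corner of the printed array.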

Edit: in this case you can avoid the integer overflow by constructing an array of dtype 'int64' before squaring:

a = np.arange(1000000, dtype='int64').reshape(1000, 1000)
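A quick check, assuming the same data as in the question, that the int64 version holds the exact square of the largest element. This only moves the limit (an int64 tops out at 2**63 - 1, roughly 9.2e18); it does not remove it.

```python
import numpy as np

# Same data as in the question, but with an explicit 64-bit dtype.
a = np.arange(1000000, dtype='int64').reshape(1000, 1000)
b = a ** 2

print(b.dtype)          # int64
print(b[-1, -1])        # 999998000001, no wraparound
print(b.min() >= 0)     # True: no negative squares
```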

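Note that the problem you have found is an inherent danger of working with numpy. You have to choose your dtypes carefully and know in advance that your code will not cause arithmetic overflow. For the sake of speed, numpy cannot and will not warn you when this happens in array arithmetic.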
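One way to check ahead of time that an operation cannot overflow is to compare the largest magnitude it can produce with the limits reported by `np.iinfo`. The sketch below handles squaring only; `square_fits` is a hypothetical helper written for this illustration, not part of numpy, and it computes the bound with Python ints so that the check itself cannot overflow.

```python
import numpy as np

def square_fits(arr):
    """Hypothetical helper: True if arr ** 2 fits in arr's own integer dtype."""
    info = np.iinfo(arr.dtype)
    # Convert to Python ints so the squaring below is exact.
    largest = max(abs(int(arr.min())), abs(int(arr.max())))
    return largest ** 2 <= info.max

a32 = np.arange(1000000, dtype=np.int32)
a64 = np.arange(1000000, dtype=np.int64)
print(square_fits(a32))  # False: 999999**2 > 2**31 - 1
print(square_fits(a64))  # True
```

Other operations need their own bound (for example, the sum of n elements can reach n times the largest magnitude).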

See http://mail.scipy.org/pipermail/numpy-discussion/2009-April/041691.html for a discussion of this on the numpy mailing list.

【Discussion】:

【Solution 2】:

The workaround for this problem is as follows (taken from here):

...change StringConverter._mapper (in numpy/lib/_iotools.py) from:

{{{
 _mapper = [(nx.bool_, str2bool, False),
            (nx.integer, int, -1),
            (nx.floating, float, nx.nan),
            (complex, _bytes_to_complex, nx.nan + 0j),
            (nx.string_, bytes, asbytes('???'))]
}}}

to

{{{
 _mapper = [(nx.bool_, str2bool, False),
            (nx.int64, int, -1),
            (nx.floating, float, nx.nan),
            (complex, _bytes_to_complex, nx.nan + 0j),
            (nx.string_, bytes, asbytes('???'))]
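}}}

This solved a similar problem I had when using numpy.genfromtxt.

Note that the author describes this as a "temporary" and "not optimal" solution. However, I have been using it with v2.7 without any side effects (yet?!).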
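Instead of editing numpy's source, it may be enough to ask `numpy.genfromtxt` for a 64-bit integer type explicitly through its `dtype` argument. This is a swapped-in alternative, not the fix from the linked post, and it assumes the problem was integer columns being parsed into a too-narrow integer type; the sample data below is made up.

```python
import io
import numpy as np

# Made-up CSV data containing a value that does not fit in int32.
text = "3000000000,1\n2,3\n"

data = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=np.int64)
print(data.dtype)   # int64
print(data[0, 0])   # 3000000000
```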

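【Discussion】:

【Solution 3】:

Python integers don't have this problem: in Python 2 an int that overflows is automatically promoted to a Python long, and in Python 3 int has arbitrary precision to begin with.

So if you do manage to overflow int64, one solution is to store Python ints in a numpy array of dtype object: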

import numpy
a=numpy.arange(1000,dtype=object)
a**20
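A slightly expanded version of the snippet above that compares an object array with an int64 one. Building the object array from `range` makes every element a plain Python int, so the result can be checked against Python's own exact arithmetic. The trade-off is speed: an object array calls back into Python for every element.

```python
import numpy as np

exact_arr = np.array(range(1000), dtype=object)   # elements are Python ints
fixed_arr = np.arange(1000, dtype=np.int64)

exact_pow = exact_arr ** 20   # arbitrary precision, element by element
fixed_pow = fixed_arr ** 20   # silently wraps around in 64 bits

print(exact_pow[999] == 999 ** 20)        # True
print(int(fixed_pow[999]) == 999 ** 20)   # False: overflowed
```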

【Discussion】:

【Solution 4】:

numpy integer types are fixed-width, so what you are seeing is the result of integer overflow.
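The fixed widths and the ranges they allow can be inspected with `np.iinfo`. The sketch below prints the limits of the signed integer types and shows that stepping one past the int32 maximum wraps around to the minimum, with no error raised for array arithmetic.

```python
import numpy as np

for t in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(t)
    print(t.__name__, info.bits, info.min, info.max)

top = np.array([np.iinfo(np.int32).max], dtype=np.int32)
wrapped = top + 1          # result stays int32 and wraps around
print(wrapped[0])          # -2147483648
```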

【Discussion】: