Answers: Numpy matrix multiplication issue with 20 elements

【Question title】: Numpy matrix multiplication issue with 20 elements
【Posted】: 2022-06-10 19:57:01
【Question description】:

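I am using a matrix multiplication to encode the positions of True and False values in an array as a single number; this is necessary because I cannot use a for loop (I have thousands of records). The procedure is the following: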

import numpy as np
# Create a test array
test_array = np.array([[False, True, False, False, False, True]])
# Create a set of unique "tens", each one identifying a position
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
# Multiply the matrix
print(int(np.dot(test_array, uniq_tens)[0]))
100010

100010 must be read from right to left (0=False, 1=True, 0=False, 0=False, 0=False, 1=True). Everything works fine, except when test_array contains 20 elements.

# This works fine - Test with 21 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
21
111000000000000000010

# This works fine - Test with 19 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
19
1000000000000000010

# This does not work - Test with 20 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
20
10000000000000000000

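I tested with numpy versions 1.16.4, 1.19.4 and 1.19.5. Can you help me understand why this happens? I am worried that it could happen with other lengths too, not only 20.

Thanks a lot for your help!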

【Comments】:

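Alternatively, use np.where to retrieve the indices of the True values.
Your numbers are probably getting too large, and you are being hit by numerical imprecision. The result of the np.dot operation is float64, which has limited precision: somewhere between 15 and 17 significant digits. That is fewer than 20.
Why does it work for 19 and 21? I guess it is just chance. You could probably track down the root cause, but overall your algorithm becomes inaccurate very quickly, so it should not be used.
There is something odd in np.dot(test_array, uniq_tens). For 19 elements the dtype is int64, for 20 -> float64, for 21 -> object.
The int64 and float64 can be explained; the object dtype is surprising, but it also indicates that this is beyond the precision numpy can handle natively.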
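The dtype observation in the comments above can be reproduced with a short sketch. The helper name tens_dtype is made up for illustration. On the NumPy 1.x versions named in the question I would expect it to report int64, float64 and object for 19, 20 and 21 elements; other versions may differ.

```python
import numpy as np

def tens_dtype(n):
    """Return the dtype NumPy infers for the list [10**0, ..., 10**(n-1)].

    Hypothetical helper, not part of the original question.
    """
    return np.array([10 ** i for i in range(n)]).dtype

for n in (19, 20, 21):
    # Print the list length next to the dtype NumPy chose for it
    print(n, tens_dtype(n))
```

This would also explain why 21 elements appear to work: an object array stores plain Python integers, which have arbitrary precision, so the dot product is computed exactly. Only the 20-element case passes through float64.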

Tags: python numpy matrix precision multiplication

【Solution 1】:

You are hitting the int64 limit:

print(len(str(2 ** (64 - 1))))
# 19

when computing uniq_tens.
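To make the limit concrete, here is a minimal sketch that checks whether the largest power of ten needed for each array length still fits into a 64-bit integer. The function name fits is an assumption of mine, not something from the answer.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max    # 9223372036854775807
UINT64_MAX = np.iinfo(np.uint64).max  # 18446744073709551615

def fits(n):
    """For an n-element array, report whether 10**(n-1) fits in int64 / uint64.

    Hypothetical helper for illustration only.
    """
    largest = 10 ** (n - 1)  # Python int, so no overflow here
    return largest <= INT64_MAX, largest <= UINT64_MAX

for n in (19, 20, 21):
    print(n, fits(n))
```

For 19 elements the largest value (10**18) fits in int64; for 20 elements 10**19 only fits in uint64; for 21 elements 10**20 fits in neither.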

【Discussion】:

What about the 21-element array?

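【Solution 2】:

I have tested your code, and it does look like the error is caused by the floating point precision of the result returned by np.dot. You can convert it back to int, but since a float is used as an intermediate step, the conversion cannot recover the lost digits. Also, the fact that it works for lengths 18 and 19 is pure coincidence; I have tested it with other test_arrays and got errors there.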
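The precision loss can be shown without np.dot at all. The sketch below first converts the expected 20-element result to a float and back, then shows one possible alternative that avoids the numeric encoding entirely by returning the indices of the True values with np.nonzero. The function name true_positions is an assumption of mine, not part of the original code.

```python
import numpy as np

# Expected result for the 20-element test: True at positions 1 and 19
exact = 10 ** 19 + 10        # Python int, arbitrary precision
as_float = float(exact)      # nearest float64
print(int(as_float))         # 10000000000000000000, the trailing "10" is gone
print(np.spacing(1e19))      # gap between neighbouring float64 values near 1e19

def true_positions(mask):
    """Return (row_indices, column_indices) of every True in a 2-D boolean array.

    Hypothetical helper; works on all rows at once, without a Python loop.
    """
    mask = np.asarray(mask, dtype=bool)
    return np.nonzero(mask)

test_array = np.zeros((1, 20), dtype=bool)
test_array[0, [1, 19]] = True
rows, cols = true_positions(test_array)
print(rows, cols)
```

Because the positions are kept as small integer indices, this approach does not depend on the array length fitting into any fixed-width number.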

【Discussion】: