【Question title】: What is the most efficient way of doing square root of sum of square of two numbers?
【Posted】: 2018-12-07 05:07:20
【Question】:

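I am looking for a more efficient and faster way to compute the square root of the sum of squares of two or more numbers. I am currently using numpy and this code: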

np.sqrt(i**2+j**2)

This seems to be about five times faster than:

np.sqrt(sum(np.square([i,j])))

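(i and j are numbers!)

I would like to know whether there is already a more efficient built-in function that performs this very common task with less code.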

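【Comments】:

I would go with numpy all the way. Something like np.sqrt(np.sum(a*a)), where a is your array of numbers.
numpy.linalg.norm is probably the most efficient implementation. See also this answer, which looks in detail at the performance.
If you are looking for the shortest way, use np.linalg.norm. For the best performance you can use Cython, Numba, or numexpr, e.g. stackoverflow.com/a/49868544/4045774. On larger arrays this problem can also be parallelized easily.
Are you using this operation to iterate over multiple points?
@IonicSolutions In the simple case, (i*i + j*j)**0.5 seems to be faster.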

Tags: python performance numpy

【Solution 1】:

I know you need speed, but I want to point out some pitfalls of writing your own square-root calculation.

Speed comparison

%%timeit
math.hypot(i, j)
# 85.2 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
np.hypot(i, j)
# 1.29 µs ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
np.sqrt(i**2+j**2)
# 1.3 µs ± 9.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
(i*i + j*j)**0.5
# 94 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

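In terms of speed, the two numpy versions are the same, but hypot is much safer. In fact, (i*i + j*j)**0.5 overflows for large inputs. hypot is efficient in the sense of accuracy :p

Also, math.hypot is very safe and fast, can handle the 3D square root of a sum of squares (in Python 3.8 and later), and is faster than (i*i + j*j)**0.5.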
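A quick illustration of the 3D case mentioned above. This is only a sketch and assumes Python 3.8 or later, where math.hypot accepts more than two arguments; the variable names are made up for the example.

```python
import math

# Length of the 3D vector (1, 2, 2); requires Python 3.8+.
d3 = math.hypot(1.0, 2.0, 2.0)

# Large components whose squares would overflow if computed directly.
big = math.hypot(1e200, 1e200, 1e200)

print(d3, big)
```

The second call stays finite even though 1e200 * 1e200 on its own is already inf.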

Underflow

i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0

Overflow

i, j = 1e+200, 1e+200
np.sqrt(i**2+j**2)
# inf

No underflow

i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200

No overflow

i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
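To see why a hypot-style function can avoid the overflow and underflow shown above, here is a minimal sketch of the usual rescaling trick: divide by the larger magnitude before squaring. scaled_hypot and naive are hypothetical names for this illustration, and this is not claimed to be how math.hypot or np.hypot are implemented internally. It also ignores inf and nan inputs.

```python
import math

def naive(x, y):
    # Direct formula: the squares can overflow to inf or underflow to 0.
    return (x * x + y * y) ** 0.5

def scaled_hypot(x, y):
    # Illustrative only: factor out the larger magnitude so the value
    # being squared is at most 1. Does not handle inf or nan inputs.
    x, y = abs(x), abs(y)
    big, small = max(x, y), min(x, y)
    if big == 0.0:
        return 0.0
    r = small / big
    return big * math.sqrt(1.0 + r * r)

print(naive(1e200, 1e200), scaled_hypot(1e200, 1e200))
print(naive(1e-200, 1e-200), scaled_hypot(1e-200, 1e-200))
```

The cost is an extra division and a couple of comparisons, which is part of why a safe hypot is not free compared to the bare formula.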

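【Comments】:

【Solution 2】:

I did some comparisons based on the answers. It seems the fastest way is to use the math module with math.hypot(i, j), but probably the best compromise is (i*i + j*j)**0.5, which needs no imports, even if it is less explicit.

Code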

from timeit import timeit
import matplotlib.pyplot as plt


tests = [
"np.sqrt(i**2+j**2)",
"np.sqrt(sum(np.square([i,j])))",
"(i*i + j*j)**0.5",
"math.sqrt(i*i + j*j)",
"math.hypot(i,j)",
"np.linalg.norm([i,j])",
"ne.evaluate('sqrt(i**2+j**2)')",
"np.hypot(i,j)"]

# Shared setup for every snippet: imports plus the two scalar inputs.
setup = "import math; import numpy as np; import numexpr as ne; i = 7; j = 4"

# Total seconds for 1,000,000 runs equals microseconds per single call.
results = [timeit(test, setup=setup, number=1000000) for test in tests]

indx = range(len(results))
plt.bar(indx, results)
plt.xticks(indx, tests, rotation=90)
plt.yscale('log')
plt.ylabel('Time (us)')
plt.tight_layout()
plt.show()

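【Comments】:

Just curious: if i and j are just numbers, why do you care about the performance of an operation that takes about 1 µs?
@Brenlla Your point is right, but I just like to use the best procedure. As I wrote, (i*i + j*j)**0.5 may not be the fastest, but it suits this task better.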

【Solution 3】:

For the case i != j, this cannot be done with np.linalg.norm, so I suggest the following:

(i*i + j*j)**0.5

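If i and j are single floats, this is about 5 times faster than np.sqrt(i**2+j**2). If i and j are numpy arrays, it is about 20% faster (due to replacing the squares with i*i and j*j). If you do not replace the squares, the performance is equal to that of np.sqrt(i**2+j**2).
Some timings with single floats: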

i = 23.7
j = 7.5e7
%timeit np.sqrt(i**2 + j**2)
# 1.63 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit (i*i + j*j)**0.5
# 336 ns ± 7.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit math.sqrt(i*i + j*j)
# 321 ns ± 8.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

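math.sqrt is slightly faster than (i*i + j*j)**0.5, but this comes at the cost of flexibility: (i*i + j*j)**0.5 works on both single floats and arrays, whereas math.sqrt works only on scalars.

And some timings for medium-sized arrays: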
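The flexibility point can be checked with a small sketch. root_sum_squares is a hypothetical helper name used only here, and the sample values are made up.

```python
import math
import numpy as np

def root_sum_squares(i, j):
    # Works for plain floats and, element-wise, for NumPy arrays.
    return (i * i + j * j) ** 0.5

scalar_result = root_sum_squares(3.0, 4.0)
array_result = root_sum_squares(np.array([3.0, 6.0]), np.array([4.0, 8.0]))

# math.sqrt expects a single number and rejects a multi-element array.
try:
    math.sqrt(np.array([25.0, 100.0]))
    math_sqrt_accepts_arrays = True
except TypeError:
    math_sqrt_accepts_arrays = False

print(scalar_result, array_result, math_sqrt_accepts_arrays)
```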

i = np.random.rand(100000)
j = np.random.rand(100000)
%timeit np.sqrt(i**2 + j**2)
# 1.45 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit (i*i + j*j)**0.5
# 1.21 ms ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

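【Comments】:

I'll accept it once there are no other answers to evaluate. No pressure, please!
Sure, I was just wondering whether there is anything else in my answer that I could try to improve.
In my own tests, math.sqrt() is faster than np.sqrt(), x**0.5, pow(x,0.5), and math.pow(x,0.5). It gets even faster if you import it as from math import sqrt and call sqrt() directly.
@DillonDavis But math.sqrt() cannot take arrays as input, so you lose a lot of flexibility. For completeness, I added the math.sqrt case to my answer. But importing with from math import sqrt should not affect performance! Something must be off with your timings.
GM said in an earlier comment that i and j are integers, not arrays. That is why I did my timings with just integers in a for loop. As for the import difference, it is because math.sqrt has to perform two lookups (math and sqrt), whereas the other has only one. See this answer for details.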
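For anyone who wants to check the lookup claim on their own machine, a rough sketch with timeit follows. The input value and loop count are arbitrary choices for illustration, and the size of the difference (if any) will depend on the Python version.

```python
from timeit import timeit

# Both names are imported so the two statements differ only in the lookup.
setup = "import math; from math import sqrt; x = 65.0"

# Attribute access on the module, then the call.
t_attr = timeit("math.sqrt(x)", setup=setup, number=200_000)

# Direct name lookup, then the call.
t_name = timeit("sqrt(x)", setup=setup, number=200_000)

print(f"math.sqrt(x): {t_attr:.4f} s, sqrt(x): {t_name:.4f} s")
```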

【Solution 4】:

In this case the numexpr module may be faster. It avoids intermediate buffers and is therefore faster for certain operations:

i = np.random.rand(100000)
j = np.random.rand(100000)
%timeit np.sqrt(i**2 + j**2)
# 1.34 ms

import numexpr as ne
%timeit ne.evaluate('sqrt(i**2+j**2)')
#370 us

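【Comments】:

Thanks, but it is slower than (i*i + j*j)**0.5
That will depend on the size of your arrays. For this operation, numexpr becomes faster with arrays larger than about 10,000-100,000 elements.
Thanks for the clarification, but in my question i and j are two numbers.
If you use a module other than numpy, I strongly recommend numba. In most cases it will far outperform numexpr.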

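【Solution 5】:

Instead of optimizing this rather simple function call, you could try rewriting your program so that i and j are arrays rather than single numbers (assuming you need to call the function on many different inputs). Take a look at this small benchmark: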

import numpy as np
i = np.arange(10000)
j = np.arange(10000)

%%timeit 
np.sqrt(i**2+j**2)
# 74.1 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
for idx in range(len(i)):
    np.sqrt(i[idx]**2+j[idx]**2)
# 25.2 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

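As you can see, the first variant (using arrays of numbers as input) is about 300 times faster than the second variant, which uses a Python for loop. The reason is that in the first example all of the computation is performed by numpy (which is implemented in C internally and is therefore very fast), whereas in the second example numpy code and regular Python code (the for loop) are interleaved, which makes execution much slower.

If you really want to improve the performance of your program, I suggest rewriting it so that you can execute the function once on two numpy arrays rather than calling it for each pair of numbers.

【Comments】: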
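As a sketch of what such a rewrite could look like, the pairs are collected into two arrays and passed to np.hypot in a single call (np.hypot is the overflow-safe variant from the first answer). The sample data and variable names are made up for illustration.

```python
import math
import numpy as np

# Hypothetical sample data: two coordinate arrays of equal length.
xs = np.array([3.0, 5.0, 8.0, 1e200])
ys = np.array([4.0, 12.0, 15.0, 1e200])

# One vectorized call; the loop over elements runs inside NumPy.
dist_vectorized = np.hypot(xs, ys)

# Equivalent per-pair Python loop, shown only for comparison.
dist_loop = [math.hypot(x, y) for x, y in zip(xs, ys)]

print(dist_vectorized)
```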