How do I find the closest n numbers to a given number at x distance from it? Answers

【Question title】: How do I find the closest n numbers to a given number at x distance from it?
【Posted】: 2016-06-05 09:19:22
【Question】:

For example, I have a list of numbers like this:

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]

I need the 2 numbers that are at distance 3 from a given number, say 5, so the output list should be:

output_lst = [2, 8]

Distance here means distance on the number line, not distance in list indices. So 3 numbers at distance 2 from 5 would give:

output_lst = [3,3,7]

I tried using nsmallest from heapq like this:

from heapq import nsmallest

check_number = 5

output_lst = nsmallest(3, lst, key=lambda x: abs(x - check_number))

But the problem is that I can't specify the distance. It just outputs the 3 numbers closest to 5:

[5, 4, 4]
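The documentation describes heapq.nsmallest(n, iterable, key=...) as equivalent to sorted(iterable, key=key)[:n]. It ranks every element by its key and keeps the n smallest, so you cannot use it to request an exact distance. The short sketch below checks that equivalence on this list. Here dist is an illustrative helper name. Python's sort is stable, so tied elements stay in their original list order.

```python
from heapq import nsmallest

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
check_number = 5

def dist(x):
    # Distance on the number line from check_number
    return abs(x - check_number)

# nsmallest ranks by the key; "exactly distance 3" cannot be expressed this way
by_heap = nsmallest(3, lst, key=dist)
by_sort = sorted(lst, key=dist)[:3]
print(by_heap)  # [5, 4, 4]
print(by_sort)  # [5, 4, 4]
```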

【Question discussion】:

Tags: python list python-3.x

【Solution 1】:

You can use a list comprehension for this. For more on list comprehensions, see this post.

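>>> lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> given_number = 5
>>> distance = 3
>>> [i for i in lst if abs(i - given_number) == distance]
[2, 8]

The logic is simple. For each number, we look at the absolute value of its difference from the given number and keep the number if that value equals the distance. Similarly:

>>> distance = 2
>>> [i for i in lst if abs(i - given_number) == distance]
[3, 3, 7]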
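The question also asks for n numbers, and the comprehension above does not cap how many it returns. The sketch below assumes that "the first n matches in list order" is the desired behaviour. numbers_at_distance is a hypothetical helper name. itertools.islice stops consuming the generator once it has n matches, and with n=None it returns every match.

```python
from itertools import islice

def numbers_at_distance(values, target, distance, n=None):
    """Return numbers in `values` exactly `distance` away from `target`.

    Hypothetical helper: if n is given, only the first n matches
    (in list order) are returned; n=None returns all of them.
    """
    matches = (v for v in values if abs(v - target) == distance)
    return list(islice(matches, n))

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
print(numbers_at_distance(lst, 5, 3))       # [2, 8]
print(numbers_at_distance(lst, 5, 2))       # [3, 3, 7]
print(numbers_at_distance(lst, 5, 2, n=2))  # [3, 3]
```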

Let's make it a bit more involved and use filter with a closure. The code is below.

(This is just to show that it is another option.)

def checkdistance(given_number, distance):
    # Returns a closure that remembers given_number and distance
    def innerfunc(value):
        return abs(value - given_number) == distance
    return innerfunc


lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
given_number = 5
distance = 3
checkdistance3from5 = checkdistance(given_number, distance)
print(list(filter(checkdistance3from5, lst)))  # [2, 8]
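The closure works as a predicate factory. Two standard-library alternatives produce the same kind of one-argument predicate: functools.partial, which is a different technique from a hand-written closure, and an inline lambda. is_at_distance is an illustrative name used only in this sketch.

```python
from functools import partial

def is_at_distance(target, distance, value):
    # True when value lies exactly `distance` away from `target` on the number line
    return abs(value - target) == distance

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]

# partial pre-fills target=5 and distance=3, leaving a one-argument predicate
check = partial(is_at_distance, 5, 3)
print(list(filter(check, lst)))  # [2, 8]

# An inline lambda builds an equivalent predicate without a named factory
print(list(filter(lambda v: abs(v - 5) == 2, lst)))  # [3, 3, 7]
```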

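【Discussion】:

The solution suggested by @MaxU (subsetting the array) is shorter, but I have questions about efficiency and other factors I might be missing. Is subsetting an array somehow more problematic or slower than the list comprehension you suggested?
@Jason NumPy is always better for larger lists. This approach is just for beginners. The same can be said here, where Max's approach using pandas is faster. In short, if you have a small list and care more about simplicity, basic Python is better.
@Jason, I agree with Bhargav Rao. For smaller lists, the list comprehension is likely to be faster.
@BhargavRao Thanks! That worked well. My list has fewer than 20 elements, so a list comprehension is a better choice than a numpy array.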

【Solution 2】:

A numpy approach:

import numpy as np

check_number = 5
distance = 3
a = np.array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
a[np.absolute(a - check_number) == distance]

Check:

In [46]: a
Out[46]: array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])

In [47]: a[np.absolute(a-5) == 3]
Out[47]: array([2, 8])
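The one-liner first builds a boolean mask and then indexes the array with it. The sketch below separates those two steps. It also uses np.flatnonzero to recover the positions of the matches and plain slicing to keep only the first n of them. The variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
check_number = 5
distance = 2

mask = np.abs(a - check_number) == distance  # boolean array, True where the distance matches
matches = a[mask]                             # matching values: 3, 3, 7
positions = np.flatnonzero(mask)              # their indices in a: 2, 3, 8
first_two = matches[:2]                       # keep only the first n = 2 matches
print(matches, positions, first_two)
```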

Timings for arrays/lists of different sizes (in ms, i.e. 1/1000 of a second):

In [141]: df
Out[141]:
           numpy  list_comprehension
size
10        0.0242              0.0148
20        0.0248              0.0179
30        0.0254              0.0219
50        0.0267              0.0288
100       0.0292              0.0457
1000      0.0712              0.3210
10000     0.4290              3.3700
100000    3.8900             33.6000
1000000  46.4000            343.0000
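Converting the table into ratios makes the crossover easier to see. The sketch below divides the list-comprehension time by the numpy time for each row. The values are copied from the table above. A ratio below 1 means the list comprehension was faster in this particular run.

```python
# Timings in ms copied from the table: size -> (numpy, list_comprehension)
timings = {
    10: (0.0242, 0.0148),
    20: (0.0248, 0.0179),
    30: (0.0254, 0.0219),
    50: (0.0267, 0.0288),
    100: (0.0292, 0.0457),
    1000: (0.0712, 0.3210),
    10000: (0.4290, 3.3700),
    100000: (3.8900, 33.6000),
    1000000: (46.4000, 343.0000),
}

# Speedup of numpy over the list comprehension for each size
speedup = {size: lc_ms / np_ms for size, (np_ms, lc_ms) in timings.items()}
for size, ratio in speedup.items():
    print(f"{size:>8}: numpy speedup {ratio:.2f}x")
```

With these numbers the ratio stays below 1 up to 30 elements and rises above 1 from 50 elements onward. It peaks at roughly 8.6x at 100,000 elements.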

Plot: bar chart of the timings for array sizes <= 1000, drawn with df[df.index<=1000].plot.bar() (image not included).

Code:

def np_approach(n, check_number=5, distance=3):
    a = np.random.randint(0,100, n)
    return a[np.absolute(a - check_number) == distance]

def list_comprehension(n, check_number=5, distance=3):
    lst = np.random.randint(0,100, n).tolist()
    return [i for i in lst if abs(i-check_number)==distance]

In [102]: %timeit list_comprehension(10**2)
10000 loops, best of 3: 45.7 µs per loop

In [103]: %timeit np_approach(10**2)
10000 loops, best of 3: 29.2 µs per loop

In [104]: %timeit list_comprehension(10**3)
1000 loops, best of 3: 321 µs per loop

In [105]: %timeit np_approach(10**3)
The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 71.2 µs per loop

In [106]: %timeit list_comprehension(10**4)
100 loops, best of 3: 3.37 ms per loop

In [107]: %timeit np_approach(10**4)
1000 loops, best of 3: 429 µs per loop

In [108]: %timeit list_comprehension(10**5)
10 loops, best of 3: 33.6 ms per loop

In [109]: %timeit np_approach(10**5)
100 loops, best of 3: 3.89 ms per loop

In [110]: %timeit list_comprehension(10**6)
1 loop, best of 3: 343 ms per loop

In [111]: %timeit np_approach(10**6)
10 loops, best of 3: 46.4 ms per loop

In [112]: %timeit list_comprehension(50)
10000 loops, best of 3: 28.8 µs per loop

In [113]: %timeit np_approach(50)
10000 loops, best of 3: 26.7 µs per loop

In [118]: %timeit list_comprehension(40)
The slowest run took 6.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.89 µs per loop

In [119]: %timeit np_approach(40)
The slowest run took 8.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.2 µs per loop

In [120]: %timeit list_comprehension(30)
10000 loops, best of 3: 21.9 µs per loop

In [121]: %timeit np_approach(30)
10000 loops, best of 3: 25.4 µs per loop

In [122]: %timeit list_comprehension(20)
100000 loops, best of 3: 17.9 µs per loop

In [123]: %timeit np_approach(20)
10000 loops, best of 3: 24.8 µs per loop

In [124]: %timeit list_comprehension(10)
100000 loops, best of 3: 14.8 µs per loop

In [125]: %timeit np_approach(10)
10000 loops, best of 3: 24.2 µs per loop
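%timeit is an IPython magic, so it is not available in a plain script. The sketch below takes roughly the same measurement with the standard-library timeit module. It repeats the two benchmark functions so the block runs on its own. best_of is a hypothetical helper, and the number/repeat values are arbitrary choices. It reports the minimum of several repeats per call, in the spirit of the "best of 3" lines above. As in the original, each timed call also includes generating the random data.

```python
import timeit

import numpy as np

def np_approach(n, check_number=5, distance=3):
    a = np.random.randint(0, 100, n)
    return a[np.absolute(a - check_number) == distance]

def list_comprehension(n, check_number=5, distance=3):
    lst = np.random.randint(0, 100, n).tolist()
    return [i for i in lst if abs(i - check_number) == distance]

def best_of(func, n, number=200, repeat=3):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    times = timeit.repeat(lambda: func(n), number=number, repeat=repeat)
    return min(times) / number * 1e6

for size in (10, 100, 1000):
    np_us = best_of(np_approach, size)
    lc_us = best_of(list_comprehension, size)
    print(f"{size:>5}: numpy {np_us:.1f} µs, list comprehension {lc_us:.1f} µs")
```

Absolute numbers depend on the machine and on the NumPy/Python versions, so you should expect them to differ from the figures above.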

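Conclusion: for larger lists the numpy approach is faster than the list comprehension. For very small lists (fewer than 50 elements) it can be the other way around.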
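When input sizes vary, the conclusion suggests choosing the approach by length. The sketch below rests on several assumptions. select_at_distance and SMALL_LIST_CUTOFF are hypothetical names. The cutoff of 50 is read off these particular timings and will be different on other machines. The benchmark generated the numpy array directly, whereas this sketch converts a Python list with np.asarray, which has its own cost. You should therefore re-measure the cutoff before relying on it.

```python
import numpy as np

# Hypothetical cutoff read off the timings above; machine-dependent
SMALL_LIST_CUTOFF = 50

def select_at_distance(values, target, distance, cutoff=SMALL_LIST_CUTOFF):
    """Return the values exactly `distance` away from `target` as a Python list."""
    if len(values) < cutoff:
        # Small input: a plain list comprehension avoids numpy's per-call overhead
        return [v for v in values if abs(v - target) == distance]
    # Larger input: convert once, then use a vectorized boolean mask
    a = np.asarray(values)
    return a[np.abs(a - target) == distance].tolist()

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
print(select_at_distance(lst, 5, 3))       # [2, 8]  (list-comprehension path)
print(select_at_distance(lst * 10, 5, 2))  # numpy path, 110 elements
```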

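【Discussion】:

@BhargavRao, I've updated my answer. By the way, do you know how to avoid caching in the numpy solution?
Sorry, I'm not a numpy (or IPython) person. A related post is Completely disable IPython output caching. And thanks for the update.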