【Question title】: How do I find the closest n numbers to a given number at x distance from it?
【Posted】: 2016-06-05 09:19:22
【Question description】:

For example, I have a list of numbers like this:

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]

I need the 2 numbers at a distance of 3 from a given number, say 5, so the output list should look like this:

output_lst = [2, 8]

Distance here means distance on the number line, not distance between list indices. So 3 numbers at distance 2 from 5 would give

output_lst = [3,3,7]

I tried using nsmallest from heapq like this:

from heapq import nsmallest

check_number = 5

output_lst = nsmallest(3, lst, key=lambda x: abs(x - check_number))

But the problem here is that I don't know how to specify the distance. It just outputs the 3 numbers closest to 5.

[4,4,5]

【Question discussion】:

    Tags: python list python-3.x


    【Solution 1】:

    You can use a list comprehension for this. For more on list comprehensions, see this post.

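    >>> lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
    >>> given_number = 5
    >>> distance = 3
    >>> [i for i in lst if abs(i - given_number) == distance]
    [2, 8]
    

    The logic is simple: for each number we take the absolute value of its difference from the given number, and keep the number if that value equals the required distance. Similarly:

    >>> distance = 2
    >>> [i for i in lst if abs(i - given_number) == distance]
    [3, 3, 7]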
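
Since the question asks for n numbers at a given distance, the comprehension above can be wrapped in a small helper that also caps the number of results. This is only a sketch: the name `numbers_at_distance` and its `n` parameter are invented for illustration and are not part of the original answer.

```python
from itertools import islice


def numbers_at_distance(values, target, distance, n=None):
    """Return values whose number-line distance from target equals distance.

    Hypothetical helper: if n is given, at most n matches are returned,
    in the order they appear in values.
    """
    matches = (v for v in values if abs(v - target) == distance)
    # islice with stop=None consumes the whole generator.
    return list(islice(matches, n))


lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
print(numbers_at_distance(lst, 5, 3))       # [2, 8]
print(numbers_at_distance(lst, 5, 2))       # [3, 3, 7]
print(numbers_at_distance(lst, 5, 2, n=2))  # [3, 3]
```

Using a generator with `islice` means the scan stops as soon as n matches are found, which only matters for long inputs.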
    

    To make it slightly more elaborate, here is the same thing using filter and a closure, just to show that there is another option:

    def checkdistance(given_number,distance):
        def innerfunc(value):
            return abs(value-given_number)==distance
        return innerfunc
    
    
    lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
    given_number = 5
    distance = 3
    checkdistance3from5 = checkdistance(5,3)
    list(filter(checkdistance3from5,lst))
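
As a possible variation on the closure above, the standard library's `functools.partial` can pre-fill the two fixed arguments instead of a hand-written inner function. The function name `is_at_distance` below is an assumption made for this sketch, not something from the original answer.

```python
from functools import partial


def is_at_distance(value, given_number, distance):
    # True when value lies exactly `distance` away from given_number.
    return abs(value - given_number) == distance


lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
# Fix given_number=5 and distance=3; the resulting callable takes only `value`.
check_distance_3_from_5 = partial(is_at_distance, given_number=5, distance=3)
print(list(filter(check_distance_3_from_5, lst)))  # [2, 8]
```

Both versions produce a one-argument predicate for `filter`; which one reads better is largely a matter of taste.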
    

    【Discussion】:

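    • The solution suggested by @MaxU (subsetting an array) is shorter, but I have doubts about efficiency and other factors I may be missing. Is subsetting an array somehow more problematic or slower than the list comprehension you suggested?
    • @Jason Numpy is always better for larger lists. This answer is just for a beginner level. The same can be said here too: Max's approach using Pandas is faster. All in all, if you have a small list and care more about simplicity, base Python is better.
    • @Jason, I agree with Bhargav Rao. For smaller lists, the list comprehension will most likely be faster.
    • @BhargavRao Thanks! That worked well. My list has fewer than 20 elements, so a list comprehension is a better choice than a numpy array.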
    【Solution 2】:

    A numpy approach:

    import numpy as np
    
    check_number = 5
    distance = 3
    a = np.array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
    a[np.absolute(a - check_number) == distance]
    

    Check:

    In [46]: a
    Out[46]: array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
    
    In [47]: a[np.absolute(a-5) == 3]
    Out[47]: array([2, 8])
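
The boolean mask above compares with `==`, which is fine for the integers in the question. If the data were floating point (an assumption that goes beyond the original question), exact equality can miss values because of rounding, and `np.isclose` is one way to compare within a tolerance. The sample values below are made up for illustration.

```python
import numpy as np

check_number = 5.0
distance = 0.3
a = np.array([4.7, 4.9, 5.0, 5.3, 5.6])

# Exact comparison may miss values because of floating-point rounding.
exact = a[np.absolute(a - check_number) == distance]

# np.isclose compares within a small default tolerance instead.
tolerant = a[np.isclose(np.absolute(a - check_number), distance)]

print(exact)
print(tolerant)
```

The default tolerances of `np.isclose` suit values of ordinary magnitude; for very small or very large numbers the `rtol` and `atol` arguments would need adjusting.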
    

    Timings for arrays/lists of different sizes (in ms, i.e. 1/1000 of a second):

    In [141]: df
    Out[141]:
               numpy  list_comprehension
    size
    10        0.0242              0.0148
    20        0.0248              0.0179
    30        0.0254              0.0219
    50        0.0267              0.0288
    100       0.0292              0.0457
    1000      0.0712              0.3210
    10000     0.4290              3.3700
    100000    3.8900             33.6000
    1000000  46.4000            343.0000
    

    Plot:

    Bar chart for arrays of size up to 1000, produced with df[df.index<=1000].plot.bar() (figure not reproduced here).

    Code:

    def np_approach(n, check_number=5, distance=3):
        a = np.random.randint(0,100, n)
        return a[np.absolute(a - check_number) == distance]
    
    def list_comprehension(n, check_number=5, distance=3):
        lst = np.random.randint(0,100, n).tolist()
        return [i for i in lst if abs(i-check_number)==distance]
    
    In [102]: %timeit list_comprehension(10**2)
    10000 loops, best of 3: 45.7 µs per loop
    
    In [103]: %timeit np_approach(10**2)
    10000 loops, best of 3: 29.2 µs per loop
    
    In [104]: %timeit list_comprehension(10**3)
    1000 loops, best of 3: 321 µs per loop
    
    In [105]: %timeit np_approach(10**3)
    The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 71.2 µs per loop
    
    In [106]: %timeit list_comprehension(10**4)
    100 loops, best of 3: 3.37 ms per loop
    
    In [107]: %timeit np_approach(10**4)
    1000 loops, best of 3: 429 µs per loop
    
    In [108]: %timeit list_comprehension(10**5)
    10 loops, best of 3: 33.6 ms per loop
    
    In [109]: %timeit np_approach(10**5)
    100 loops, best of 3: 3.89 ms per loop
    
    In [110]: %timeit list_comprehension(10**6)
    1 loop, best of 3: 343 ms per loop
    
    In [111]: %timeit np_approach(10**6)
    10 loops, best of 3: 46.4 ms per loop
    
    In [112]: %timeit list_comprehension(50)
    10000 loops, best of 3: 28.8 µs per loop
    
    In [113]: %timeit np_approach(50)
    10000 loops, best of 3: 26.7 µs per loop
    
    In [118]: %timeit list_comprehension(40)
    The slowest run took 6.61 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 9.89 µs per loop
    
    In [119]: %timeit np_approach(40)
    The slowest run took 8.87 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 10.2 µs per loop
    
    In [120]: %timeit list_comprehension(30)
    10000 loops, best of 3: 21.9 µs per loop
    
    In [121]: %timeit np_approach(30)
    10000 loops, best of 3: 25.4 µs per loop
    
    In [122]: %timeit list_comprehension(20)
    100000 loops, best of 3: 17.9 µs per loop
    
    In [123]: %timeit np_approach(20)
    10000 loops, best of 3: 24.8 µs per loop
    
    In [124]: %timeit list_comprehension(10)
    100000 loops, best of 3: 14.8 µs per loop
    
    In [125]: %timeit np_approach(10)
    10000 loops, best of 3: 24.2 µs per loop
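
Note that both benchmark functions above build their random input inside the timed call, so the figures include `np.random.randint` (and, for the list version, `.tolist()`). The sketch below is one possible way to time only the filtering step with the standard `timeit` module. The names `np_filter`, `list_filter` and `benchmark`, the fixed seed, and the repeat counts are all assumptions made for this example, not part of the original answer.

```python
import timeit

import numpy as np

CHECK_NUMBER = 5
DISTANCE = 3


def np_filter(a):
    # Boolean-mask selection on an already-built array.
    return a[np.absolute(a - CHECK_NUMBER) == DISTANCE]


def list_filter(lst):
    # Same selection on an already-built Python list.
    return [i for i in lst if abs(i - CHECK_NUMBER) == DISTANCE]


def benchmark(n, repeat=3, number=100):
    """Return the best per-call time in seconds as (numpy, list) for size n."""
    rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
    a = rng.integers(0, 100, n)     # data is built once, outside the timing
    lst = a.tolist()
    t_np = min(timeit.repeat(lambda: np_filter(a), repeat=repeat, number=number)) / number
    t_list = min(timeit.repeat(lambda: list_filter(lst), repeat=repeat, number=number)) / number
    return t_np, t_list


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        t_np, t_list = benchmark(n)
        print(f"{n:>6}  numpy: {t_np * 1e6:8.2f} µs   list: {t_list * 1e6:8.2f} µs")
```

Absolute numbers will differ by machine and library version, so no expected output is shown; the point is only to compare the two filters on identical, pre-built data.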
    

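    Conclusion: for larger lists, the numpy approach is faster than the list comprehension; for very small lists (fewer than 50 elements), it may be the other way around.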

    【Discussion】:

    • @BhargavRao, I have updated my answer. By the way, do you know how to avoid caching in the numpy solution?
    • Sorry, I'm not a numpy (or IPython) person. A related post is "Completely disable IPython output caching". And thanks for the update.