Find the first missing positive integer in Python: answers

【Question title】: Find the first missing positive integer in Python
【Posted】: 2016-12-10 01:36:13
【Question】:

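I have the following task:

Given an unsorted integer array, find the first missing positive integer. Your algorithm should run in O(n) time and use constant space.

After some thought and a hint, I decided to modify the input list A in place. Here is the code: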

def firstMissingPositive(A):
        m=max(A)
        ln=len(A)
        i=0
        while i<ln:
            if A[i]>=1 and A[i]<=ln:
                if A[A[i]-1]!=m+1:

                   A[A[i]-1], A[i] = m+1, A[A[i]-1]
                else:
                    i+=1

            else:
                i+=1
        for i in range(ln):
            if A[i]!=m+1:
                return i+1

When I run it, it takes a very long time. What can I do to make it faster?

Edit: here is the list A.

A=[ 417, 929, 845, 462, 675, 175, 73, 867, 14, 201, 777, 407, 80, 882, 785, 563, 209, 261, 776, 362, 730, 74, 649, 465, 353, 801, 503, 154, 998, 286, 520, 692, 68, 805, 835, 210, 819, 341, 564, 215, 984, 643, 381, 793, 726, 213, 866, 706, 97, 538, 308, 797, 883, 59, 328, 743, 694, 607, 729, 821, 32, 672, 130, 13, 76, 724, 384, 444, 884, 192, 917, 75, 551, 96, 418, 840, 235, 433, 290, 954, 549, 950, 21, 711, 781, 132, 296, 44, 439, 164, 401, 505, 923, 136, 317, 548, 787, 224, 23, 185, 6, 350, 822, 457, 489, 133, 31, 830, 386, 671, 999, 255, 222, 944, 952, 637, 523, 494, 916, 95, 734, 908, 90, 541, 470, 941, 876, 264, 880, 761, 535, 738, 128, 772, 39, 553, 656, 603, 868, 292, 117, 966, 259, 619, 836, 818, 493, 592, 380, 500, 599, 839, 268, 67, 591, 126, 773, 635, 800, 842, 536, 668, 896, 260, 664, 506, 280, 435, 618, 398, 533, 647, 373, 713, 745, 478, 129, 844, 640, 886, 972, 62, 636, 79, 600, 263, 52, 719, 665, 376, 351, 623, 276, 66, 316, 813, 663, 831, 160, 237, 567, 928, 543, 508, 638, 487, 234, 997, 307, 480, 620, 890, 216, 147, 271, 989, 872, 994, 488, 291, 331, 8, 769, 481, 924, 166, 89, 824, -4, 590, 416, 17, 814, 728, 18, 673, 662, 410, 727, 667, 631, 660, 625, 683, 33, 436, 930, 91, 141, 948, 138, 113, 253, 56, 432, 744, 302, 211, 262, 968, 945, 396, 240, 594, 684, 958, 343, 879, 155, 395, 288, 550, 482, 557, 826, 598, 795, 914, 892, 690, 964, 981, 150, 179, 515, 205, 265, 823, 799, 190, 236, 24, 498, 229, 420, 753, 936, 191, 366, 935, 434, 311, 920, 167, 817, 220, 219, 741, -2, 674, 330, 909, 162, 443, 412, 974, 294, 864, 971, 760, 225, 681, 689, 608, 931, 427, 687, 466, 894, 303, 390, 242, 339, 252, 20, 218, 499, 232, 184, 490, 4, 957, 597, 477, 354, 677, 691, 25, 580, 897, 542, 186, 359, 346, 409, 655, 979, 853, 411, 344, 358, 559, 765, 383, 484, 181, 82, 514, 582, 593, 77, 228, 921, 348, 453, 274, 449, 106, 657, 783, 782, 811, 333, 305, 784, 581, 746, 858, 249, 479, 652, 270, 429, 614, 903, 102, 378, 575, 119, 196, 12, 990, 356, 277, 169, 70, 518, 282, 
676, 137, 622, 616, 357, 913, 161, 3, 589, 327 ]

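【Comments on the question】:

Can't you use sort or sorted?
OP mentioned it "should run in O(n)".
What else do we know about the array? Are positive integers missing? Only one? Are there duplicate values?
Is the input known to be non-negative? If it is something like a shuffled range from 0 to n with one number removed, this is trivial, but you need to be clearer about the known properties of the input.
I would start by adding print("Swapping %d"%i) to the swap branch of the if. What do you learn from its output?

Tags: python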

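【Solution 1】:

~~It seems to me that doing this in O(n) and constant space is impossible.~~ (I stand corrected; Rockybilly's link gives such a solution.)

To do it in constant space, one is more or less forced to sort the list, which is O(n log n) for most sorting algorithms, and the algorithm here looks like insertion sort, which is O(n²) on average.

So, FWIW, I chose to drop the constant-space requirement in order to get as close to O(n) time as I could (which in big-O notation is still O(n)).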

def firstMissingSince(sequence, start=1):
    uniques = set()
    maxitem = start-1
    for e in sequence:
        if e >= start:
            uniques.add(e)
            if e > maxitem:
                maxitem = e
    return next( x for x in range(start, maxitem+2) if x not in uniques )

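(Version 2 is below.)

I could have used set(sequence) and max(sequence), but both are O(n), so I combined them in the same loop. I use a set for two reasons: first, it may reduce space by ignoring duplicates (likewise, I only keep numbers greater than or equal to my lower bound, which I also made generic); second, for the O(1) membership test.

The last line is a simple linear search for the missing element. It defaults to start if the maximum element is below start, and to maximum+1 if the array has no missing element between start and the maximum.

Here are some tests borrowed from the other answers...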

assert 1 == firstMissingSince(A) 
assert 2 == firstMissingSince([1,4,3,6,5])
assert 2 == firstMissingSince([1,44,3,66,55]) 
assert 6 == firstMissingSince([1,2,3,4,5]) 
assert 4 == firstMissingSince([-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11])
assert 4 == firstMissingSince([18, 2, 13, 3, 3, 0, 14, 1, 18, 12, 6, -1, -3, 15, 11, 13, -8, 7, -8, -7])
assert 4 == firstMissingSince([-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11])
assert 3 == firstMissingSince([7, -7, 19, 6, -3, -6, 1, -8, -1, 19, -8, 2, 4, 19, 5, 6, 6, 18, 8, 17])

Rockybilly's answer made me realize that I don't need to know the maximum at all, so here is version 2:

from itertools import count

def firstMissingSince(sequence, start=1):
    uniques = set(sequence) # { x for x in sequence if x>=start } 
    return next( x for x in count(start) if x not in uniques )

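【Comments】:

It seems an O(n), constant-space solution does exist. It is quite elegant: it uses the values as indices to modify the original list.

【Solution 2】:

Code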

def first_missing_positive(nums):
    bit = 0
    for n in nums:
        if n > 0:
            bit |= 1 << (n - 1)
    flag = 0
    while bit != 0:
        if (bit & 1 == 0):
            break
        flag += 1
        bit >>= 1
    return flag + 1

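Explanation

In general, bitwise solutions are a good fit for constant-space requirements.

The trick here is to pass over all the integers and record them in a single variable, say "bit", by setting bit n-1 for every positive integer n. For example, when nums = [1, 2, 3], i.e. nums_bitwise = [1, 10, 11], bit = "111". Here 111 represents the sequence [1, 10, 11] in condensed form.

Now suppose 2 is missing from nums. Then we have nums = [1, 3], i.e. nums_bitwise = [1, 11], and bit = "101". We can now walk over the bits of the "bit" variable to find the first missing positive integer, 2, which is the "0" in "101".

Note that for nums=[1, 3] and nums=[1, 2, 3] the value of bit will be 5 and 7 respectively. You need "{0:b}".format(bit) to get its binary representation.

The key line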

bit |= 1 << (n - 1)

stores all the integers of nums by left-shifting 1 by n - 1 and bitwise-ORing the result into the bit variable, which starts at 0.

Next, we do

if (bit & 1 == 0):
    break

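to find the first zero in the condensed bit variable. The first zero indicates the first missing integer. We right-shift by one bit with bit >>= 1 and check whether that bit is missing; as long as it is not, we keep ANDing the last bit of bit with 1 until the result is 0.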
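As a side note, the mask and the "first zero bit" search described above can be tried out in isolation. The sketch below is only an illustration, not the answer's code: build_mask and lowest_zero_bit_position are names made up here, and the second function uses a different technique (isolating the lowest zero bit with (mask + 1) & ~mask) instead of the shift loop.

```python
def build_mask(nums):
    """Set bit n-1 for every positive n in nums (same idea as the answer)."""
    mask = 0
    for n in nums:
        if n > 0:
            mask |= 1 << (n - 1)
    return mask


def lowest_zero_bit_position(mask):
    """Return the 1-based position of the lowest zero bit of mask.

    (mask + 1) & ~mask keeps only the lowest zero bit of mask, and
    bit_length() of that single-bit number is its 1-based position.
    """
    return ((mask + 1) & ~mask).bit_length()


print(format(build_mask([1, 3]), "b"))               # binary form of the mask
print(lowest_zero_bit_position(build_mask([1, 3])))  # first missing positive
```

The position of the lowest zero bit is exactly the value the shift loop in the answer returns as flag + 1.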

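Analysis

Since we look at each integer in nums only once, the time complexity is O(n). Assuming all the integers can be condensed into a single variable, the space complexity is O(1), i.e. constant space.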

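【Comments】:

This is not O(1) space complexity, because the size of the bit variable grows with the size of the numbers in the list. It is O(p), where p is the largest number in the list. For example, if the number 2**35 is in the list, this algorithm will use 2**32 bytes of memory in the bit variable, which is a lot. Interestingly, your memory complexity is completely independent of the size of the input, which I have not come across before.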
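The growth this comment describes is easy to observe with int.bit_length(). The snippet below is a small sketch for illustration; mask_bit_length is a made-up helper name, and it rebuilds the mask the same way Solution 2 does.

```python
def mask_bit_length(nums):
    """Number of bits needed by the Solution 2 style mask for nums."""
    mask = 0
    for n in nums:
        if n > 0:
            mask |= 1 << (n - 1)
    return mask.bit_length()


# The mask needs as many bits as the largest positive value,
# regardless of how many elements the list has.
print(mask_bit_length([1, 3]))
print(mask_bit_length([1000]))
print(mask_bit_length([1, 2**20]))

# The commenter's arithmetic: 2**35 bits is 2**32 bytes.
print(2**35 // 8 == 2**32)
```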

【Solution 3】:

FWIW, here is how I did it:

def firstMissingPositive(A):
    for i in range(len(A)):
        while A[i] != i+1 and 0 < A[i] < len(A):
            value = A[i]-1
            A[i], A[value] = A[value], A[i]
    for i, value in enumerate(A, 1):
        if i != value:
            return i
    return len(A)+1

assert firstMissingPositive([1,4,3,6,5]) == 2
assert firstMissingPositive([1,44,3,66,55]) == 2
assert firstMissingPositive([1,2,3,4,5]) == 6
assert firstMissingPositive(A) == 1

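【Comments】:

value = A[i]-1 is a definite improvement: it separates the evaluation from the assignment. I had not realized (or never thought about) how these multiple assignments work; it is quite subtle.
assert f([18, 2, 13, 3, 3, 0, 14, 1, 18, 12, 6, -1, -3, 15, 11, 13, -8, 7, -8, -7]) == 4?
@wwii - Yes, you are right. I wrongly assumed there would be no duplicates.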
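For what it's worth, one way to make the swap loop tolerate duplicates is to stop swapping as soon as the target slot already holds the same value. The following is a sketch under that assumption, not the answer's original code; the name first_missing_positive_inplace is made up here, and the range check also admits the value len(A), which the original condition excludes.

```python
def first_missing_positive_inplace(A):
    """Return the first missing positive integer; A is modified in place."""
    n = len(A)
    for i in range(n):
        # Keep swapping while A[i] belongs somewhere in the list and its
        # home slot does not already hold that value. The second test stops
        # the loop both when A[i] is already home and when it is a duplicate.
        while 1 <= A[i] <= n and A[A[i] - 1] != A[i]:
            target = A[i] - 1  # compute the index before assigning
            A[i], A[target] = A[target], A[i]
    # After the pass, index i holds i+1 wherever that value was present.
    for i, value in enumerate(A, 1):
        if value != i:
            return i
    return n + 1


print(first_missing_positive_inplace(
    [18, 2, 13, 3, 3, 0, 14, 1, 18, 12, 6, -1, -3, 15, 11, 13, -8, 7, -8, -7]))
```

Each swap puts at least one value into its home slot, so the total number of swaps stays linear in len(A).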

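【Solution 4】:

This may not be the whole reason for the long running time, but I did find a bug that causes an infinite loop. I started by creating random integer arrays of length 20.

import random
a = [random.randint(-10, 20) for _ in range(20)]

I added two print statements to see what was happening.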

    while i<ln:
        print(A)
        if A[i]>=1 and A[i]<=ln:
            if A[A[i]-1]!=m+1:
                print("Swapping %d"%i)
                A[A[i]-1], A[i] = m+1, A[A[i]-1]
            else:
       ...

With this array as input, you end up in an infinite loop:

a = [-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11]

>>>
...
[18, 18, -8, -10, -6, 6, 14, 18, -5, 18, 18, 15, 17, 18, 2, 7, 18, 18, 7, 11]
Swapping 5
[18, 18, -8, -10, -6, 6, 14, 18, -5, 18, 18, 15, 17, 18, 2, 7, 18, 18, 7, 11]
Swapping 5
[18, 18, -8, -10, -6, 6, 14, 18, -5, 18, 18, 15, 17, 18, 2, 7, 18, 18, 7, 11]
...

It turns out that if A[A[i]-1] equals A[i], you always end up putting the same number back into A[i]. In this case, i == 5, A[5] == 6 and A[A[i]-1] == 6. In this statement,

A[A[i]-1], A[i] = m+1, A[A[i]-1]

the right-hand side is evaluated; m+1 is assigned to A[5]; then 6 is assigned to A[5]. I fixed this by swapping the order of the assignments:

A[i], A[A[i]-1] = A[A[i]-1], m+1

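With the list you added to the question, my modification now raises an IndexError. Even though the right-hand side is evaluated first, it seems that A[A[i]-1] on the left-hand side is not evaluated until after the first assignment has completed and a large number has been placed in A[i].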
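This ordering can be checked with a tiny experiment: Python evaluates the whole right-hand side first and then assigns to the targets from left to right, so a subscript on the left that depends on A[i] sees the new value of A[i]. The helper names swap_inline and swap_with_saved_index are made up for this sketch.

```python
def swap_inline(A, i):
    """Swap using A[i] as an index directly inside the assignment targets."""
    A = list(A)  # work on a copy
    # Right side is evaluated first; then A[i] is assigned, and only after
    # that is the index of the second target, A[i], looked up again.
    A[i], A[A[i]] = A[A[i]], A[i]
    return A


def swap_with_saved_index(A, i):
    """Swap after saving the index, so later assignments cannot change it."""
    A = list(A)
    v = A[i]
    A[i], A[v] = A[v], A[i]
    return A


print(swap_inline([2, 0, 1], 0))            # second target uses the new A[0]
print(swap_with_saved_index([2, 0, 1], 0))  # a real swap of slots 0 and 2
```

With the inline form the second assignment lands in slot 1 instead of slot 2, which is the same effect that produces the IndexError once a large number has been written into A[i].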

Borrowing from Rob's solution: evaluate A[i]-1 before doing any swapping:

def firstMissingPositive(A):
    m=max(A)
    ln=len(A)
    print('max:{}, len:{}'.format(m, ln))
    i=0
    while i<ln:
##        print(A[:20])
        if A[i]>=1 and A[i]<=ln:
            if A[A[i]-1]!=m+1:
##                print("Swapping %d"%i)
                v = A[i]-1
                A[i], A[v] = A[v], m+1
            else:
                i+=1
        else:
            i+=1
    for i in range(ln):
        if A[i]!=m+1:
            return i+1

And it still sometimes returns the wrong result, so minus one for me. It produces wrong results for the following inputs:

[18, 2, 13, 3, 3, 0, 14, 1, 18, 12, 6, -1, -3, 15, 11, 13, -8, 7, -8, -7]
[-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11]
[7, -7, 19, 6, -3, -6, 1, -8, -1, 19, -8, 2, 4, 19, 5, 6, 6, 18, 8, 17]
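Failing inputs like these are easier to find with a brute-force oracle and random lists. The harness below is a rough sketch, not part of the original answer; reference_first_missing and find_counterexample are made-up names, the oracle deliberately uses O(n) extra space, and a candidate that loops forever will hang the harness rather than be reported.

```python
import random


def reference_first_missing(nums):
    """Brute-force answer using a set; meant only as a test oracle."""
    seen = set(nums)
    k = 1
    while k in seen:
        k += 1
    return k


def find_counterexample(candidate, trials=200, seed=0):
    """Return a list on which candidate disagrees with the oracle, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-10, 20) for _ in range(rng.randint(0, 20))]
        expected = reference_first_missing(nums)
        try:
            # Pass a copy, because in-place candidates modify their input.
            got = candidate(list(nums))
        except Exception:
            return nums  # an exception also counts as a failure
        if got != expected:
            return nums
    return None


print(find_counterexample(reference_first_missing))
```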

【Comments】:

【Solution 5】:

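def firstMissingPositive(nums):
    # An empty list is missing 1.
    if not nums:
        return 1
    seen = set(nums)  # O(1) membership tests instead of scanning the list
    a = max(nums)
    # If every value is non-positive, the range is empty and we fall
    # through to the final return.
    for i in range(1, a + 2):
        if i not in seen:
            return i
    return 1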

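【Comments】:

Welcome to Stack Overflow, and thank you for answering this question! To make this answer as useful as possible, I suggest editing it to add some context about what your code does and why.

【Solution 6】: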

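def missingNumber(arr, n):
    # Keep only the positive values; a set gives O(1) membership tests.
    x = {i for i in arr if i > 0}
    # default=0 covers an empty arr or one with no positive values,
    # where the answer is 1.
    b = max(x, default=0)
    for i in range(1, b + 2):
        if i not in x:
            return i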

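【Comments】:

I don't think the function parameter n is needed.

【Solution 7】:

Below is my Python code. O(n) time and O(1) space complexity. Inspired by @pmcarpan's high-level solution in this related question. Also check out the fully commented markdown version on my github.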

def lowestMissingStrictlyPositiveInteger(arr):
    """ Return the lowest missing strictly 
    positive integer from the array arr. 
    Warning: In order to achieve this in linear time and
    constant space, arr is modified in-place.
    
    Uses separatePositiveIntegers() to isolate all
    strictly positive integers, and marks their occurrence
    with markIndicesOfObservedPositiveIntegers(). This 
    function then scans the modified array for the 'marks'
    and returns the first unmarked value. """
    m = separatePositiveIntegers(arr)
    markIndicesOfObservedPositiveIntegers(arr, m)
    for i in range(m): #O(m)
        if arr[i]>0:
            # this index hasn't been marked by
            # markIndicesOfObservedPositiveIntegers(), 
            # therefore the integer i+1 is missing.
            return i+1
    return m+1

def separatePositiveIntegers(arr):
    """ Modify input array in place, so that 
    strictly positive integers are
    all at the start of the array, 
    and negative integers are
    all at the end of the array. 
    
    Return the index of the first negative 
    integer in the updated array (or len(arr)
    if all values are positive). """
    i1, i2 = 0, len(arr)-1
    while (i2 > i1): #O(n)
        
        if arr[i2]<=0:
            # move to the first strictly positive value
            # starting from the end of the array.
            i2 -= 1
            continue
        
        if arr[i1]>0:
            # move to the first negative value
            # from the start of the array.
            i1 += 1
            continue
        
        # swap negative value at i1 with the first
        # strictly positive value starting from the
        # end of the array (i.e., at position i2).
        tmp = arr[i2]
        arr[i2] = arr[i1]
        arr[i1] = tmp
    
    # Guard against an empty array, where arr[i1] would raise IndexError.
    if not arr:
        return 0
    return i1 if arr[i1]<=0 else i1+1

def markIndicesOfObservedPositiveIntegers(arr, m):
    """ Take an array arr of integer values, 
    where indices [0,m-1] are all strictly positive
    and indices >= m are all negative
    (see separatePositiveIntegers() method).
    
    Mark the occurrence of a strictly positive integer
    k<=m by assigning a negative sign to the value in 
    the array at index k-1 (modify in place)."""
    for i in range(m): #O(m)
        # all values at indices [0,m-1] are strictly positive 
        # to start with, but may have been  modified in-place 
        # (switched to negative sign) in this loop. 
        # Therefore, read the untampered value as abs(arr[i]).
        untampered_val=abs(arr[i])
        # We can safely ignore any untampered value strictly superior to m
        # because it guarantees a gap in the integer sequence at a lower value 
        # (since arr only has m strictly positive integers).
        if untampered_val<=m:
            # mark the integer as "seen" by
            # changing the sign of the value at
            # index untampered_val-1 to negative.
            arr[untampered_val-1] = -abs(arr[untampered_val-1])

# test 1
arr = [3, 4, -1, 1]
assert lowestMissingStrictlyPositiveInteger(arr) == 2

# test 2
arr = [2, 0, 1]
assert lowestMissingStrictlyPositiveInteger(arr) == 3

# test 3
arr = [0]
assert lowestMissingStrictlyPositiveInteger(arr) == 1

【Comments】:

【Solution 8】:

Here is a better answer. Don't mind the longer code: we need quality, not short code.

    def firstMissingPositive(self, nums):
        if 1 not in nums:
            return 1
        n = len(nums)
        for i in range(n):
            if nums[i] > n or nums[i] <= 0:
                nums[i] = 1
        for i in range(n):
            a = abs(nums[i])
            if a == n:
                nums[0] = -abs(nums[0])
            else:
                nums[a] = -abs(nums[a])
        for i in range(2, n):
            if nums[i] > 0: return i
        return n if nums[0] > 0 else n+1

【Comments】:

【Solution 9】:

Try this:

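def firstMissingPositive(A):
    try:
        # Smallest value in 1..len(A) that does not occur in A.
        return min(set(range(1, len(A)+1)) - set(A))
    except ValueError:
        # The difference is empty, so A holds exactly 1..len(A)
        # (or A is empty); the answer is the next integer.
        return len(A)+1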

【Comments】:

【Solution 10】:

Here is my solution.

from collections import defaultdict

def firstMissingPositive(A):

    d = defaultdict(int)
    for i in A:
        d[i] = 1

    j = 1
    while True:
        if d[j] == 0:
            return j
        j += 1

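Space complexity: O(n)

Time complexity: O(n + k), which is considered O(n). This also ignores the hashing complexity.

BTW: Googling gives the answer you are looking for, with constant space and O(n) time.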

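【Comments】:

Mmm, if you think about it, our solutions are similar. The difference is that I use a set and some extra checks, but the core idea is the same...
Also, you don't need defaultdict: you can get the same effect with a plain dict and dict.fromkeys(A), only the check is different.
@Copperfield I used it to shorten the code. I always try to do that when possible :D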
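The dict.fromkeys(A) variant suggested in the comments could look roughly like this. It is a sketch of that suggestion only, and the function name first_missing_positive_fromkeys is made up here. Unlike the defaultdict version, a membership test with in does not insert new keys while probing.

```python
def first_missing_positive_fromkeys(A):
    """Same idea as the defaultdict answer, using a plain dict."""
    seen = dict.fromkeys(A)  # keys are the values of A, all mapped to None
    j = 1
    while j in seen:         # membership test instead of d[j] == 0
        j += 1
    return j


print(first_missing_positive_fromkeys([1, 4, 3, 6, 5]))
```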

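【Solution 11】:

Complexity: O(N) or O(N * log(N))
If you want to test it on Codility, you will get a 100% score.
The idea is to use a single loop and a dictionary to avoid O(N^2) complexity.
Time taken for the list given in the question: 0:00:00.000098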

def test(A):
    value = 1
    data = {}
    for num in A:
        if num < 0:
            continue
        elif num > 0 and num != value:
            data[num] = 1
            continue
        elif num == value:
            data[num] = 1
            while data.get(value, None) is not None:
                value += 1
    return value

【Comments】: