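Find common elements in N sorted arrays with no extra space: answers

【Question title】: Find common elements in N sorted arrays with no extra space
【Posted】: 2013-02-23 03:44:14
【Question】:

Given N arrays, each of size N and each sorted, how can you find their common elements efficiently (that is, with the lowest possible time complexity) if you are not allowed to use any extra space?

For example: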

 1. 10 160 200 500 500
 2. 4 150 160 170 500
 3. 2 160 200 202 203
 4. 3 150 155 160 300
 5. 3 150 155 160 301

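This is an interview question. I found some similar questions, but they did not include the extra conditions that the input is sorted or that no additional memory may be used.

I could not think of any solution with complexity below O(n^2 lg n). In that case, I would prefer the simplest solution that gives me this complexity, which is: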

  for each element 'v' in row-1
       not_found_flag = false
       for each row 'i' in the remaining set
           perform binary search for 'v' in 'i'
           if 'v' not found in row 'i'
                 not_found_flag = true
                 break
       if not not_found_flag 
           print element 'v' as it is one of the common elements

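We can improve on this by comparing the minimum and maximum of each row and using them to decide whether a number 'num' can possibly lie between that row's 'min_num' and 'max_num'.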
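A minimal sketch of that pruning idea, assuming every row is non-empty and sorted in non-decreasing order. The function name `candidate_range` is made up for illustration: any common value must be at least the largest row minimum and at most the smallest row maximum, so values outside that window can be skipped without searching.

```python
def candidate_range(rows):
    """Return (lo, hi) such that any common value v satisfies lo <= v <= hi."""
    lo = max(row[0] for row in rows)    # largest of the row minimums
    hi = min(row[-1] for row in rows)   # smallest of the row maximums
    return lo, hi


rows = [
    [10, 160, 200, 500, 500],
    [4, 150, 160, 170, 500],
    [2, 160, 200, 202, 203],
    [3, 150, 155, 160, 300],
    [3, 150, 155, 160, 301],
]
lo, hi = candidate_range(rows)
print(lo, hi)  # if lo > hi, there cannot be any common element
```

This only narrows the candidates; it does not change the worst-case complexity.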

Binary search: O(log n). Searching for one number in the other n-1 rows: O(n log n). Binary search for every number in the first row: O(n^2 log n).

I chose the first row, but any row would do; if no element of the chosen row is found in all of the other (N-1) rows, then there is no common data.
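The binary-search approach described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper names `contains` and `common_by_binary_search` are made up, and consecutive duplicates in the first row are skipped so that each common value is reported once. It uses `bisect_left` from the standard library and keeps only O(1) extra state.

```python
from bisect import bisect_left


def contains(row, v):
    """Binary search: True if v occurs in the sorted list row."""
    i = bisect_left(row, v)
    return i < len(row) and row[i] == v


def common_by_binary_search(rows):
    """Yield each value of rows[0] that occurs in every other row."""
    prev = None
    for v in rows[0]:
        if v == prev:
            continue  # skip consecutive duplicates in the first row
        prev = v
        # all() stops at the first row that does not contain v
        if all(contains(rows[i], v) for i in range(1, len(rows))):
            yield v


rows = [
    [10, 160, 200, 500, 500],
    [4, 150, 160, 170, 500],
    [2, 160, 200, 202, 203],
    [3, 150, 155, 160, 300],
    [3, 150, 155, 160, 301],
]
for v in common_by_binary_search(rows):
    print(v)
```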

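【Question discussion】:

You need "some" extra space to store the (possible) common elements...
@MitchWheat. Please see the pseudocode above. If we only print the common elements, do we really need extra storage?
Are you really saving anything with binary search? Since you need to find all the common elements, why not just scan the sorted arrays and do it in O(n)?
@smk. There are "N" different arrays, each sorted independently. I have edited the question with an example. We cannot find the common elements in O(n) by scanning "N" arrays. It is more like an N x N square matrix in which each row is sorted individually.

Tags: algorithm data-structures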

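【Solution 1】:

This looks like it can be done in O(n^2), i.e., by looking at each element only once. Note that if an element is common to all the arrays, then it must be present in any one of them. Also, for illustration (and since you used a for loop above), I will assume we can keep an index for each array, but I will discuss how to get around that later.

Let's call the arrays A_1 through A_N and use 1-based indices. Pseudocode: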

# Initial index values set to first element of each array
for i = 1 to N:
  x_i = 1 

for x_1 = 1 to N:
  val = A_1[x_1] 
  print_val = true
  for i = 2 to N:
    # stop at the end of A_i instead of reading past it
    while x_i <= N and A_i[x_i] < val:
      x_i = x_i + 1
    if x_i > N or A_i[x_i] != val:
      print_val = false
  if print_val:
    print val

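Explanation of the algorithm. We use the first array (or any arbitrary array) as the reference array and walk through all the other arrays in parallel (somewhat like the merge step of merge sort, except with N arrays). Every value of the reference array that is common to all arrays must be present in each of the other arrays. So for each other array (since they are sorted), we increase the index x_i until the value at that index, A_i[x_i], is at least the value we are looking for (we don't care about smaller values; they can't be common). We can do this because the arrays are sorted and therefore monotonically non-decreasing. If all arrays have this value, we print it. Either way, we then increment x_1 in the reference array and continue.

By the end, we have printed every value common to all the arrays, and each index has moved past each element at most once, for O(n^2) total work.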

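Getting around the additional storage requirement. There are many ways to do this, but I think the easiest is to check the first element of each array and use the array holding the maximum as the reference array. If the first elements are all the same, print that value. Then store the indices x_2 ... x_N in the first element of each array itself.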
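One possible reading of that hack, sketched in Python. This is an interpretation rather than the answerer's code: the function name `common_with_parked_cursors` is made up, every list is assumed to be non-empty and sorted, and duplicates in the reference list are skipped so that slot 0 of the other lists is never needed as data again. It is destructive, because slot 0 of every non-reference list is overwritten with that list's cursor.

```python
def common_with_parked_cursors(arrays):
    """Yield each value common to all sorted, non-empty lists, once.

    Destructive: slot 0 of every non-reference list ends up holding a cursor.
    """
    n = len(arrays)
    # Reference list: the one with the largest first element.
    ref = 0
    for i in range(1, n):
        if arrays[i][0] > arrays[ref][0]:
            ref = i
    first = arrays[ref][0]

    # First reference value: scan each list from index 0, then park the cursor.
    common = True
    for i in range(n):
        if i == ref:
            continue
        row, c = arrays[i], 0
        while c < len(row) and row[c] < first:
            c += 1
        if c == len(row) or row[c] != first:
            common = False
        # row[0] <= first and later reference values are larger, so slot 0 is
        # free to reuse; never park 0, because slot 0 no longer holds data.
        row[0] = max(c, 1)
    if common:
        yield first

    prev = first
    for k in range(1, len(arrays[ref])):
        val = arrays[ref][k]
        if val == prev:
            continue  # skip duplicates in the reference list
        prev = val
        common = True
        for i in range(n):
            if i == ref:
                continue
            row = arrays[i]
            c = row[0]  # resume from the parked cursor
            while c < len(row) and row[c] < val:
                c += 1
            row[0] = c  # a cursor equal to len(row) means "exhausted"
            if c == len(row) or row[c] != val:
                common = False
        if common:
            yield val
```

Because this is a generator, the lists are only modified as the results are consumed, for example with `list(common_with_parked_cursors(rows))`. Whether a handful of loop variables counts as "extra space" is a matter of how strictly the interview constraint is read.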

Java implementation (without the extra hack, for brevity), using your example input:

public static void main(String[] args) {
    int[][] a = {
            { 10, 160, 200, 500, 500, },
            { 4, 150, 160, 170, 500, },
            { 2, 160, 200, 202, 203, },
            { 3, 150, 155, 160, 300 },
            { 3, 150, 155, 160, 301 } };

    int n = a.length;
    int[] x = new int[n];

    for( ; x[0] < n; x[0]++ ) {
        int val = a[0][x[0]]; 
        boolean print = true;
        for( int i = 1; i < n; i++ ) {
            while (a[i][x[i]] < val && x[i] < n-1) x[i]++;              
            if (a[i][x[i]] != val) print = false;               
        }   
        if (print) System.out.println(val);
    }   
}

Output:

160

【Discussion】:

【Solution 2】:

Here is an O(n^2) solution in Python that uses no extra space but destroys the lists:

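def find_common(lists):
    """Yield every common element, largest first. Destroys lists[1:]."""
    num_lists = len(lists)
    first_list = lists[0]
    for j in reversed(first_list):  # reversed() avoids the copy made by [::-1]
        common_found = True
        for i in range(1, num_lists):
            curr_list = lists[i]
            # drop trailing elements that are too large to match j
            while curr_list and curr_list[-1] > j:
                curr_list.pop()
            # an emptied list can no longer contain j
            if not curr_list or curr_list[-1] != j:
                common_found = False
                break
        if common_found:
            yield j  # `return j` would stop at the first (largest) match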

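【Discussion】:

【Solution 3】:

An O(n^2) Python version that uses no extra storage but modifies the original arrays. It lets you store the common elements instead of printing them: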

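data = [
    [10, 160, 200, 500, 500],
    [4, 150, 160, 170, 500],
    [2, 160, 200, 202, 203],
    [3, 150, 155, 160, 300],
    [3, 150, 155, 160, 301],
]

# Python 3 port. NEG_INF marks slots before the first match; the Python 2
# original used None, which Python 3 cannot compare with ints.
NEG_INF = float("-inf")

for k in range(len(data) - 1):
    A, B = data[k], data[k + 1]
    i, j, x = 0, 0, NEG_INF  # x is the last value found in both A and B

    while i < len(A) or j < len(B):
        # overwrite elements of A that are too small to match B[j]
        while i < len(A) and (j >= len(B) or A[i] < B[j]):
            A[i] = x
            i += 1

        # overwrite elements of B that are too small to match A[i]
        while j < len(B) and (i >= len(A) or B[j] < A[i]):
            B[j] = x
            j += 1

        # advance both cursors only on a real match
        if i < len(A) and j < len(B) and A[i] == B[j]:
            x = A[i]
            i += 1
            j += 1

print(data[-1])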

What I do is basically take each array in data and compare it element by element with the next one, removing the elements that are not common.
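A variation on the same pairwise idea, sketched under stated assumptions: instead of overwriting non-matching slots, it compacts the matches to the front of the next list and truncates the rest. This is not the answerer's code; the names `intersect_into` and `common_elements` are made up, and like the original it destroys the input lists.

```python
def intersect_into(a, b):
    """Keep in sorted list b only the values that also occur in sorted list a."""
    i = j = w = 0  # read cursors into a and b, write cursor into b
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif b[j] < a[i]:
            j += 1
        else:
            b[w] = b[j]  # match: move it to the front of b
            w += 1
            i += 1
            j += 1
    del b[w:]  # drop everything after the last match


def common_elements(data):
    """Fold the intersection through the lists; the last list holds the result."""
    for k in range(len(data) - 1):
        intersect_into(data[k], data[k + 1])
    return data[-1]


data = [
    [10, 160, 200, 500, 500],
    [4, 150, 160, 170, 500],
    [2, 160, 200, 202, 203],
    [3, 150, 155, 160, 300],
    [3, 150, 155, 160, 301],
]
print(common_elements(data))
```

Each pairwise pass is linear in the two list lengths, so N-1 passes over lists of size N stay within O(n^2).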

【Discussion】:

【Solution 4】:

Here is a Java implementation:

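import java.util.ArrayList;
import java.util.List;

// Static method of the ArrayUtils class that the unit test below calls.
public static Integer[] commonElementsInNSortedArrays(int[][] arrays) {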
    int baseIndex = 0, currentIndex = 0, totalMatchFound= 0;
    int[] indices = new int[arrays.length - 1];
    boolean smallestArrayTraversed = false;
    List<Integer> result = new ArrayList<Integer>();
    while (!smallestArrayTraversed && baseIndex < arrays[0].length) {
        totalMatchFound = 0;
        for (int array = 1; array < arrays.length; array++) {
            currentIndex = indices[array - 1];
            while (currentIndex < arrays[array].length && arrays[array][currentIndex] < arrays[0][baseIndex]) {
                currentIndex ++;                    
            }

            if (currentIndex < arrays[array].length) {
                if (arrays[array][currentIndex] == arrays[0][baseIndex]) {
                    totalMatchFound++;
                }
            } else {
                smallestArrayTraversed = true;
            }
            indices[array - 1] = currentIndex;
        }
        if (totalMatchFound == arrays.length - 1) {
            result.add(arrays[0][baseIndex]);
        }
        baseIndex++;
    }

    return result.toArray(new Integer[0]);
}

And here is the unit test:

@Test
public void commonElementsInNSortedArrayTest() {
    int arr[][] = { {1, 5, 10, 20, 40, 80},
                    {6, 7, 20, 80, 100},
                    {3, 4, 15, 20, 30, 70, 80, 120}
                   };

    Integer result[] = ArrayUtils.commonElementsInNSortedArrays(arr);
    assertThat(result, equalTo(new Integer[]{20, 80}));

    arr = new int[][]{
            {23, 34, 67, 89, 123, 566, 1000},
            {11, 22, 23, 24,33, 37, 185, 566, 987, 1223, 1234},
            {23, 43, 67, 98, 566, 678},
            {1, 4, 5, 23, 34, 76, 87, 132, 566, 665},
            {1, 2, 3, 23, 24, 344, 566}
          };

    result = ArrayUtils.commonElementsInNSortedArrays(arr);
    assertThat(result, equalTo(new Integer[]{23, 566}));
}

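【Discussion】:

【Solution 5】:

This Swift solution makes a copy of the original, but it could be modified to take an inout parameter so that it uses no additional space. I left it as a copy because I think it is better not to modify the original, since the algorithm deletes elements. It would be possible to avoid deleting elements by keeping indices, but this algorithm deletes elements to keep track of where it is. It is a functional approach; it may not be super efficient, but it works. Since it is functional, it needs little conditional logic. I am posting it because I think it is a different approach that might be interesting to others, and maybe someone else can find a way to make it more efficient.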

func findCommonInSortedArrays(arr: [[Int]]) -> [Int] {
    var copy = arr
    var result: [Int] = []

    while (true) {

        // get first elements
        let m = copy.indices.compactMap { copy[$0].first }

        // find max value of those elements.
        let mm = m.reduce (0) { max($0, $1) }

        // find the value in other arrays or nil
        let ii = copy.indices.map { copy[$0].firstIndex { $0 == mm } }

        // if the value is missing from any array (nil index), return the result
        if ii.contains(where: { $0 == nil }) { return result }

        // drop the leading elements that are smaller than the target value
        copy.indices.forEach { copy[$0].removeFirst( ii[$0] ?? 0 ) }

        // add to list of matching values.
        result += [mm]

        // remove the matched element from all arrays
        copy.indices.forEach { copy[$0].removeFirst() }
    }
}

findCommonInSortedArrays(arr: [[9, 10, 12, 13, 14, 29],
                         [3, 5, 9, 10, 13, 14],
                         [3, 9, 10, 14]]
)

findCommonInSortedArrays(arr: [[],
                         [],
                         []]
)

findCommonInSortedArrays(arr: [[9, 10, 12, 13, 14, 29],
                         [3, 5, 9, 10, 13, 14],
                         [3, 9, 10, 14],
                         [9, 10, 29]]
)

【Discussion】: