【Question title】:Longest Consecutive Sequence in an Unsorted Array [duplicate]
【Posted】:2011-11-19 04:12:53
【Question description】:

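You are given an array of numbers in unsorted/random order. You are supposed to find the longest sequence of consecutive numbers in the array. Note that the sequence need not appear in sorted order within the array. Here is an example: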

Input:

A[] = {10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101}  

The output is:

{16,17,18,19,20,21,22}  

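The solution needs to be of O(n) complexity.

I have been told that the solution involves using a hash table, and I did come across a few implementations that use 2 hash tables. Sorting is not an option here, since sorting would take O(n lg n), which is not what is desired.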

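【Comments】:

  • "The longest sequence of consecutive numbers" - that would be the whole list.
  • @dietrich - No, this is not homework
  • @Thorbj0m - How is that possible? The whole list does not consist entirely of consecutive numbers placed in unsorted/random order, right?
  • Isn't the output for the example {16, 17, 18, 19, 20, 21, 22}?
  • Radix sort only needs the elements to have constant size to be O(n), which they do.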
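
The radix-sort remark in the last comment can be sketched roughly as follows. This is only an illustration, and it assumes the inputs fit in signed 32-bit integers, which the question does not state. The helper names `radix_sort_32` and `longest_run_sorted` are made up for this sketch.

```python
def radix_sort_32(values):
    """LSD radix sort for signed 32-bit ints: four stable passes, one per byte."""
    # Shift into the unsigned range so that negative numbers order correctly.
    keys = [v + 2**31 for v in values]
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for k in keys:
            buckets[(k >> shift) & 0xFF].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return [k - 2**31 for k in keys]


def longest_run_sorted(values):
    """Scan the sorted values once and return the longest consecutive run."""
    best_start, best_len = None, 0
    run_start, run_len, prev = None, 0, None
    for v in radix_sort_32(values):
        if prev is not None and v == prev:
            continue                      # duplicate: neither extends nor breaks a run
        if prev is not None and v == prev + 1:
            run_len += 1
        else:
            run_start, run_len = v, 1     # a gap (or the first value) starts a new run
        if run_len > best_len:
            best_start, best_len = run_start, run_len
        prev = v
    return list(range(best_start, best_start + best_len)) if best_len else []


data = [10, 21, 45, 22, 7, 2, 67, 19, 13, 45, 12, 11, 18, 16, 17, 100, 201, 20, 101]
print(longest_run_sorted(data))  # [16, 17, 18, 19, 20, 21, 22]
```

The four passes each touch every element once, so the sort is linear in n for fixed-width keys. The cost is the extra bucket memory.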

Tags: c# python algorithm


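【Solution 1】:

You could have two tables:

  • Start table: (start point, length)
  • End table: (end point, length)

When adding a new item, you would check:

  • Does value + 1 exist in the start table? If so, delete it and create a new entry of (value, length + 1), where length is the "current" length. You would also update the end table with the same end point but a greater length.
  • Does value - 1 exist in the end table? If so, delete it and create a new entry of (value, length + 1), and this time update the start table (the start point stays the same, but the length increases).

If both conditions hold, then you are effectively stitching two existing sequences together - you replace the four existing entries with two new entries representing the single longer sequence.

If neither condition holds, you just create a new entry of length 1 in both tables.

After all the values have been added, you can just iterate over the start table to find the key with the largest value.

I think this would work, and would be O(n) if we assume O(1) hash lookup/add/delete.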
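
The steps above, including the final pass over the start table, could look roughly like this in Python. It is a sketch rather than the answer's own code: the function name `longest_run_two_tables` is invented here, and the `seen` set is an added assumption so that repeated input values are skipped instead of corrupting the tables.

```python
def longest_run_two_tables(values):
    starts = {}   # first value of a run -> length of that run
    ends = {}     # last value of a run  -> length of that run
    seen = set()  # assumption: a repeated value is ignored

    for v in values:
        if v in seen:
            continue
        seen.add(v)

        # Length of the run starting right after v, and of the run ending
        # right before v (0 if there is none). pop() also deletes the entry
        # that stops being a start or an end once v joins the run.
        after = starts.pop(v + 1, 0)
        before = ends.pop(v - 1, 0)
        length = before + after + 1

        # The merged run spans [v - before, v + after].
        starts[v - before] = length
        ends[v + after] = length

    if not starts:
        return []
    best_start = max(starts, key=starts.get)   # key with the largest length
    return list(range(best_start, best_start + starts[best_start]))


data = [10, 21, 45, 22, 7, 2, 67, 19, 13, 45, 12, 11, 18, 16, 17, 100, 201, 20, 101]
print(longest_run_two_tables(data))  # [16, 17, 18, 19, 20, 21, 22]
```

Using a default of 0 for a missing neighbour folds the four cases (stitch, extend start, extend end, new run) into one code path.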

EDIT: C# implementation. It took a little while to get right, but I think it works :)

using System;
using System.Collections.Generic;

class Test
{
    static void Main(string[] args)
    {
        int[] input = {10,21,45,22,7,2,67,19,13,45,12,
                11,18,16,17,100,201,20,101};

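        Dictionary<int, int> starts = new Dictionary<int, int>();
        Dictionary<int, int> ends = new Dictionary<int, int>();
        // Values already processed; a repeated value inside an existing
        // run must not overwrite that run's entries
        HashSet<int> seen = new HashSet<int>();

        foreach (var value in input)
        {
            if (!seen.Add(value))
            {
                continue;
            }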
            int startLength;
            int endLength;
            bool extendsStart = starts.TryGetValue(value + 1,
                                                   out startLength);
            bool extendsEnd = ends.TryGetValue(value - 1,
                                               out endLength);

            // Stitch together two sequences
            if (extendsStart && extendsEnd)
            {
                // The upper run no longer starts at value + 1 and the
                // lower run no longer ends at value - 1
                starts.Remove(value + 1);
                ends.Remove(value - 1);
                int start = value - endLength;
                int newLength = startLength + endLength + 1;
                starts[start] = newLength;
                ends[start + newLength - 1] = newLength;
            }
            // Value just comes before an existing sequence
            else if (extendsStart)
            {
                int newLength = startLength + 1;
                starts[value] = newLength;
                ends[value + newLength - 1] = newLength;
                starts.Remove(value + 1);
            }
            else if (extendsEnd)
            {
                int newLength = endLength + 1;
                starts[value - newLength + 1] = newLength;
                ends[value] = newLength;
                ends.Remove(value - 1);
            }
            else
            {
                starts[value] = 1;
                ends[value] = 1;
            }
        }

        // Just for diagnostics - could actually pick the longest
        // in O(n)
        foreach (var sequence in starts)
        {
            Console.WriteLine("Start: {0}; Length: {1}",
                              sequence.Key, sequence.Value);
        }
    }
}

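EDIT: Here is the single-hashset answer implemented in C# too - I agree, it is simpler than the above, but I am leaving my original answer for posterity: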

using System;
using System.Collections.Generic;
using System.Linq;

class Test
{
    static void Main(string[] args)
    {
        int[] input = {10,21,45,22,7,2,67,19,13,45,12,
                11,18,16,17,100,201,20,101};

        HashSet<int> values = new HashSet<int>(input);

        int bestLength = 0;
        int bestStart = 0;
        // Can't use foreach as we're modifying it in-place
        while (values.Count > 0)
        {
            int value = values.First();
            values.Remove(value);
            int start = value;
            while (values.Remove(start - 1))
            {
                start--;
            }
            int end = value;
            while (values.Remove(end + 1))
            {
                end++;
            }

            int length = end - start + 1;
            if (length > bestLength)
            {
                bestLength = length;
                bestStart = start;
            }
        }
        Console.WriteLine("Best sequence starts at {0}; length {1}",
                          bestStart, bestLength);
    }
}

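【Comments】:

  • Thanks @Jon! But I have a doubt: if we want hash lookups to be O(1), doesn't that mean our hash buckets will consume a lot of memory? Suppose there are a million numbers in the array and we need to run this algorithm on them.
  • @Anoop: It would probably still be amortized O(1), but it would indeed take a lot of memory. I confess the details of hash table implementations are slightly beyond me, but so long as there are no hash collisions I believe it should be fine. I could be wrong, of course. I suspect you will find that any hash-table-based solution relies on O(1) hash operations.
  • Wouldn't this algorithm output {start 16, length 3} for the array given in the question? ... That is also how I actually understood the question. But it seems that elements which do not belong to the consecutive list are allowed to sit between its members in the array.
  • @DaveBall: Why wouldn't it spot that 20 occurs in the longest sequence? Before reaching 20, it will have { start = 16, length = 4 } and { end = 19, length = 4 }, as well as { start = 21, length = 2 } and { end = 22, length = 2 }. Then it will notice that 20 stitches those sequences together.
  • I have posted a solution with working code in an answer.
【Solution 2】:

Dump everything into a hash set.

Now go through the hash set. For each element, look up the set for all values neighbouring the current value. Keep track of the largest sequence you can find, while removing the elements found from the set. Save the count for comparison.

Repeat this until the hash set is empty.

Assuming lookup, insertion and deletion are O(1) time, this algorithm would be O(N) time.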

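Pseudocode (written as runnable Python):

 def longest_run(array):
     start = end = None
     max_count = 0

     # hashset numbers
     numbers = set(array)

     while numbers:
         # take an arbitrary element and remove it from the set
         number = numbers.pop()
         temp_start = temp_end = number

         # walk downwards, removing each predecessor as it is found
         while temp_start - 1 in numbers:
             temp_start -= 1
             numbers.remove(temp_start)

         # walk upwards, removing each successor as it is found
         while temp_end + 1 in numbers:
             temp_end += 1
             numbers.remove(temp_end)

         count = temp_end - temp_start + 1
         if max_count < count:
             max_count = count
             start, end = temp_start, temp_end

     # range() excludes its upper bound, hence end + 1
     return list(range(start, end + 1)) if max_count else []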

The nested whiles don't look pretty, but each number can only be used once, so it should be O(N).
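
One way to check the "each number is used only once" argument is to count the hash operations directly. The sketch below is an instrumented variant of the same idea, not the answer's code; the name `longest_run_counted` and the two counters are made up for illustration.

```python
def longest_run_counted(values):
    numbers = set(values)
    n = len(numbers)
    removals = 0        # elements taken out of the set
    failed_probes = 0   # membership tests that came back False
    best_start, best_len = None, 0

    while numbers:
        start = end = numbers.pop()
        removals += 1
        while start - 1 in numbers:
            start -= 1
            numbers.remove(start)
            removals += 1
        failed_probes += 1              # the probe that ended the downward walk
        while end + 1 in numbers:
            end += 1
            numbers.remove(end)
            removals += 1
        failed_probes += 1              # the probe that ended the upward walk
        if end - start + 1 > best_len:
            best_start, best_len = start, end - start + 1

    # Every distinct value leaves the set exactly once, and each outer pass
    # costs two failed probes, so the total work is bounded by about 3n.
    assert removals == n and failed_probes <= 2 * n
    return best_start, best_len, removals, failed_probes


data = [10, 21, 45, 22, 7, 2, 67, 19, 13, 45, 12, 11, 18, 16, 17, 100, 201, 20, 101]
print(longest_run_counted(data))  # (16, 7, 18, 16)
```

For the question's input there are 18 distinct values in 8 runs, hence 18 removals and 16 failed probes. The inner loops never revisit a value because it is deleted the moment it is found.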

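【Comments】:

  • Very neat, except for one part that I am missing. If I keep removing the "n-1" elements of the sequence, how will I find the start of the sequence, given that I have already found "n" in the hash?
  • @Anoop I added pseudocode, hopefully it is clearer now
  • The initial addition to the hash is O(n). Now, for each number in the hash, you will iterate "x" times in the decreasing direction and "y" times in the increasing direction. In your solution, a while inside a while is pretty much unavoidable, making it > O(n), right? Albeit by a constant k, where k is proportional to the number of consecutive elements present in the original set. In Jon's solution, however, "k" seems to be bounded by the fact that the "start" and "end" table operations evaluate to true.
  • You need to update number inside the erase loops.
  • @renaud - count++ is done inside the loop, if that is what you are referring to
【Solution 3】:

Here is a solution in Python that uses just a single hash set and does not do any fancy interval merging.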

def destruct_directed_run(num_set, start, direction):
  # Remove start, start + direction, ... while they are present;
  # return the first value that was not found.
  while start in num_set:
    num_set.remove(start)
    start += direction
  return start

def destruct_single_run(num_set):
  arbitrary_member = next(iter(num_set))
  bottom = destruct_directed_run(num_set, arbitrary_member, -1)
  top = destruct_directed_run(num_set, arbitrary_member + 1, 1)
  return range(bottom + 1, top)

def max_run(data_set):
  nums = set(data_set)
  best_run = []
  while nums:
    cur_run = destruct_single_run(nums)
    if len(cur_run) > len(best_run):
      best_run = cur_run
  return best_run

def test_max_run(data_set, expected):
  actual = max_run(data_set)
  print(data_set, actual, expected, 'Pass' if expected == actual else 'Fail')

test_max_run([10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101], range(16, 23))
test_max_run([1,2,3], range(1, 4))
print(max_run([1,3,5]), 'any singleton output fine')

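【Comments】:

  • I had a little trouble understanding the Python code and the range() arguments you chose to pass.
  • @AnoopMenon test_max_run is a unit test, and the range is the expected range that max_run should return for data_set.
【Solution 4】:

Another solution uses hash lookups and runs in O(n):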

int maxCount = 0;
int maxIndex = -1;
for (int i = 0; i < N; i++)
{
    // Search whether a[i] - 1 is present in the list. If it is present,
    // a[i] does not start a run, so skip it: it will be counted when the
    // run is walked from its first element.
    if (hash_search(a[i] - 1))
        continue;

    // a[i] starts a run: count it, then keep counting for as long as the
    // next value is present in the list
    int count = 1;
    int num = a[i];
    while (hash_search(++num))
        count++;

    // Now check if this count is greater than the maxCount found previously
    // and update if it is
    if (count > maxCount)
    {
        maxIndex = i;
        maxCount = count;
    }
}
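
A runnable Python version of the same "only start counting at the first element of a run" idea might look like this. It is a sketch under a couple of assumptions: `hash_search` is taken to be plain set membership, and the loop iterates over the distinct values instead of the raw array so that a repeated run start is not walked more than once. The function name `longest_run_from_starts` is invented here.

```python
def longest_run_from_starts(values):
    present = set(values)             # stands in for the hash table behind hash_search
    best_start, best_len = None, 0

    for v in present:                 # distinct values only
        if v - 1 in present:
            continue                  # v is not the first element of its run
        length = 1
        while v + length in present:  # walk upwards from the run's first element
            length += 1
        if length > best_len:
            best_start, best_len = v, length

    return list(range(best_start, best_start + best_len)) if best_len else []


data = [10, 21, 45, 22, 7, 2, 67, 19, 13, 45, 12, 11, 18, 16, 17, 100, 201, 20, 101]
print(longest_run_from_starts(data))  # [16, 17, 18, 19, 20, 21, 22]
```

Because the inner loop only runs from a run's first element, each run is walked once and nothing has to be deleted from the set.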

【Comments】:

    【Solution 5】:

    Here is an implementation:

    static int[] F(int[] A)
    {
        Dictionary<int, int> low = new Dictionary<int, int>();
        Dictionary<int, int> high = new Dictionary<int, int>();
    
        foreach (int a in A)
        {
            int lowLength, highLength;
    
            bool lowIn = low.TryGetValue(a + 1, out lowLength);
            bool highIn = high.TryGetValue(a - 1, out highLength);
    
            if (lowIn)
            {
                if (highIn)
                {
                    low.Remove(a + 1);
                    high.Remove(a - 1);
                    low[a - highLength] = high[a + lowLength] = lowLength + highLength + 1;
                }
                else
                {
                    low.Remove(a + 1);
                    low[a] = high[a + lowLength] = lowLength + 1;
                }
            }
            else
            {
                if (highIn)
                {
                    high.Remove(a - 1);
                    high[a] = low[a - highLength] = highLength + 1;
                }
                else
                {
                    high[a] = low[a] = 1;
                }
            }
        }
    
        int maxLow = 0, maxLength = 0;
        foreach (var pair in low)
        {
            if (pair.Value > maxLength)
            {
                maxLength = pair.Value;
                maxLow = pair.Key;
            }
        }
    
        int[] ret = new int[maxLength];
        for (int i = 0; i < maxLength; i++)
        {
            ret[i] = maxLow + i;
        }
    
        return ret;
    }
    

    【Comments】:

    • Thanks! Nice, but the solutions that use a single hash bucket are comparatively better :) See Jon's and rrenaud's solutions. Thanks again, Peter
    【Solution 6】:
    #include <map>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        struct Node{
            int lower;
            int higher;
            Node(int l, int h):lower(l),higher(h){
    
        }
    };
    int longestConsecutive(vector<int> &num) {
        // Start typing your C/C++ solution below
        // DO NOT write int main() function
    
        map<int,Node> interval_map;
        map<int,Node>::iterator curr_iter,inc_iter,des_iter;
    
        //first collect
        int curr = 0;
        int max = -1;
        for(size_t i = 0; i < num.size(); i++){
            curr = num[i];
            curr_iter = interval_map.find(curr);
            if (curr_iter == interval_map.end()){
                interval_map.insert(make_pair(curr,Node(curr,curr)));
            }
        } 
        //the next collect    
        for(curr_iter = interval_map.begin(); curr_iter != interval_map.end(); curr_iter++)
        {
            int lower = curr_iter->second.lower;
            int higher = curr_iter->second.higher;
            int newlower = lower, newhigher = higher;
    
            des_iter = interval_map.find(lower - 1);
            if (des_iter != interval_map.end())
            {
                curr_iter->second.lower = des_iter->second.lower;
                newlower = des_iter->second.lower;
            }
    
            inc_iter = interval_map.find(higher + 1);
            if (inc_iter != interval_map.end()){
                curr_iter->second.higher = inc_iter->second.higher;
                newhigher = inc_iter->second.higher;
            }
    
            if (des_iter != interval_map.end()){
                des_iter->second.higher = newhigher;
            }
            if (inc_iter != interval_map.end()){
                inc_iter->second.lower = newlower;
            }
            if (curr_iter->second.higher - curr_iter->second.lower + 1> max){
                 max = curr_iter->second.higher - curr_iter->second.lower + 1;
             }
        }   
        return max;
    }
    };
    

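    【Comments】:

    • Thanks for posting an answer! While a code snippet may answer the question, it would still be great to add some additional information, such as an explanation.
    【Solution 7】:

    This is Grigor Gevorgyan's solution from a duplicate of this question, but simplified, I think: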

    data = [1,3,5,7,4,6,10,3]
    
    # other_sides[x] == other end of interval starting at x
    # unknown values for any point not the end of an interval
    other_sides = {}
    # set eliminates duplicates, and is assumed to be an O(n) operation
    for element in set(data):
        # my intervals left hand side will be the left hand side
        # of an interval ending just before this element
        try:
            left = other_sides[element - 1]
        except KeyError:
            left = element
    
        # my intervals right hand side will be the right hand side
        # of the interval starting just after me
        try:
            right = other_sides[element + 1]
        except KeyError:
            right = element
    
        # satisfy the invariants
        other_sides[left] = right
        other_sides[right] = left
    
    # convert the dictionary to start, stop segments
    # each segment is recorded twice, so only keep half of them
    segments = [(start, stop) for start, stop in other_sides.items() if start <= stop]
    # find the longest one
    print(max(segments, key = lambda segment: segment[1] - segment[0]))
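
    The invariant in the comments above (each end of an interval maps to the other end) should not depend on the order in which elements arrive. A small sketch to check that, with the same logic wrapped in a function: the name `longest_segment` is made up here, and `dict.fromkeys` is swapped in for `set` purely so that the processing order follows the input order while duplicates are still dropped.

```python
import random


def longest_segment(data):
    other_sides = {}
    for element in dict.fromkeys(data):   # distinct values, in input order
        # Left end of the interval ending just before element, if any.
        left = other_sides.get(element - 1, element)
        # Right end of the interval starting just after element, if any.
        right = other_sides.get(element + 1, element)
        other_sides[left] = right
        other_sides[right] = left
    segments = [(a, b) for a, b in other_sides.items() if a <= b]
    return max(segments, key=lambda s: s[1] - s[0]) if segments else None


data = [1, 3, 5, 7, 4, 6, 10, 3]
rng = random.Random(0)
for _ in range(100):
    shuffled = data[:]
    rng.shuffle(shuffled)
    assert longest_segment(shuffled) == (3, 7)   # same answer for every ordering
print(longest_segment(data))  # (3, 7)
```

    Entries for points that end up inside an interval go stale, but they always describe a shorter stretch than the interval that swallowed them, so they cannot win the final `max`.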
    

    【Comments】:

      【Solution 8】:

      Here is Python code from Grigor Gevorgyan's answer to a similar question; I think it is a very elegant solution to the problem

      l = [10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101]
      d = {x:None for x in l}
      print d
      for (k, v) in d.iteritems():
          if v is not None: continue
          a, b = d.get(k - 1), d.get(k + 1)
          if a is not None and b is not None: d[k], d[a], d[b] = k, b, a
          elif a is not None: d[a], d[k] = k, a
          elif b is not None: d[b], d[k] = k, b
          else: d[k] = k
          print d
      
      m = max(d, key=lambda x: d[x] - x)
      print m, d[m]
      

      Output:

      {2: 2, 67: None, 100: None, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: None, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 100, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 10, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 11, 11: 10, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 12, 11: 10, 12: 10, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 12, 11: 10, 12: 10, 45: 45, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 16, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 17, 17: 16, 18: None, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 18, 17: 16, 18: 16, 19: None, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 19, 17: 16, 18: 16, 19: 16, 20: None, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 20, 17: 16, 18: 16, 19: 16, 20: 16, 21: None, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 21, 17: 16, 18: 16, 19: 16, 20: 16, 21: 16, 22: None}
      {2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 22, 17: 16, 18: 16, 19: 16, 20: 16, 21: 16, 22: 16}
      16 22
      

      【Comments】:
