Find longest substring of N unique characters: answers

【Question title】: Find longest substring of N unique characters
【Posted】: 2014-02-02 20:20:50
【Question】:

Input: str="abcdeefuiuiwiwwaaaa", n=3. Output: "iwiwwaaaa" (the longest substring with 3 distinct characters)

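I have a solution, shown below. My questions:

What is its time complexity? I know it must be better than O(n^2), but I'm not sure whether I can conclude that it is O(n).

The solution below cannot cover all of ASCII. Can we improve it without using extra space?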

// Assumes str contains only lowercase letters 'a'..'z': each letter maps to one bit of an int
public static String getSubstrOfMChars(String str, int m)
{
    if (str == null || str.length() == 0)
        return "";

    int len = str.length();
    String max = "";

    for (int i = 0; i < len;)
    {
        StringBuilder sb = new StringBuilder();
        int counter = 1;                   // distinct characters in the current window
        int checker = 0;                   // bit set of the characters in the window
        char firstChar = str.charAt(i);
        int firstCharPos = i;              // first char position in the string
        sb.append(firstChar);
        checker |= 1 << (firstChar - 'a');

        for (int j = i + 1; j < len; j++)
        {
            char currChar = str.charAt(j);
            if (currChar == firstChar)
                firstCharPos++;

            int tester = checker & (1 << (currChar - 'a'));
            if (tester > 0) // already have such character
            {
                sb.append(currChar);
                continue;
            }

            // new character
            if (++counter > m)
            {
                i = firstCharPos + 1;      // restart the outer scan after firstCharPos

                if (sb.length() > max.length())
                {
                    max = sb.toString();
                }
                break;
            }
            sb.append(currChar);
            checker |= 1 << (currChar - 'a');
        }

        // the inner loop reached the end of the string without exceeding m
        if (counter <= m)
        {
            if ((counter == m) && sb.length() > max.length())
            {
                max = sb.toString();
            }
            break;
        }
    }

    return max;
}

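【Question discussion】:

Your algorithm can hold at most 32 values for checking, because you are using ints. I would double-check the claim "I know it must be better than O(n^2)...". For help, look at the sum 1+2+3+...+N.
Hint: think about how the value of j changes as you go through the first for loop.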
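To make the hint concrete: assume the worst case, in which each pass of the outer loop moves i forward by only one position while the inner loop still scans j to the end of a string of length N. The inner-loop iterations then form the kind of sum the comment refers to:

```latex
\sum_{i=0}^{N-1} (N - 1 - i) \;=\; 0 + 1 + \dots + (N-1) \;=\; \frac{N(N-1)}{2} \;\in\; \Theta(N^2)
```

So the nested-loop structure by itself only guarantees an O(N^2) bound; concluding O(N) would require showing that such restarts cannot happen.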

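Tags: algorithm

【Solution 1】:

There is an O(n) solution. Let S be the string. Traverse the array with two pointers i and j, and keep track of the number K of distinct letters between S[i] and S[j]. Increase j whenever this number is less than or equal to n, and increase i when K is greater than n. Also remember the longest substring for which K equals n.

In the implementation you also need a hash table to keep track of the last occurrence of each letter.

Python implementation: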

def longest(S, n):
    i = j = K = 0        # window is S[i:j]; K = number of distinct letters in it
    res = (0, 0)
    last = {}            # letter -> index of its last occurrence in the window

    while i < len(S):
        # if the current substring is better than the others, save it
        if K == n and j - i > res[1] - res[0]:
            res = (i, j)

        # if we reach the end of the string, we're done
        if j + 1 > len(S):
            break
        # if we can go further with the right end, do it
        elif K <= n and j + 1 <= len(S):
            if S[j] not in last:
                K = K + 1
            last[S[j]] = j
            j = j + 1
        # if we must go further with the left end, do it
        else:
            # S[i] leaves the window only if i is its last occurrence
            if last[S[i]] == i:
                del last[S[i]]
                K = K - 1
            i = i + 1
    return S[res[0]:res[1]]

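【Discussion】:

I think you need to check K == n instead of K <= n when assigning res = (i, j).
OK, thanks! I forgot that we need exactly n characters. Luckily, only one character has to change ;)
There is an off-by-one error: the current code doesn't push the character at index 0. You need to change the condition to j+1 <= len(S) and move j = j + 1 to after last[S[j]] = j.
if last[S[i]] == i: del last[S[i]]; without this, I doubt the code gives correct results.
There is a bug. If you run print longest("aabbcdeeeeggi", 3), the output is bcdeeee, which is incorrect. To fix it, you can do: if last.has_key(S[i]): key = S[i]; K = K - 1; del last[S[i]]; while S[i] == key: i = i + 1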

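【Solution 2】:

Your current code is O(N^2), because you use nested for loops to check the substring starting at each character.

IMO you can do this in O(N*k) time and O(k) extra space (where k = the number of allowed unique characters):

Iterate over the string from the start and add the first character to a map from character to the last position where it was found.
Keep parsing the string and update the last position of every character found in the map.
When you get a new character, increment the character count and set the last position found for that character to the current position.
When the count in the map reaches k, iterate over the map and find the entry with the smallest position index. Compute present position - min(last position index) and update the maximum-length substring accordingly. Decrement the count and pop that character from the map.
Continue as above until you reach the end of the string.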
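A minimal Python sketch of these steps, assuming the goal is the longest substring with exactly k distinct characters, as in the question. It reads step 4 as "evict when a (k+1)-th distinct character arrives": at that moment it records the current window, scans the map for the smallest last position (the O(k) step), and moves the window start past it. The function name longest_k_distinct_lastpos is hypothetical:

```python
def longest_k_distinct_lastpos(s: str, k: int) -> str:
    """Last-position-map sketch: O(N*k) time, O(k) extra space.

    Returns the longest substring with exactly k distinct characters,
    or "" if there is none.
    """
    if k <= 0:
        return ""
    last = {}       # char -> last position where it was seen in the current window
    start = 0       # left edge of the current window s[start:pos]
    best = (0, 0)   # (start, end) of the longest window found so far
    for pos, ch in enumerate(s):
        if ch not in last and len(last) == k:
            # s[start:pos] holds exactly k distinct chars; compare with the best
            if pos - start > best[1] - best[0]:
                best = (start, pos)
            # O(k) scan: evict the char whose last position is smallest
            victim = min(last, key=last.get)
            start = last.pop(victim) + 1
        last[ch] = pos
    # the window that runs to the end of the string is also a candidate
    if len(last) == k and len(s) - start > best[1] - best[0]:
        best = (start, len(s))
    return s[best[0]:best[1]]
```

With the question's input, longest_k_distinct_lastpos("abcdeefuiuiwiwwaaaa", 3) returns "iwiwwaaaa". The linear min scan is what makes this O(N*k); the two-pointer windows in Solutions 1 and 3 avoid that factor.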

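【Discussion】:

【Solution 3】:

All the answers are too complicated. I'll throw in a simple solution.

The constraints of the problem revolve around distinct characters.

- So our solution should prioritize the number of UNIQUE characters (unicount) at all times.

- There are two cases to consider: one is unicount < K, the other is unicount >= K.

CASE 1: (unicount < K)
    1a: Str[i] is a new character, not already present in the current window.
         --Increase unicount and hash[str[i]].
    1b: Str[i] is not a new character; it is already present in the current window.
        --No need to increase unicount. Just hash str[i].

CASE 2: (unicount >= K)
    2a: Str[i] is not a new character; it is already present in the current window.
        --Nothing else to do, since unicount stays equal to K. Just hash str[i].
    2b: Str[i] is a new character: slide the window (advance the variable start) until the unicount value decreases.
         --Now it is similar to case 1.

The code below returns the length of the longest substring with exactly K distinct characters (or -1 if the string contains fewer than K distinct characters). It's easy to modify it to actually print such a substring.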

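#include <algorithm>
#include <string>
using namespace std;

int printLengthKUniqueSubstring(const string& str, int k)
{
    if (k <= 0)
        return 0;                  // only the empty substring has 0 distinct characters
    int hash[256] = {0};           // occurrences of each byte value in the window

    int n = str.length();
    int unicount = 0, maxlength = 0, start = 0;
    for (int i = 0; i < n; i++)
    {
        unsigned char c = str[i];  // unsigned, so bytes >= 128 are not negative indices
        if (unicount < k)
        {
            // CASE 1: there is room for another distinct character
            if (hash[c] == 0)
                unicount++;        // 1a: new character
            hash[c]++;
        }
        else
        {
            // CASE 2: the window already holds k distinct characters
            if (hash[c] == 0)
            {
                // 2b: new character, slide start until one distinct character leaves
                while (unicount >= k)
                {
                    unsigned char out = str[start];
                    hash[out]--;
                    if (hash[out] == 0)
                        unicount--;
                    start++;
                }
                unicount++;
            }
            hash[c]++;             // 2a, or 2b after sliding: count str[i]
        }
        maxlength = max(maxlength, i - start + 1);
    }
    if (unicount < k)
        return -1;                 // the string has fewer than k distinct characters
    return maxlength;
}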
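As noted above, the code is easy to change so that it returns the substring itself. A minimal Python sketch of the same case analysis that does this, assuming the goal is exactly K distinct characters; the name longest_k_unique_substring is hypothetical, and the counts live in a collections.defaultdict instead of a 256-entry array, so it is not limited to single-byte characters:

```python
from collections import defaultdict


def longest_k_unique_substring(s: str, k: int) -> str:
    """Count-based sliding window; returns "" if s has fewer than k distinct chars."""
    if k <= 0:
        return ""
    count = defaultdict(int)   # char -> occurrences inside the window s[start:i+1]
    unicount = 0               # number of distinct chars in the window
    start = 0
    best_start, best_len = 0, 0
    for i, ch in enumerate(s):
        if count[ch] == 0:
            # new character: if the window is full (case 2b), slide start first
            while unicount >= k:
                out = s[start]
                count[out] -= 1
                if count[out] == 0:
                    unicount -= 1
                start += 1
            unicount += 1
        count[ch] += 1         # cases 1a/1b/2a: just count s[i]
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    if unicount < k:
        return ""
    return s[best_start:best_start + best_len]
```

Every window tracked here has at most K distinct characters. When the string has at least K distinct characters, the longest such window has exactly K, because a shorter-than-K window could always be extended by one more character.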

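Have a nice day!

【Discussion】:

【Solution 4】:

Complexity O(n*C), where C is a constant: the cost of finding the minimum value in the dictionary, which never holds more than numberOfUniqueChar + 1 entries.

Here is a solution in C#.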

using System.Collections.Generic;
using System.Linq;

public static string GetLongestSubString(string s, int numberOfUniqueChar)
{
    char c;
    int start = 0;                 // left edge of the current window
    string result = string.Empty, temp = string.Empty;
    // char -> index of its last occurrence in the current window
    Dictionary<char, int> dic = new Dictionary<char, int>();

    for (int i = 0; i < s.Length; i++)
    {
        if (!dic.ContainsKey(s[i]))
        {
            dic.Add(s[i], i);
            if (dic.Count > numberOfUniqueChar)
            {
                // s[start..i-1] had exactly numberOfUniqueChar distinct chars
                temp = s.Substring(start, (i - start));
                if (temp.Length > result.Length)
                {
                    result = temp;
                }
                // evict the char whose last occurrence is leftmost
                c = dic.OrderBy(k => k.Value).FirstOrDefault().Key;
                start = dic[c] + 1;
                dic.Remove(c);
            }
        }
        else
        {
            // increase index of the current key
            dic[s[i]] = i;
        }
    }

    // the window that runs to the end of the string is also a candidate,
    // even when the last character is a new one
    temp = s.Substring(start);
    if (dic.Count == numberOfUniqueChar && temp.Length > result.Length)
    {
        result = temp;
    }

    return result;
}

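【Discussion】:

【Solution 5】:

Here is how I solved this problem. First, it splits the string into groups of identical characters; then it loops to retrieve all valid substrings; finally, it returns all of the longest possible substrings: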

import re

def longest(S, n):
    # 1. group runs of identical characters, e.g. "aabccc" -> ["aa", "b", "ccc"]
    grp_S = [s[0] for s in re.findall(r'(([a-z])\2*)', S)]
    # 2. retrieve all valid combinations in tuples (characters count, substring)
    options = []
    for i in range(len(grp_S)):
        g = 0
        while i + n + g <= len(grp_S):
            if (len(set([x[0] for x in grp_S[i: i + n + g]])) == n and i + n + g + 1 > len(grp_S)) or \
               (len(set([x[0] for x in grp_S[i: i + n + g]])) == n and len(set([x[0] for x in grp_S[i: i + n + g + 1]])) > n):
                options.append((len(''.join(grp_S[i: i + n + g])), ''.join(grp_S[i: i + n + g])))
                break
            else:
                g = g + 1
    # 3. return the list of all longest substrings (empty if there are none)
    if not options:
        return []
    return [v[1] for v in options if v[0] == max(options)[0]]
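Step 1 relies on the regex (([a-z])\2*), which only matches lowercase letters; any other character is silently skipped by re.findall. A minimal sketch of the same run-splitting with the standard library's itertools.groupby, which groups consecutive equal elements of any kind; the helper name runs is hypothetical:

```python
from itertools import groupby


def runs(s: str) -> list:
    """Split s into maximal runs of identical characters (any character, not just a-z)."""
    # groupby yields (key, iterator) pairs for each block of consecutive equal items
    return ["".join(group) for _, group in groupby(s)]
```

Its result could stand in for grp_S in step 1 if the input is not restricted to lowercase letters.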

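【Discussion】:

【Solution 6】:

Simple, in short Python syntax, without error checking

【Discussion】:

You could simply post the code instead of a screenshot. Judging by your reputation, I think you should know that by now.
I did this to stop some people (newbies) from copy-pasting code without writing it themselves and without error checking, like the people I interview who test themselves! Please give me one reason why the code needs to be copy/pasted. Are you a student?
I think screenshots look nice and show up in Google image search; they can also be posted fully automatically from my IDE with one click, and most importantly they make me practice the code before uploading. But for polished code with an explanation, I'll do as you say. Please give a good reason before voting, so that I can understand how to help you.
Being able to copy and paste is exactly why I avoid posting screenshots: it lets someone quickly test and refine your implementation. I understand your concern that students might copy and paste, but not everyone on SO is a student; most developers are looking for a quick initial solution (if not the best one). Besides that, I think an answer should come with some explanation, since a bare line of code won't rank higher in search results. If there is no explanation, the code should at least contain some comments (which is also good for students :)).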