【Question Title】: Algorithm to find the minimum length of substring having all characters of other string
【Posted】: 2014-05-30 10:06:21
【Question】:

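I have two strings:
string1 - hello how are you,
String2 - olo (including the space character)

Output: lo ho (hello how are you)

lo ho is the only substring that contains all the characters of string2. Can anyone suggest a good algorithm for this? (I can only think of a brute-force algorithm, O(n^2).)

The output should also be the minimum-length string (in case there are multiple options).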
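For reference, a minimal brute-force sketch in Python. It assumes the multiset reading of the question: a character that appears twice in string2 must appear at least twice in the substring. The function name `min_window_brute_force` is hypothetical, and the sketch is meant only as a baseline to check faster algorithms against.

```python
from collections import Counter
from typing import Optional


def min_window_brute_force(string1: str, string2: str) -> Optional[str]:
    """Return a shortest substring of string1 whose character multiset
    covers that of string2, or None if no such substring exists."""
    need = Counter(string2)
    n = len(string1)
    # Try lengths from shortest to longest, so the first hit is a minimum.
    for length in range(len(string2), n + 1):
        for start in range(n - length + 1):
            window = Counter(string1[start:start + length])
            if all(window[c] >= k for c, k in need.items()):
                return string1[start:start + length]
    return None


print(min_window_brute_force("hello how are you", "olo"))
```

There are O(n^2) candidate substrings and each check costs O(n), so this baseline is cubic in the worst case.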

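【Comments】:

  • Brute force is not O(n^2) but O(n^3): checking each substring is itself O(n), and there are O(n^2) of them. Unless you have something else in mind?
  • Should the output contain spaces, as in the original string, or not?
  • I don't understand your question... How does lo ho satisfy the two strings, given that neither String1 nor the output contains olo?
  • If we consider individual characters, then there is more than one possible output...
  • @NaveedButt The string "lo ho" is the smallest substring of string1 whose multiset of characters is a superset of the multiset of characters of string2. The statement in the question, "lo ho is the only substring that contains all characters of string2", must be wrong; otherwise it makes no sense to me either.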

Tags: algorithm


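【Solution 1】:

Keep two pointers l and r, and a hash table M = character -> count for the characters of string2 that do not yet occur in string1[l..r].

Initially set l = 0 and choose r so that string1[l..r] contains all the characters of string2 (if possible). You do that by removing characters from M until it is empty.

Then, in each step, increment r by one and afterwards increment l as much as possible while still keeping M empty. The minimum over all r - l + 1 (the length of the substring string1[l..r]) is the solution.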

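Pythonish pseudocode (written here as runnable Python):

from collections import Counter

def min_window(string1, string2):
    if not string2:
        return ""   # nothing is required, so the empty window suffices
    n = len(string1)
    M = Counter(string2)   # M[c] > 0 means c is still missing from string1[l..r]

    def empty():
        # let's say M is empty if it contains no positive values
        return all(count <= 0 for count in M.values())

    l = 0
    r = -1
    while r + 1 < n and not empty():
        r += 1
        M[string1[r]] -= 1
    if not empty():
        return None   # no solution
    answer_l, answer_r = l, r
    while True:
        # a negative count means string1[l] is surplus and can be dropped
        while M[string1[l]] < 0:
            M[string1[l]] += 1
            l += 1
        if r - l + 1 < answer_r - answer_l + 1:
            answer_l, answer_r = l, r
        r += 1
        if r == n:
            break
        M[string1[r]] -= 1
    return string1[answer_l:answer_r + 1]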

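The "is empty" check can be implemented in O(1) if you maintain the number of positive entries while performing the increment and decrement operations.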
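A minimal sketch of that bookkeeping in Python, under a few assumptions: the name `min_window_counted` and the variable `positive` are hypothetical, M is a `defaultdict` rather than an array, and the two loops of the answer are merged into one pass over r. Once `positive` reaches zero it never becomes positive again, because l is only advanced past surplus characters.

```python
from collections import defaultdict
from typing import Optional


def min_window_counted(string1: str, string2: str) -> Optional[str]:
    if not string2:
        return ""
    M = defaultdict(int)
    for c in string2:
        M[c] += 1
    positive = len(M)  # number of entries with M[c] > 0
    best = None        # (l, r) of the shortest window found so far
    l = 0
    for r, c in enumerate(string1):
        M[c] -= 1
        if M[c] == 0:  # this entry just went from 1 to 0
            positive -= 1
        if positive > 0:
            continue   # M is not "empty" yet: the window misses something
        # Drop surplus characters on the left; entries never rise above 0.
        while M[string1[l]] < 0:
            M[string1[l]] += 1
            l += 1
        if best is None or r - l < best[1] - best[0]:
            best = (l, r)
    if best is None:
        return None
    return string1[best[0]:best[1] + 1]


print(min_window_counted("hello how are you", "olo"))
```

Each character of string1 is visited at most once by r and at most once by l, and every emptiness test is a single integer comparison.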

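Let n be the length of string1 and m the length of string2.

Note that l and r are only ever incremented, so there are at most O(n) increments, and hence at most O(n) instructions are executed in the last outer loop.

If M is implemented as an array (I assume the alphabet has constant size), you get a running time of O(n + m), which is optimal. If the alphabet is too large, you can use a hash table to get expected O(n + m).

Example execution: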

string1 = "abbabcdbcb"
string2 = "cbb"

# after first loop
M = { 'a': 0, 'b': 2, 'c': 1, 'd': 0 }

# after second loop
l = 0
r = 5
M = { 'a': -2, 'b': -1, 'c': 0, 'd': 0 }

# increment l as much as possible:
l = 2
r = 5
M = { 'a': -1, 'b': 0, 'c': 0, 'd': 0 }

# increment r by one and then l as much as possible
l = 2
r = 6
M = { 'a': -1, 'b': 0, 'c': 0, 'd': -1 }

# increment r by one and then l as much as possible
l = 4
r = 7
M = { 'a': 0, 'b': 0, 'c': 0, 'd': -1 }

# increment r by one and then l as much as possible
l = 4
r = 8
M = { 'a': 0, 'b': 0, 'c': -1, 'd': -1 }

# increment r by one and then l as much as possible
l = 7
r = 9
M = { 'a': 0, 'b': 0, 'c': 0, 'd': 0 }

The best solution is string1[7..9], i.e. "bcb".

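【Comments】:

  • wait.. at first I passed over the pseudocode as too simplified, but now I cannot see how it escapes the O(n²) worst case while still returning the correct solution? E.g. for the OP's data you cannot stop at llo ho, so while M[string1[l]] < 0 is not enough; it needs to continue to the last l. Another example: s1='abacccccbc', s2='bc' cannot stop at bac, it needs to go all the way to bc, which would make the algorithm quadratic (even if with a very good constant factor, so it is probably fast enough even for millions of characters in most cases).. or am I missing something?
  • @deathApril Both l and r are only incremented, never decremented, so the total number of loop iterations is O(n). I did not say you should stop somewhere; you just try all possible r, so in your second example you arrive at s[l..r] = bc at some point. The trick is that the values of l associated with increasing r are monotonically increasing.
  • I see, thanks. It also took me a while to figure out how M works; you could add comments like # M = {o:2, l:1} and # M = {h:-2, e: -1, l: -1, o: 0} for illustration..
  • Oh, and if the strings can be Unicode, M might be better as a defaultdict rather than an array..
  • @deathApril Well, that depends. Technically the alphabet has to have constant size, which is the case for Unicode. Of course, if you want to save space, you can use a hash table.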
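【Solution 2】:

I would compute the positions in string1 of the characters from string2, and then pick the permutation with the minimum distance between the lowest and the highest character position: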

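from itertools import product

#          positions are:
#          01234567890123456
string1 = 'hello how are you'
string2 = 'olo'

# get string1 positions for each character from set(string2)
positions = {c: [i for i, ch in enumerate(string1) if ch == c]
             for c in set(string2)}
# positions == {'o': [4, 7, 15], 'l': [2, 3]}

# get all permutations of positions (don't repeat the same element)
# then pick the permutation with minimum distance between min and max position
# (obviously, this part can be optimized, this is just an illustration)
permutations = [p for p in product(*(positions[c] for c in string2))
                if len(set(p)) == len(p)]
# permutations == [(4, 2, 7), (4, 2, 15), (4, 3, 7), ...]
the_permutation = min(permutations, key=lambda p: max(p) - min(p))
# the_permutation == (4, 3, 7)

# voilà (the positions index string1 itself, spaces included)
output = string1[min(the_permutation):max(the_permutation) + 1]  # 'lo ho'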

【Comments】:

    【Solution 3】:

    Here is an example implemented in JavaScript. The logic is much the same as what @Aprillion wrote above.

    Demo: http://jsfiddle.net/ZB6vm/4/

    var s1 = "hello how are you";
    var s2 = "olo";
    var left, right;
    var min_distance = Infinity; // must start as a number, or dist <= min_distance is never true
    var answer = "";
    
    // make permutation recursively
    function permutate(ar, arrs, k) {
        // check if the end of recursive call
        if (k == arrs.length) {
            var r = Math.max.apply(Math, ar);
            var l = Math.min.apply(Math, ar);
            var dist = r - l + 1;
            if (dist <= min_distance) {
                min_distance = dist;
                left = l;
                right = r;
            }
            return;
        }
        for (var i in arrs[k]) {
            var v = arrs[k][i];
            if (ar.indexOf(v) < 0) { // skip positions already used (no jQuery needed)
                var ar2 = ar.slice();
                ar2.push(v);
                 // recursive call
                permutate(ar2, arrs, k + 1);
            }
        }
    }
    
    function solve() {
        var ar = [];   // 1-dimensional array to store the chosen character positions
        var arrs = []; // 2-dimensional array: candidate positions for each character of s2
        for (var i = 0; i < s2.length; i++) {
            arrs[i] = [];
            var c = s2.charAt(i);
            for (var k = 0; k < s1.length; k++) { // loop by s1
                if (s1.charAt(k) == c) {
                    if (arrs[i].indexOf(k) < 0) {
                        arrs[i].push(k); // save position found
                    }
                }
            }
        }
        // call permutate
        permutate(ar, arrs, 0);
        answer = s1.substring(left, right + 1);
        alert(answer);
    }
    
    solve();
    

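    Hope this helps.

    【Comments】:

      【Solution 4】:

      There is an algorithm that can do this in O(N).

      Idea: keep 2 arrays, isRequired[256] and isFound[256], which hold the required frequency of each character of the pattern and, while scanning the text S, the frequency of each such character found so far. Also keep a counter that tells when a valid window has been found. Once a valid window is found, we can shift the window (to the right) while maintaining the invariant given by the question.

      C++ program: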

      #include <climits>
      #include <cstring>

      // start and end are left unchanged if text contains no valid window
      void findMinWindow(const char *text, const char *pattern, int &start, int &end){
              //Calculate lengths of text and pattern
              int textLen = strlen(text);
              int patternLen = strlen(pattern);

              // Declare 2 arrays which keep tab of required & found frequency of each char in pattern
              int isRequired[256] = {0}; //Assuming 8-bit characters; zero-initialized
              int isFound[256] = {0};
              int count = 0; //For ascertaining whether a valid window is found

              // Keep a tab of minimum window
              int minimumWindow = INT_MAX;

              // Index helper: plain char may be signed, so map it to 0..255 first
              auto at = [](const char *s, int k){ return (unsigned char)s[k]; };

              //Prepare the isRequired[] array by parsing the pattern
              for(int k=0;k<patternLen;k++){
                  isRequired[at(pattern,k)]++;
              }

              //Let's start parsing the text now
              // Have 2 pointers: i and j - both starting at 0
              int i=0;
              //Keep moving j forward, keep i fixed till we get a valid window
              for(int j=0;j<textLen;j++){
                 //Check if the character read appears in pattern or not
                 if(isRequired[at(text,j)] == 0){
                    //This character does not appear in the pattern; skip this
                    continue;
                 }
                 //We have this character in the pattern, lets increment isFound for this char
                 isFound[at(text,j)]++;

                 //increment the count if this character satisfies the invariant
                 if(isFound[at(text,j)] <= isRequired[at(text,j)]){
                    count++;
                 }

                 //Did we find a valid window yet?
                 if(count == patternLen){
                    //A valid window is found..lets see if we can do better from here on
                    //better means: increasing i to reduce window length while maintaining invariant
                    while(isRequired[at(text,i)] == 0 || isFound[at(text,i)] > isRequired[at(text,i)]){
                         //Either of the above 2 conditions means we should increment i; however we
                         // must decrease isFound for this char as well.
                         //Hence do a check again
                         if(isFound[at(text,i)] > isRequired[at(text,i)]){
                            isFound[at(text,i)]--;
                         }
                         i++;
                    }

                     // Note that after the while loop, the invariant is still maintained
                     // Lets check if we did better
                     int winLength = j-i+1;
                     if(winLength < minimumWindow){
                        //update the references we got
                        start = i;
                        end = j;
                        //Update new minimum window length
                        minimumWindow = winLength;
                     }
                } //End of if(count == patternLen)
           } //End of for loop
      }
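For comparison with the single-table approach of Solution 1, here is a hedged Python sketch of the same two-counter idea. The names `find_min_window`, `required`, and `found` are hypothetical, and dictionaries stand in for the 256-entry arrays so that the sketch does not depend on an 8-bit character set.

```python
from typing import Optional, Tuple


def find_min_window(text: str, pattern: str) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) indices of a shortest window of text
    covering every character of pattern, or None if there is none."""
    required = {}
    for ch in pattern:
        required[ch] = required.get(ch, 0) + 1
    found = {}
    count = 0    # pattern characters matched so far, capped per character
    best = None
    i = 0
    for j, ch in enumerate(text):
        if ch not in required:
            continue
        found[ch] = found.get(ch, 0) + 1
        if found[ch] <= required[ch]:
            count += 1
        if count == len(pattern):
            # Shrink from the left past unneeded or surplus characters.
            while text[i] not in required or found[text[i]] > required[text[i]]:
                if text[i] in required:
                    found[text[i]] -= 1
                i += 1
            if best is None or j - i < best[1] - best[0]:
                best = (i, j)
    return best


print(find_min_window("hello how are you", "olo"))
```

Returning None when no window exists is a design choice of this sketch; it avoids leaving output parameters in an unspecified state.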
      

      【Comments】:
