【Question title】: Smallest substring that can be replaced to make the string have the same number of each character
【Posted】: 2016-12-17 02:12:50
【Question description】:

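I'm trying to solve a problem that is almost exactly that. Specifically, I'm given a string s such that s.Length % 4 == 0 and each s[i] is one of 'A', 'C', 'T' or 'G'. I want to find the smallest substring that I can replace so that each of 'A', 'C', 'T' and 'G' appears exactly s.Length / 4 times.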

For example, with s="GAAATAAA", one optimal solution is to replace the substring "AAATA" with "TTCCG", resulting in "GTTCCGAA".
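
The example can be checked mechanically. Below is a minimal Python sketch; the helper name `is_steady` is made up for illustration and is not part of the original code.

```python
from collections import Counter

def is_steady(s: str) -> bool:
    """Return True if A, C, G and T each occur exactly len(s) / 4 times."""
    if len(s) % 4 != 0:
        return False
    counts = Counter(s)
    return all(counts[ch] == len(s) // 4 for ch in "ACGT")

s = "GAAATAAA"
start = s.index("AAATA")                      # where the substring to replace begins
fixed = s[:start] + "TTCCG" + s[start + 5:]   # same-length replacement
print(fixed, is_steady(fixed))                # GTTCCGAA True
```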

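I've described my approach in the comments below, and I'd like to know whether it is correct in general, that is, whether it will always lead me to the right answer.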

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
class Solution
{
    static string ReplacementForSteadiness(string s)
    {   
        var counter = new Dictionary<char,int>() {
            { 'A', 0 }, { 'C', 0 }, { 'G', 0 }, { 'T', 0 }
        };
        for(int i = 0; i < s.Length; ++i)
                counter[s[i]] += 1;

        int div = s.Length / 4;

        var pairs = counter.ToList();
        if(pairs.All(p => p.Value == div))
            return "";

        // If here, that means the character counts in s are uneven. For example, if
        // s = "AAATGTTCTTGCGGGG", then counter = { A -> 3, T -> 5, C -> 2, G -> 6 },
        // div = 4, and we know that we need to increase the number of As by 1, decrease 
        // the number of Ts by 1, increase the number of Cs by 2 and decrease the number
        // of Gs by 2.

        // The smallest strings to replace will have 1 T and 2 Gs, to be replaced with 1 A and
        // 2 Cs (The order of characters in the replacement string doesn't matter).
        // "TGG" --> "ACC" 
        // "GTG" --> "ACC"
        // "GGT" --> "ACC"

        // None of those strings exist in s. The next smallest strings that could be replaced
        // would have 1 T and 3Gs, to be replaced with 1 A and 2 of the Gs to be replaced with
        // Cs. Or, 2 Ts and 2Gs, 1 of the Ts to be replaced by an A and both the Gs to be replaced
        // by Cs.
        // "TGGG" --> "AGCC"
        // "GTGG" --> "AGCC"
        // "GGTG" --> "AGCC"
        // "GGGT" --> "AGCC"
        // "TTGG" --> "ATCC"
        // "TGTG" --> "ATCC"
        // "GTGT" --> "ATCC"
        // "GGTT" --> "ATCC"

        // None of those strings exist in s. Etc.      

        string r;

        // ... 

        return r;
    }

    static void Main(String[] args)
    {
       Console.ReadLine(); // n
       string str = Console.ReadLine();
       string replacement = ReplacementForSteadiness(str);
       Console.WriteLine(replacement.Length);
    }
}
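
The bookkeeping described in the comments (which characters are in surplus and which are short) can be sketched in a few lines of Python. The function name `surplus_and_deficit` is hypothetical; the input is the example string from the comments. Any substring that can be replaced has to contain at least the surplus characters, so the sum of the surplus values is a lower bound on its length.

```python
from collections import Counter

def surplus_and_deficit(s: str):
    """Split the four letters into those above and those below len(s) / 4."""
    target = len(s) // 4
    counts = Counter(s)
    surplus = {ch: counts[ch] - target for ch in "ACGT" if counts[ch] > target}
    deficit = {ch: target - counts[ch] for ch in "ACGT" if counts[ch] < target}
    return surplus, deficit

surplus, deficit = surplus_and_deficit("AAATGTTCTTGCGGGG")
print(surplus)                # {'G': 2, 'T': 1}
print(deficit)                # {'A': 1, 'C': 2}
print(sum(surplus.values()))  # 3, the shortest length worth trying
```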

【Question discussion】:

Tags: c# string algorithm time-complexity dynamic-programming


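【Solution 1】:

If the string already has a balanced set of characters, you're done and there is nothing to do.

Otherwise, you can always solve the problem by replacing zero characters, which is the minimum. You do this by adding whatever characters are missing. Take your test case, for example: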

GAAATAAA

The most frequent character is A, with 6 occurrences. You need 5 extra Gs, 5 extra Ts and 6 extra Cs. So replace one A with the required characters, including the A itself:

GAAATAA[AGGGGGTTTTTCCCCCC]

Since the original A was replaced with an A, you have in effect replaced zero characters, which is the smallest possible number.
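
A minimal Python sketch of this padding idea follows. It only makes sense under the reading that the replacement may be longer than the substring it replaces; the function name `pad_to_balance` is made up for illustration, and the order of the appended characters is arbitrary.

```python
from collections import Counter

def pad_to_balance(s: str) -> str:
    """Replace the last character with itself plus every missing character."""
    if not s:
        return s
    counts = Counter(s)
    top = max(counts[ch] for ch in "ACGT")  # every letter must reach this count
    missing = "".join(ch * (top - counts[ch]) for ch in "ACGT")
    return s[:-1] + s[-1] + missing

print(pad_to_balance("GAAATAAA"))  # GAAATAAACCCCCCGGGGGTTTTT
```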

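【Discussion】:

  • Although the OP doesn't say so explicitly, I believe (based on the example, and on the fact that the problem becomes trivial otherwise, as you've shown) that the replacement string must have the same length as the string being replaced.
【Solution 2】: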

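I think your solution would work, but its complexity is too high.
Here is another solution.
If counting the characters in the string returns { 'A', 4 }, { 'C', 6 }, { 'G', 6 }, { 'T', 4 }, the substring must start with a C or G, end with a C or G, and have length >= 2.
So what we need to do is take every substring that satisfies these conditions and test whether it contains the "bad characters", in our case one C and one G. If its length is 2 we've won; otherwise we save it in a temporary variable and keep testing.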

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
class Solution
{
    static void Main(String[] args)
    {
        string[] inputs = { "GAAATAAA", "CACCGCTACCGC", "CAGCTAGC", "AAAAAAAA", "GAAAAAAA", "GATGAATAACCA", "ACGT" };

        List<string> replacement = new List<string>();
        foreach (var item in inputs)
        {
            replacement.Add(StringThatHasToBeReplaced(item));
        }
    }

    static string StringThatHasToBeReplaced(string s)
    {
        var counter = new Dictionary<char, int>() {
            { 'A', 0 }, { 'C', 0 }, { 'G', 0 }, { 'T', 0 }
        };
        for (int i = 0; i < s.Length; ++i)
            counter[s[i]] += 1;

        int div = s.Length / 4;
        var pairs = counter.ToList();

        if (pairs.Where(p => p.Value != div).Count() == 0)
        {
            return null;
        }

        List<char> surplusCharacter = pairs.Where(p => p.Value > div).Select(p => p.Key).ToList();
        int minLength = pairs.Where(p => p.Value > div).Sum(p => p.Value - div);
        string result = s;
        for (int i = 0; i < s.Length - minLength + 1; i++) // i is the start index
        {
            if (surplusCharacter.Contains(s[i]))
            {
                if (minLength == 1)
                    return s[i].ToString();

                for (int j = i + minLength - 1; j < s.Length; j++) // j is the end index
                {
                    if (surplusCharacter.Contains(s[j]))
                    {
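                        // j is an inclusive end index, so the length is j - i + 1
                        // (j - i would drop the last character of the candidate)
                        var substring = s.Substring(i, j - i + 1);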
                        if (substring.Length >= result.Length)
                        {
                            break;
                        }
                        // we test if substring can be the string that need to be replaced
                        var isValid = true;
                        foreach (var c in surplusCharacter)
                        {
                            if (substring.Count(f => f == c) < counter[c] - div)
                            {
                                isValid = false;
                                break;
                            }
                        }
                        if (isValid)
                            result = substring;
                    }
                }
            }
        }
        return result;
    }


}

I made some modifications to handle edge cases. These are some test samples, and the results I get look fine.
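
Results like these can be cross-checked against a brute-force reference. The Python sketch below assumes the replacement must have the same length as the replaced substring. Under that assumption a window can be rewritten to balance the string exactly when no letter occurs more than len(s) / 4 times outside the window. The function name `brute_force_min_length` is hypothetical, and the cubic running time is only suitable for short test strings.

```python
from collections import Counter

def brute_force_min_length(s: str) -> int:
    """Try every window, shortest first, and return the first feasible length."""
    n = len(s)
    target = n // 4
    total = Counter(s)
    for length in range(n + 1):
        for start in range(n - length + 1):
            inside = Counter(s[start:start + length])
            # Feasible if what remains outside the window never exceeds the target.
            if all(total[ch] - inside[ch] <= target for ch in "ACGT"):
                return length
    return n

inputs = ["GAAATAAA", "CACCGCTACCGC", "CAGCTAGC", "AAAAAAAA",
          "GAAAAAAA", "GATGAATAACCA", "ACGT"]
for s in inputs:
    print(s, brute_force_min_length(s))
```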

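【Discussion】:

  • (In case you're curious, that solution didn't pass the tests)
  • @user6048670 Can you give an example where it goes wrong? Maybe the solution can be improved
  • Try running it through hackerrank.com/challenges/bear-and-steady-gene, which is the problem I'm ultimately trying to solve
  • @AnotherGeek I think your second case can be done with a substring of length 6 instead of 7. You can replace CACCGC with AGAGTT (min length: 6) to give AGAGTTTACCGC.
【Solution 3】: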

Thoughts? Sorry for the messy code and for answering in Python. I started writing this on my phone and was feeling lazy.

import re
from itertools import permutations

def find_min(s):
    freq = {ch:0 for ch in 'ATGC'}
    for ch in s:
        freq[ch] += 1
    desired_len = len(s) // 4
    fixes = {ch:desired_len-freq[ch] for ch in 'ATGC'}
    replacement = ''
    for ch in fixes:
        adj = fixes[ch]
        if adj < 0:
            replacement += ch*(-1*adj)
    perms = set(permutations(replacement))
    m = len(s)
    to_replace = ''
    for rep in perms:
        regex = '.*?'.join([ch for ch in rep])
        finds = re.findall(regex,s)
        if finds:
            x = sorted(finds, key=lambda x:len(x))[0]
            if m >= len(x):
                m = len(x)
                to_replace = x

    print_replacement(s, to_replace, fixes)

def print_replacement(inp, to_replace, fixes):
    replacement = ''
    for ch in to_replace:
        if fixes[ch] > 0:
            replacement += ch
    for ch in fixes:
        if fixes[ch] > 0:
            replacement += ch*fixes[ch]
    print('{0}\t\t- Replace {1} with {2} (min length: {3})'.format(inp ,to_replace, replacement, len(replacement)))


def main():
    inputs =  ["GAAATAAA", "CACCGCTACCGC", "CAGCTAGC", "AAAAAAAA", "GAAAAAAA", "GATGAATAACCA", "ACGT"]
    for inp in inputs:
        find_min(inp)

if __name__ == '__main__':
    main()

Thanks to @AnotherGeek for the test inputs! Here is the output.

GAAATAAA        - Replace AAATA with TCCGT (min length: 5)
CACCGCTACCGC    - Replace CACCGC with AGAGTT (min length: 6)
CAGCTAGC        - Replace C with T (min length: 1)
AAAAAAAA        - Replace AAAAAA with CCGGTT (min length: 6)
GAAAAAAA        - Replace AAAAA with CCGTT (min length: 5)
GATGAATAACCA    - Replace ATGAA with TGCGT (min length: 5)
ACGT            - Replace  with  (min length: 0)

I realize this is damn inefficient. Any suggestions for improvement?
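
One possible improvement, sketched here under the assumption that the replacement has the same length as the replaced substring: the order of the surplus characters does not matter, so the permutations and regexes can be dropped in favour of a single two-pointer pass that looks for the shortest window containing every surplus character. The function name `min_window_length` is made up for illustration. On GATGAATAACCA this sketch returns 4 (the window AATA), which is shorter than the 5 listed in the output above.

```python
from collections import Counter

def min_window_length(s: str) -> int:
    """Length of the shortest window holding all surplus characters, in O(len(s))."""
    target = len(s) // 4
    counts = Counter(s)
    need = {ch: counts[ch] - target for ch in "ACGT" if counts[ch] > target}
    if not need:
        return 0                      # already balanced
    missing = sum(need.values())      # surplus characters not yet inside the window
    have = Counter()
    best = len(s)
    left = 0
    for right, ch in enumerate(s):
        if ch in need:
            have[ch] += 1
            if have[ch] <= need[ch]:
                missing -= 1
        # Shrink from the left for as long as the window still covers the surplus.
        while missing == 0:
            best = min(best, right - left + 1)
            out = s[left]
            if out in need:
                have[out] -= 1
                if have[out] < need[out]:
                    missing += 1
            left += 1
    return best

for s in ["GAAATAAA", "CACCGCTACCGC", "CAGCTAGC", "AAAAAAAA",
          "GAAAAAAA", "GATGAATAACCA", "ACGT"]:
    print(s, min_window_length(s))
```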

【Discussion】:

【Solution 4】:
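    public int balancedString(String s) {
            // count[c] holds the occurrences of c outside the current window [i, j].
            int[] count = new int[128];
            int n = s.length(), res = n, i = 0, k = n / 4;
            // Start with an empty window, so every character counts as "outside".
            for (int j = 0; j < n; ++j) {
                ++count[s.charAt(j)];
            }
            for (int j = 0; j < n; ++j) {
                // Extend the window to the right: s[j] moves inside.
                --count[s.charAt(j)];
                // While no letter occurs more than k times outside the window, the window
                // can be rewritten to balance the string; record it and shrink from the left.
                while (i < n && count['A'] <= k && count['C'] <= k && count['T'] <= k && count['G'] <= k) {
                    res = Math.min(res, j - i + 1);
                    ++count[s.charAt(i++)];
                }
            }
            return res;
        }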
    

【Discussion】:

• Adding comments that explain what your code does would add a lot of value to this answer.