Longest Ordered Subsequence of Vowels - Dynamic Programming

【Question title】: Longest Ordered Subsequence of Vowels - Dynamic Programming
【Posted】: 2019-05-28 15:24:39
【Question】:

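Given a string consisting only of vowels, find the longest subsequence of the string that contains all five vowels and consists of one or more a's, followed by one or more e's, then one or more i's, then one or more o's, and finally one or more u's.

If there is more than one longest subsequence, print any one of them.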
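To make the two requirements concrete, here is a minimal sketch of a checker. The helper names `is_valid` and `is_subsequence` are made up for illustration and are not part of the original problem statement; the regular expression is just one way to express "one or more a's, then e's, then i's, then o's, then u's".

```python
import re

# One or more of each vowel, in the order a, e, i, o, u.
VALID = re.compile(r"a+e+i+o+u+")


def is_valid(candidate):
    """Return True if candidate has the required a..e..i..o..u shape."""
    return VALID.fullmatch(candidate) is not None


def is_subsequence(candidate, string):
    """Return True if candidate can be obtained by deleting characters from string."""
    remaining = iter(string)
    # `ch in remaining` advances the iterator, so matches must occur in order.
    return all(ch in remaining for ch in candidate)
```

A candidate answer is acceptable when both checks pass; the task is then to find the longest such candidate.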

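Question: could you show how to add memoization to the solution below, or how to solve this with DP? I've seen how to solve it recursively (below); I'm looking for help getting to a DP solution.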

Examples:

Input: str = "aeiaaioooaauuaeiou"
Output: {a, a, a, a, a, a, e, i, o, u}
In this case there are two possible outputs, {a, a, a, a, a, a, e, i, o, u} and {a, e, i, i, o, o, o, u, u, u}, each of length 10.

Input: str = "aaauuiieeou"
Output: No subsequence possible

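Approach: we recurse over the characters of the string and apply the following rules:

If the subsequence is empty, we include the vowel at the current index only if it is 'a'; otherwise we move on to the next index.
If the vowel at the current index is the same as the last vowel included in the subsequence, we include it.
If the vowel at the current index is the next possible vowel after the last one included (i.e. a -> e -> i -> o -> u), we have two options: include it, or move on to the next index. We choose whichever gives the longest subsequence.
If none of the above conditions holds, we move on to the next index (to avoid an invalid ordering of vowels in the subsequence).
If we have reached the end of the string, we check whether the current subsequence is valid. If it is (i.e. it contains all the vowels), we return it; otherwise we return an empty list.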

# Python3 program to find the longest subsequence 
# of vowels in the specified order 

vowels = ['a', 'e', 'i', 'o', 'u'] 

# Mapping values for vowels 
mapping = {'a': 0, 'e': 1, 'i': 2, 'o': 3, 'u': 4} 

# Function to check if given subsequence 
# contains all the vowels or not 
def isValidSequence(subList): 

    for vowel in vowels: 
        if vowel not in subList: 
            return False

    return True

# Function to find the longest subsequence of vowels 
# in the given string in specified order 
def longestSubsequence(string, subList, index): 

    # If we have reached the end of the string, 
    # return the subsequence 
    # if it is valid, else return an empty list 
    if index == len(string): 
        if isValidSequence(subList) == True: 
            return subList 
        else: 
            return [] 

    else: 
        # If there is no vowel in the subsequence yet, 
        # add vowel at current index if it is 'a', 
        # else move on to the next character 
        # in the string 
        if len(subList) == 0: 

            if string[index] != 'a': 
                return longestSubsequence(string, subList, index + 1) 
            else: 
                return longestSubsequence(string, subList + \ 
                            [string[index]], index + 1) 

        # If the last vowel in the subsequence until 
        # now is same as the vowel at current index, 
        # add it to the subsequence 
        elif mapping[subList[-1]] == mapping[string[index]]: 
            return longestSubsequence(string, subList + \ 
                            [string[index]], index + 1) 

        # If the vowel at the current index comes 
        # right after the last vowel 
        # in the subsequence, we have two options: 
        # either to add the vowel in 
        # the subsequence, or move on to next character. 
        # We choose the one which gives the longest subsequence. 
        elif (mapping[subList[-1]] + 1) == mapping[string[index]]: 

            sub1 = longestSubsequence(string, subList + \ 
                                [string[index]], index + 1) 
            sub2 = longestSubsequence(string, subList, index + 1) 

            if len(sub1) > len(sub2): 
                return sub1 
            else: 
                return sub2 

        else: 
            return longestSubsequence(string, subList, index + 1) 

# Driver Code 
if __name__ == "__main__": 

    string = "aeiaaioooauuaeiou"

    subsequence = longestSubsequence(string, [], 0) 
    if len(subsequence) == 0: 
        print("No subsequence possible") 
    else: 
        print(subsequence)

Output: ['a', 'e', 'i', 'i', 'o', 'o', 'o', 'u', 'u', 'u']

【Question comments】:

stackoverflow.com/a/47034920/6024572

Tags: recursion dynamic-programming memoization subsequence

【Solution 1】:

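The key realization for memoizing your function is that you can use (last_chosen_char, length, index) as your memo key. In other words, treat "aaeeeiiioo", i=15 and "aaaaaaaeio", i=15 as identical, because their last chosen character, length and current index are the same. The subproblems of both calls have identical solutions, so we only need to compute one of them.

A few additional remarks:

Avoid global variables, which break the encapsulation of functions; functions should work as black boxes with no external dependencies.
Use default parameters or a helper function to hide unnecessary parameters from the caller and offer a clean interface.
Since lists aren't hashable (because they're mutable), I switched to strings.
After memoization, your call stack is the new bottleneck. You can consider using loops to collect runs of duplicates. Similarly, once you've chosen a "u", you might as well loop and collect all remaining "u"s in the string; there are no more decisions to make. You may wish to preprocess the input string a bit to prune off more of the call stack, for example by recording the position of the next occurrence of each character for every index and bailing out early once you've hit the last "u". None of this helps the worst case, though, so rewriting the logic iteratively with a bottom-up approach would be optimal.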
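As an aside, the "collect runs of duplicates" idea from the last remark could be sketched as a run-length encoding step. The helper name `compress` is hypothetical and this preprocessing is not used by the answer's code; it only illustrates that a recursion over runs would need one call per run rather than one per character. As noted, it does not help in the worst case (a string with no repeated neighbours).

```python
from itertools import groupby


def compress(string):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(string)]
```

For instance, `compress("aaeiiio")` yields four runs, so a recursion over the runs would go four levels deep instead of seven.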

Putting it together, you can now feed in strings up to the length of the stack size:

def longest_subsequence(string):
    def helper(chosen="", i=0):
        if i == len(string):
            return chosen if set("aeiou").issubset(set(chosen)) else ""

        hashable = (chosen[-1] if chosen else None, len(chosen), i)

        if hashable in memo:
            return memo[hashable]

        if not chosen:
            res = helper("a" if string[i] == "a" else chosen, i + 1)
        elif chosen[-1] == string[i]:
            res = helper(chosen + string[i], i + 1)
        elif mapping[chosen[-1]] + 1 == mapping[string[i]]:
            sub1 = helper(chosen + string[i], i + 1)
            sub2 = helper(chosen, i + 1)

            res = sub1 if len(sub1) > len(sub2) else sub2
        else:
            res = helper(chosen, i + 1)

        memo[hashable] = res
        return res

    mapping = {x: i for i, x in enumerate("aeiou")}
    memo = {}
    return helper()

Here's an example run on a string of 900 characters:

original: uouoouiuoueaeeiiiaaaouuuueuaiaeaioaaiouaouiaiiaiuuueaueaieeueeuuouioaoaeueoioeoeioiuiaiaoeuuuuauuaiuueiieaauuoieiuoiaiueeeoaeaueaaaiaiiieuaoaiaaoiaoaueouaiiooaeeoioiaoieouuuoeaoaeeaaiuieouaeeooiiuooeauueaoaoaeuoaieauooueeeuiueuaeoeouuuiaoiauiaoiaaeeoeouuuueuiiuueoeeoiieuuuauooeuuaaaueuaaaaoaieaiiuoaoouueeeooiuoieoaueooaaioaeoiiiauuoeiaioeauaueiiaeoueioeiieuoiueoeoueeiuiooaioeooueuioaoaeoaiiiauoooieueoeauaiauauuauoueeauouieeoeoeiaeeeeooooeoaueouuuuiioeeuioueeuiaiueooeueeuuuoooeeuooeuoeeeaiioeeiioauiaeaiuaiauooiioeoeueoeieuueouaeeuuoeuaueeeauiiaoeeaeuieoeiuoooeaeeiuaiauuieouuuiuouiuieieoueiiaoiuioaiououooieiauuuououuiiiuaoeeieueeiuoeiaouoeueieuoiaeuoeiieeeaaaeiaeeoauoaoeuuoiiaaeiuiouueaoeuueeoouiaeeeouiouaaaeiouaaeauauioeoeuiauaeaououoaiuuueuieiaeeaouuueeaaiauoieoioaoiuuaioaiauioueieuuuueiaeeuaoeeoeioeoaiauiiuaouuoouooouaeueaioiaouuiiuauiaaeooeueiuoiuoeeauueuuueuueouiiauiuaoiuuoeuoeeauaeoo    
max subsequence: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeiiiiiiiiiiiooooouuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu


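【Comments】:

"(last_chosen_char, length, index) as your memo key." Even if the lengths differ, wouldn't the subsequent calls have the same solution as long as last_chosen_char and the index are the same? Given "aeeeiiioo", i=*14* and "aaaaaaaeio", i=15, could we memoize the calls after the "o" using only (last_chosen_char, index), append the result to both prefixes and see which is longer? Or could we even compute only the prefix that is currently longer and ignore the shorter one?
Thank you ggorlen! "After memoization, your call stack is the new bottleneck." Right, so for example, if I see the same string at a larger index, I know the result from the smaller index will be greater than or equal to the result from the larger index, yes? E.g. if I call helper("aee", 7) and helper("aee", 9), then the result of helper("aee", 7) >= the result of helper("aee", 9).
It's tempting to make the memo key broader so as to get more cache hits; I tried it myself while writing this answer. But if you create a false equivalence between two states that are actually distinct, you harm correctness, and omitting the length has that effect. Try it and see. As for your second comment, I don't quite follow, but what I meant by recursion being the bottleneck is that each char basically has to have its own function call. If the stack size is 1000, the largest string you can handle is 999 or so (including main). So the next step is to write this iteratively.
The upper bound is O(2^n), since in the worst case you hit the double branch on every character and spawn 2 recursive calls. For Java, I'd multiply the ASCII value of last_chosen_char by some prime, do something similar for length and index, and add them together. It's not surprising that there are no cache hits on such a short string, and it wouldn't save much work anyway. There may be a way to make the key more efficient so that it produces more hits, but the important thing is correctness. I haven't thought much about a bottom-up iterative DP for this, but if I have time I'll try writing it.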
Also useful for hashing in Java
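One possible reading of the exchange above: the length has to be part of the key because helper returns the whole string, prefix included. A variant that returns only how many more characters can still be appended would, as far as I can tell, depend only on (last chosen vowel, index). The following is a sketch under that assumption and not the answer author's code; the names `longest_length` and `suffix` are made up, and it reports only the length. It is still recursive, so the same recursion-depth limit applies.

```python
from functools import lru_cache


def longest_length(string):
    """Length of the longest valid subsequence, or 0 if none exists."""
    order = {vowel: k for k, vowel in enumerate("aeiou")}
    impossible = float("-inf")

    @lru_cache(maxsize=None)
    def suffix(last, i):
        # Most characters that can still be appended from string[i:],
        # given that the last chosen vowel has rank `last` (-1 = none yet).
        if i == len(string):
            return 0 if last == order["u"] else impossible
        k = order[string[i]]
        skip = suffix(last, i + 1)
        if k == last or k == last + 1:
            # Same vowel again, or the next vowel in a, e, i, o, u order.
            return max(1 + suffix(k, i + 1), skip)
        return skip

    best = suffix(-1, 0)
    return best if best > 0 else 0
```

Because the cached value no longer contains the prefix, two calls with different prefixes of different lengths can share one entry without mixing up their results.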

【Solution 2】:

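// Requires: import java.util.HashMap;
private static int longestSubSeqOfVowels(String input) {

    char[] v = { 'a', 'e', 'i', 'o', 'u' };
    // charCount.get(x): length of the longest valid subsequence seen so far
    // that ends in x (absent if no such subsequence exists yet).
    HashMap<Character, Integer> charCount = new HashMap<Character, Integer>();

    for (int i = 0; i < input.length(); i++) {
        char c = input.charAt(i);

        // Rank of c in the a, e, i, o, u order.
        int k = 0;
        while (k < v.length && v[k] != c) {
            k++;
        }
        if (k == v.length) {
            continue; // not a vowel; should not happen for valid input
        }

        // c can extend a subsequence ending in the same vowel or,
        // for e, i, o, u, one ending in the previous vowel.
        int best = charCount.getOrDefault(c, 0);
        if (k > 0) {
            best = Math.max(best, charCount.getOrDefault(v[k - 1], 0));
        }
        if (k == 0 || best > 0) {
            charCount.put(c, best + 1);
        }
    }
    // 0 means no valid subsequence exists.
    return charCount.getOrDefault('u', 0);
}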

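The above returns the length of the longest subsequence of vowels. It could be modified to return the string as well, since the map keeps a running count for each character.

【Comments】: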
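A rough sketch of how returning the string might look with an iterative, bottom-up pass, written in Python to match the question. This is an illustration under my own assumptions rather than the answer author's code: the name `longest_vowel_subsequence` and the `runs` table are made up, and the input is assumed to contain only the five vowels. For each vowel it tracks the per-vowel counts of the best subsequence ending in that vowel, which is enough to rebuild the string.

```python
def longest_vowel_subsequence(string):
    """Return one longest valid subsequence, or "" if none exists."""
    vowels = "aeiou"
    # runs[k]: counts of each vowel (a up to vowels[k]) in the longest valid
    # subsequence seen so far that ends in vowels[k]; None if there is none.
    runs = [None] * len(vowels)

    for ch in string:
        k = vowels.index(ch)  # assumes ch is one of a, e, i, o, u
        if k == 0:
            runs[0] = [runs[0][0] + 1] if runs[0] else [1]
            continue
        candidates = []
        if runs[k] is not None:
            candidates.append(runs[k])            # extend a run of the same vowel
        if runs[k - 1] is not None:
            candidates.append(runs[k - 1] + [0])  # start this vowel after the previous one
        if not candidates:
            continue  # this vowel cannot be placed yet
        best = max(candidates, key=sum)
        runs[k] = best[:-1] + [best[-1] + 1]

    if runs[-1] is None:
        return ""
    return "".join(vowel * count for vowel, count in zip(vowels, runs[-1]))
```

Since there is no recursion, the input length is not tied to the call stack, and each character is handled in constant time.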