查找可以由列表中其他单词组成的最长单词的python代码的时间复杂度（后续）答案

【问题标题】：Time complexity of python code for finding the longest word that can be made of other words in the list (follow-up)查找可以由列表中其他单词组成的最长单词的python代码的时间复杂度（后续）
【发布时间】：2016-02-17 16:59:09
【问题描述】：

我知道已经提出并解决了类似的问题 (Time complexity of python code for finding the longest word that can be made of other words in the list)，但我有一个后续问题。

我有以下问题，

# Given a list of words, find the longest word made of other words in the list.
# Example: [cat, banana, dog, nana, walk, walker, dogwalker]
# Result: dogwalker

我在 Python 3 中有以下实现：

def find_longest_word(list_words):
    list_words = sorted(list_words, key=lambda x:len(x))[::-1]
    for elem in list_words:
        set_remaining = set(list_words) - set([elem])
        if find_if_composite(elem, set_remaining, {}):
            return elem
    return None

def find_if_composite(s, set_):
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite(s[i:], set_): # assuming that I can reuse the string
                return True
    return False

我们从上一个答案中知道，这段代码运行时间为 O(N!)，其中 N 是字符串的大小（如果我是正确的）。

我的问题是：有什么方法可以提高时间复杂度（例如，使用记忆化）？如果不是，为什么？如果是，如何？我尝试了以下代码，但似乎它在递归调用期间从未命中备忘录。

def find_if_composite_memo(s, set_, memo):
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in memo:
        return memo[s]
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite_memo(s[i:], set_, memo):
                memo[s] = True
                memo[s[i:]] = True
                return True
    memo[s] = False
    return False

要测试，请使用

b = ["cat", "banana", "dog", "nana", "walk", "walker", "dogwalker", "dogwalkerbanana"]
print(find_longest_word(b))

【问题讨论】：

标签： python algorithm recursion time-complexity memoization

【解决方案1】：

只有当你有一个完整的解决方案时，你才使用备忘录。由于您按长度顺序对列表进行排序，因此您的第一个完整解决方案就是所需的答案。因此，你永远不会使用你的备忘录。

我在您的记忆例程中添加了一些跟踪，并稍微增强了测试：

def find_if_composite_memo(s, set_, memo):
    print "ENTER", s, '\t', memo
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in memo:
        print "Found", s, memo[s]
        return memo[s]
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite_memo(s[i:], set_, memo):
                memo[s] = True
                memo[s[i:]] = True
                print "Add", memo
                return True
    memo[s] = False
    return False

b = ["cat", "banana", "dog", "nana",
     "walk", "walker",
     "dogcat", "dogwalker",
     "dogwalkerbanana",
     "catwalkerbanana-FAIL"]
print(find_longest_word(b))

输出：

ENTER catwalkerbanana-FAIL  {}
ENTER walkerbanana-FAIL     {}
ENTER erbanana-FAIL     {}
ENTER banana-FAIL   {'erbanana-FAIL': False}
ENTER -FAIL     {'erbanana-FAIL': False}
ENTER dogwalkerbanana   {}
ENTER walkerbanana  {}
ENTER erbanana  {}
ENTER banana    {'erbanana': False}
Add {'walkerbanana': True, 'erbanana': False, 'banana': True}
Add {'walkerbanana': True, 'erbanana': False, 'dogwalkerbanana': True, 'banana': True}
dogwalkerbanana

你看到问题了吗？使用“catwalkerbanana-FAIL”，您可以录制“catwalk”、“catwalker”和“walkerbanana”，但不要利用它。最后一个可以为您节省一些搜索时间。

至于改进整体算法，你可以考虑 SE 的code review 站点。

如果您使用以下一些方法，我希望您可以做得更好：

(1) When you fail to find the present top-level word **elem** as a composite, you can remove it from the word list entirely.  Don't keep it in set_remaining: since it's at least as long as any other word in the list, it cannot be a component.
(2) In **find_if_composite**, you go through a lot of pain to determine whether any beginning substring of **s** is in the word_list.  I think you can speed this up: use **s.startswith(word)** on each word in the list.  
(3) If the list has enough composite hits to matter, examine your memoization algorithm carefully.  Figure out what is and is not worth saving, and write the pseudo-code to handle only the time-saving parts.  Remember that, in working from largest to smallest, you're going to short-circuit a lot of cases.  In this test set, though, finding "walkerbanana" the first time seen would save a recursion later.

同时寻找使用其他string operations的好地方，例如index（子串搜索）。

【讨论】：