单词列表中最长的单词链答案

【问题标题】：Longest word chain from a list of words单词列表中最长的单词链
【发布时间】：2019-04-28 07:56:13
【问题描述】：

所以，这是我正在尝试制作的功能的一部分。

我不希望代码太复杂。

我有一个单词列表，例如

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

单词链序列的思想是下一个单词以最后一个单词结尾的字母开头。

（编辑：每个单词不能多次使用。除此之外没有其他限制。）

我希望输出给出最长的词链序列，在这种情况下是：

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

我不太确定该怎么做，我尝试过不同的尝试。其中之一……

如果我们从列表中的特定单词开始，此代码会正确找到单词链，例如词[0]（所以“长颈鹿”）：

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

word_chain = []

word_chain.append(words[0])

for word in words:
    for char in word[0]:

       if char == word_chain[-1][-1]:
            word_chain.append(word)

print(word_chain)

输出：

['giraffe', 'elephant', 'tiger', 'racoon']

但是，我想找到可能最长的单词链（如上所述）。

我的方法：所以，我尝试使用我编写并循环的上述工作代码，使用列表中的每个单词作为起点并找到每个单词的单词链[0 ]、word[1]、word[2] 等。然后我尝试使用 if 语句找到最长的词链，并将长度与之前最长的链进行比较，但我无法正确完成，我没有真的不知道这是怎么回事。

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

word_chain = []
max_length = 0
for starting_word_index in range(len(words) - 1):

    word_chain.append(words[starting_word_index])

    for word in words:
        for char in word[0]:

            if char == word_chain[-1][-1]:
                word_chain.append(word)

    # Not sure

    if len(word_chain) > max_length:
        final_word_chain = word_chain
        longest = len(word_chain)
        word_chain.clear()

print(final_word_chain)

这是我的第 n 次尝试，我认为这会打印一个空列表，在此之前我进行了不同的尝试，但未能正确清除 word_chain 列表并最终再次重复单词。

非常感谢任何帮助。希望我没有让这变得太乏味或令人困惑......谢谢！

【问题讨论】：

@dataLeo 嗨，每个单词不能多次使用（所以大象只能使用一次），除此之外没有其他限制。目的是找到最长的词链序列。
您是否需要额外的预防措施，以防有人输入以相同字母开头和结尾的名称？（……好像我不能想出一个……）
这将是对codegolf.stackexchange.com的巨大贡献
这个问题等价于在有向循环图中找到最长的路径，其中边最多遍历一次。
这个问题的通用名称：stackoverflow.com/questions/29522351/…

标签： python recursion graph path-finding

【解决方案1】：

此函数创建一种称为生成器的迭代器（请参阅：What does the "yield" keyword do?）。它递归地创建同一生成器的更多实例以探索所有可能的尾序列：

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

def chains(words, previous_word=None):
    # Consider an empty sequence to be valid (as a "tail" or on its own):
    yield []
    # Remove the previous word, if any, from consideration, both here and in any subcalls:
    words = [word for word in words if word != previous_word]
    # Take each remaining word...
    for each_word in words:
        # ...provided it obeys the chaining rule
        if not previous_word or each_word.startswith(previous_word[-1]):
            # and recurse to consider all possible tail sequences that can follow this particular word:
            for tail in chains(words, previous_word=each_word):
                # Concatenate the word we're considering with each possible tail:
                yield [each_word] + tail  

all_legal_sequences = list(chains(words))  # convert the output (an iterator) to a list
all_legal_sequences.sort(key=len) # sort the list of chains in increasing order of chain length
for seq in all_legal_sequences: print(seq)
# The last line (and hence longest chain) prints as follows:
# ['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

或者，更有效地直接到达最长的链：

print(max(chains(words), key=len)

最后，这是一个允许输入中重复单词的替代版本（即，如果您包含 N 次单词，则可以在链中最多使用 N 次）：

def chains(words, previous_word_index=None):
    yield []
    if previous_word_index is not None:
        previous_letter = words[previous_word_index][-1]
        words = words[:previous_word_index] + words[previous_word_index + 1:]
    for i, each_word in enumerate( words ):
        if previous_word_index is None or each_word.startswith(previous_letter):
            for tail in chains(words, previous_word_index=i):
                yield [each_word] + tail

【讨论】：

【解决方案2】：

我有个新想法，如图：

我们可以通过word[0] == word[-1]构造一个有向图，然后将问题转化为找到最大长度路径。

【讨论】：

【解决方案3】：

正如其他人所说，问题是找到longest path in a directed acyclic graph。

对于任何与 Python 相关的图形，networkx 是您的朋友。

您只需要初始化图形，添加节点，添加边并启动dag_longest_path：

import networkx as nx
import matplotlib.pyplot as plt

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat',
         'hedgehog', 'mouse']

G = nx.DiGraph()
G.add_nodes_from(words)

for word1 in words:
    for word2 in words:
        if word1 != word2 and word1[-1] == word2[0]:
            G.add_edge(word1, word2)
nx.draw_networkx(G)
plt.show()
print(nx.algorithms.dag.dag_longest_path(G))

它输出：

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

注意：此算法仅在图中没有循环（循环）时才有效。这意味着它会因['ab', 'ba'] 而失败，因为会有一条无限长的路径：['ab', 'ba', 'ab', 'ba', 'ab', 'ba', ...]

【讨论】：

非循环是一个很大的假设。循环问题是 NP-Hard。
@slider：确实。我会调查一下是否可以解决 networkx 的一般问题。

【解决方案4】：

当包含正确初始字符的每个可能的字母都添加到运行列表中时，您可以使用递归来探索出现的每个“分支”：

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
def get_results(_start, _current, _seen):
  if all(c in _seen for c in words if c[0] == _start[-1]):
    yield _current
  else:
      for i in words:
        if i[0] == _start[-1]:
          yield from get_results(i, _current+[i], _seen+[i])


new_d = [list(get_results(i, [i], []))[0] for i in words]
final_d = max([i for i in new_d if len(i) == len(set(i))], key=len)

输出：

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

此解决方案的工作原理类似于广度优先搜索，因为函数 get_resuls 将继续遍历整个列表，只要当前值之前没有被调用过。函数看到的值被添加到_seen 列表中，最终停止递归调用流。

此解决方案还将忽略重复的结果：

words = ['giraffe', 'elephant', 'ant', 'ning', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse',]
new_d = [list(get_results(i, [i], []))[0] for i in words]
final_d = max([i for i in new_d if len(i) == len(set(i))], key=len)

输出：

['ant', 'tiger', 'racoon', 'ning', 'giraffe', 'elephant']

【讨论】：

【解决方案5】：

我有一个基于树的方法来解决这个问题，这可能会更快。我仍在努力实现代码，但这是我要做的：

    1. Form a tree with the root node as first word. 
    2. Form the branches if there is any word or words that starts 
with the alphabet with which this current word ends.
    3. Exhaust the entire given list based on the ending alphabet
 of current word and form the entire tree.
    4. Now just find the longest path of this tree and store it.
    5. Repeat steps 1 to 4 for each of the words given in the list
 and print the longest path among the longest paths we got above.

如果给出的单词列表很大，我希望这可能会提供更好的解决方案。我会用实际的代码实现来更新它。

【讨论】：

【解决方案6】：

本着蛮力解决方案的精神，您可以检查words列表的所有排列并选择最佳连续启动序列：

from itertools import permutations

def continuous_starting_sequence(words):
    chain = [words[0]]
    for i in range(1, len(words)):
        if not words[i].startswith(words[i - 1][-1]):
            break
        chain.append(words[i])
    return chain

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
best = max((continuous_starting_sequence(seq) for seq in permutations(words)), key=len)

print(best)
# ['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

由于我们正在考虑所有排列，我们知道必须有一个从最大的词链开始的排列。

当然，这具有 O(n n!) 时间复杂度：D

【讨论】：

【解决方案7】：

希望有一种更直观的方式，无需递归。遍历列表，让 Python 的排序和列表推导为您完成工作：

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

def chain_longest(pivot, words):
    new_words = []
    new_words.append(pivot)
    for word in words:
        potential_words = [i for i in words if i.startswith(pivot[-1]) and i not in new_words]
        if potential_words:
            next_word = sorted(potential_words, key = lambda x: len)[0]
            new_words.append(next_word)
            pivot = next_word
        else:
            pass
    return new_words

max([chain_longest(i, words) for i in words], key = len)
>>
['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

设置一个基准并检查 potential_words 是否以您的基准词开头并且没有出现在您的新单词列表中。如果找到，则只需按长度对它们进行排序并获取第一个元素。

列表推导将每个单词作为枢轴，并返回最长的链。

【讨论】：

预期的输出和最长的词链是：['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']，你的给['giraffe', 'elephant', 'tiger', 'racoon']还是我漏掉了什么？
您可以将key = lambda x: len(x) 替换为key=len。
@mandingo 抱歉，我对输出感到困惑。让我改变它。

【解决方案8】：

这是一种可行的递归蛮力方法：

def brute_force(pool, last=None, so_far=None):
    so_far = so_far or []
    if not pool:
        return so_far
    candidates = []
    for w in pool:
        if not last or w.startswith(last):
            c_so_far, c_pool = list(so_far) + [w], set(pool) - set([w])
            candidates.append(brute_force(c_pool, w[-1], c_so_far))
    return max(candidates, key=len, default=so_far)

>>> brute_force(words)
['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

在每次递归调用时，它都会尝试使用剩余池中的每个符合条件的单词继续链。然后它选择最长的这种延续。

【讨论】：

【解决方案9】：

使用递归方法的另一个答案：

def word_list(w_list, remaining_list):
    max_result_len=0
    res = w_list
    for word_index in range(len(remaining_list)):
        # if the last letter of the word list is equal to the first letter of the word
        if w_list[-1][-1] == remaining_list[word_index][0]:
            # make copies of the lists to not alter it in the caller function
            w_list_copy = w_list.copy()
            remaining_list_copy = remaining_list.copy()
            # removes the used word from the remaining list
            remaining_list_copy.pop(word_index)
            # append the matching word to the new word list
            w_list_copy.append(remaining_list[word_index])
            res_aux = word_list(w_list_copy, remaining_list_copy)
            # Keep only the longest list
            res = res_aux if len(res_aux) > max_result_len else res 
    return res

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
word_list(['dog'], words)

输出：

['dog', 'giraffe', 'elephant', 'tiger', 'racoon']

【讨论】：