Creating function that makes a dictionary from a list: answers

【Question title】: Creating function that makes a dictionary from a list
【Posted】: 2022-11-29 01:26:43
【Question description】:

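Goal -> For every word in the text except the last one, a key should appear in the resulting dictionary, and its value should be a list of every word that immediately follows that key word in the text. Repeated words should have multiple values. Example: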

fun(["ONE", "two", "one", "three"]) ==
            {"one": ["two", "three"], "two": ["one"]}

What I have so far:

def build_predictions(words: list) -> dict:
  dictionary = {}
  for word in words:
    if word.index() != words.len():
      if word not in dictionary:
        dictionary.update({word : words(words.index(word)+1)})
      else:
        dictionary[word] = dictionary[word] + [words(words.index(word)+1)]

I am getting an EOF error ;[ -> not sure whether this is even correct.

【Question discussion】:

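The task is clear, but it would be great if you provided the expected output for a given sample text.
What is the exact traceback? Nothing in this code appears to access a file, so if the problem is in the script itself, it will be somewhere before your definition. (Look for an unclosed quote somewhere; unclosed brackets etc. are also possible, but would usually trigger a more specific error before the parser reaches the end of the file.)
Your code has compiler errors in Python. words.len() is not Python (did you mean len(words)?). You want to return a dict, but your method returns nothing... Most importantly: EOF means EndOfFile, so the part that causes the error is missing from your code sample.
In other words: please create a minimal reproducible example so that we can help. Sample input data, the relevant (!) parts of the code, the expected result. See How to Ask.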

Tags: python list dictionary

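【Solution 1】:

First, you should not use index, because it only returns the index of the first occurrence. Iterating over positions should work better: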

  for i in range(len(words)-1):
    word = words[i]
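
To see why list.index is a problem for repeated words, the following sketch compares it with walking over positions. The sample list and the helper name following_pairs are illustrative assumptions, not part of the original question.

```python
words = ["one", "two", "one", "three"]

# list.index always reports the first match, so the second "one"
# (at position 2) cannot be told apart from the first (at position 0).
first_match = words.index("one")


def following_pairs(words: list) -> list:
    """Return (word, next_word) pairs by walking positions, not values."""
    pairs = []
    # Stop one short of the end: the last word has nothing after it.
    for i in range(len(words) - 1):
        pairs.append((words[i], words[i + 1]))
    return pairs


pairs = following_pairs(words)
print(first_match)  # 0
print(pairs)        # [('one', 'two'), ('two', 'one'), ('one', 'three')]
```

Because the loop is driven by position, both occurrences of "one" are paired with their own successor.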

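【Discussion】:

【Solution 2】:

Your code won't give you an EOF error, because you don't read any files in the code shown. Since you haven't shown any code that does, I can't help you with the EOF error. However, there are a number of problems with your approach to building the predictions dictionary: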

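word.index() is not a thing. If you want the index of word in words, iterate using for index, word in enumerate(words)
words.len() is not a thing. You can get the length of words with len(words)
The index of the current word will never equal the length of the list, because list indices start at 0 and go up to len(lst) - 1. Your if condition should have been if index < len(words) - 1. If you simply change the loop to for word in words[:-1], this check is not needed at all, because the slice skips the last word.
If word is not in dictionary, you want to create a new list containing the next word.
If word is already in dictionary, you want to append the next word to the existing list, rather than create a new list by concatenating the two.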

You need to return dictionary from your function.

Combining all of these suggestions, your code would look like this:

def build_predictions(words: list) -> dict:
    dictionary = {}
    for index, word in enumerate(words[:-1]):
        next_word = words[index + 1]
        if word not in dictionary:
            dictionary[word] = [next_word]
        else:
            dictionary[word].append(next_word)
    return dictionary
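
As a quick check of the function above, here is a small usage sketch. The sample input is lowercased by hand, which is an assumption made here so that it lines up with the expected output in the question.

```python
def build_predictions(words: list) -> dict:
    dictionary = {}
    for index, word in enumerate(words[:-1]):
        next_word = words[index + 1]
        if word not in dictionary:
            dictionary[word] = [next_word]
        else:
            dictionary[word].append(next_word)
    return dictionary


result = build_predictions(["one", "two", "one", "three"])
print(result)  # {'one': ['two', 'three'], 'two': ['one']}

# The last word never becomes a key unless it also appears earlier.
print("three" in result)  # False
```

The repeated word "one" ends up with both of its followers in a single list, rather than only the last one.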

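Now, if you only want unique words, you can build a dictionary that contains sets instead of lists. That way, when you .add() to a set that already contains the word you are adding, it has no effect.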

def build_predictions(words: list) -> dict:
    dictionary = {}
    for index, word in enumerate(words[:-1]):
        next_word = words[index + 1]
        if word not in dictionary:
            dictionary[word] = {next_word}     # Creates a set containing next_word
        else:
            dictionary[word].add(next_word)
    return dictionary

Finally, if you want to convert the sets back to lists, that is easy to do. Instead of return dictionary, do:

    return {k: list(v) for k, v in dictionary.items()}
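
The list and set versions only differ when the same pair of words occurs more than once. The sketch below uses a made-up sample sentence and a hypothetical function name, build_unique_predictions. Sets are unordered, so the order of a list produced by list(v) is not guaranteed; sorted(v) is swapped in here as one way to get a stable result.

```python
def build_unique_predictions(words: list) -> dict:
    dictionary = {}
    for index, word in enumerate(words[:-1]):
        next_word = words[index + 1]
        if word not in dictionary:
            dictionary[word] = {next_word}
        else:
            dictionary[word].add(next_word)
    # sorted() returns a list, so each value is a list in alphabetical order.
    return {k: sorted(v) for k, v in dictionary.items()}


# "the" is followed by "cat" twice and by "dog" once.
words = ["the", "cat", "saw", "the", "cat", "and", "the", "dog"]
unique = build_unique_predictions(words)
print(unique["the"])  # ['cat', 'dog']
print(unique["cat"])  # ['and', 'saw']
```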

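We can eliminate the need to check whether word is in dictionary by using collections.defaultdict.

We can zip two slices of the word list: one from the start to the second-to-last item, and another from the second item to the last item. Iterating over the zip of the two slices gives us the current word and the next word on each iteration.

Then we can collect these into a defaultdict(list) or defaultdict(set).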

from collections import defaultdict

def build_predictions(words: list) -> dict:
    predictions = defaultdict(list)
    # or        = defaultdict(set)
    for word, next_word in zip(words[:-1], words[1:]):
        predictions[word].append(next_word)
        # or             .add(next_word)

    return predictions
    # or   {k: list(v) for k, v in predictions.items()}
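
To make the zip step concrete, the sketch below prints the pairs it produces and then runs the defaultdict version. Wrapping the result in dict(...) is an addition made here, on the assumption that the caller wants a plain dict: a defaultdict creates an empty entry whenever a missing key is looked up with square brackets.

```python
from collections import defaultdict


def build_predictions(words: list) -> dict:
    predictions = defaultdict(list)
    for word, next_word in zip(words[:-1], words[1:]):
        predictions[word].append(next_word)
    # Convert to a plain dict so later lookups of missing keys raise KeyError.
    return dict(predictions)


words = ["one", "two", "one", "three"]

# Each pair is (current word, word that follows it).
pairs = list(zip(words[:-1], words[1:]))
print(pairs)  # [('one', 'two'), ('two', 'one'), ('one', 'three')]

result = build_predictions(words)
print(result)  # {'one': ['two', 'three'], 'two': ['one']}
```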

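【Discussion】:

Don't do OP's homework for them.
Also, this is wrong: it will only keep the last word that appears after a word, not a list of all such words.
Sorry, I forgot to add the part about repeated words. I edited the question -> a repeated word should not create a new key, but should add a new value under the repeated word's key.
Thanks, this looks like it should be very helpful!
@markgiz Glad this helped. Repeated words won't create a new key anyway. If you want to ignore case, as you show in your expected input, you can do word = word.lower() and next_word = next_word.lower() right after you define them.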
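
The case-insensitive tweak from the last comment could look like the following sketch. It is based on the first version from Solution 2; placing the .lower() calls inside the loop is one reasonable reading of "right after you define them", not the only one.

```python
def build_predictions(words: list) -> dict:
    dictionary = {}
    for index, word in enumerate(words[:-1]):
        # Normalise case so "ONE" and "one" share a single key.
        word = word.lower()
        next_word = words[index + 1].lower()
        if word not in dictionary:
            dictionary[word] = [next_word]
        else:
            dictionary[word].append(next_word)
    return dictionary


result = build_predictions(["ONE", "two", "one", "three"])
print(result)  # {'one': ['two', 'three'], 'two': ['one']}
```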