How to append value from a dictionary to list? Answers

【Question title】: How to append value from a dictionary to list?
【Posted】: 2023-04-04 11:34:02
【Question description】:

I have a dictionary d

d = {'word': [0,1,2,3,4,5,6], 'data':[2,3,4,5,6,7,8], 'mark': [1,4,5,2,5,6,7]}

and a dataframe containing lists

df = (pd.DataFrame({'data':[
              ['data', 'customer', 'mark', 'hello', 'spam', 'life'], 
              ['from','the', 'word', 'mark', 'data'], 
              ['hello', 'word', 'mark', 'data', 'the']]}, 
              index = [0,1,2]))

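In the df code above, the words get split into 6 columns with one word in each, but in my actual data they are all in a single column, as one big list per row.

I want to match the words in each list of the dataframe against the keys of the dictionary. If a word is a key, append the corresponding values from the dictionary to that word; if not, omit the word from the list.

The output should look like this: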

new_df = [[[data,2,3,4,5,6,7,8], [mark,1,4,5,2,5,6,7]], 
          [[word,0,1,2,3,4,5,6], [mark,1,4,5,2,5,6,7], [data, 2,3,4,5,6,7,8]], 
          [[word,0,1,2,3,4,5,6], [mark,1,4,5,2,5,6,7], [data, 2,3,4,5,6,7,8]]]

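This is because, in the first list, the words customer, hello, spam and life are not in the original dictionary. Likewise, in the next list, from and the are not in it, and so on.

What is the best way to achieve this?

I did something like this: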

def checkkey(d, key):
    # Return the word followed by its values, or None if the word is not a key.
    if key in d:
        return [key] + d[key]
    print("Not present")
    return None


checkkey(d, a)

where d is the dictionary above and a = 'data'

How can I do this for every word in each list and for the whole dataframe?

【Question comments】:

Tags: python list dictionary append

【Solution 1】:

I have reformatted the dataframe you specified in the question. I think this is what you are looking for:

d = {'word': [0,1,2,3,4,5,6], 'data':[2,3,4,5,6,7,8], 'mark': [1,4,5,2,5,6,7]}
df = pd.DataFrame({"data":[['data', 'customer', 'mark', 'hello', 'spam', 'life'],['from','the', 'word', 'mark', 'data'],
                   ['hello', 'word', 'mark', 'data', 'the']]})

Solution:

def check_word(x,d):
    return [[i,d[i]] for i in x if i in d]
            
df['data'] = df['data'].apply(lambda x:check_word(x,d))

print(df.data.values)

# ---- Output -----
# array([list([['data', [2, 3, 4, 5, 6, 7, 8]], ['mark', [1, 4, 5, 2, 5, 6, 7]]]),
#   list([['word', [0, 1, 2, 3, 4, 5, 6]], ['mark', [1, 4, 5, 2, 5, 6, 7]], ['data', [2, 3, 4, 5, 6, 7, 8]]]),
#   list([['word', [0, 1, 2, 3, 4, 5, 6]], ['mark', [1, 4, 5, 2, 5, 6, 7]], ['data', [2, 3, 4, 5, 6, 7, 8]]])],
#  dtype=object)

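【Comments】:

This is what I wanted: first check whether the word is present and, if it is, pair it with its values from the dictionary. As a next step, if I want the column-wise mean across the lists in each row, how would you suggest I do that? A simple sum/len gives me 1 value, whereas I want to see 7 values per row, i.e. [1.5, 3.5, 4.5, 3.5, 5.5, 6.5, 7.5] for the first row.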
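The follow-up above was not answered in the thread. One possible way to get the per-position means is to transpose the value lists of a row with zip and average each position. This is only a sketch: column_means is a hypothetical helper name, and it assumes each row has the [word, values] shape produced by check_word and that all value lists have the same length.

```python
d = {'word': [0, 1, 2, 3, 4, 5, 6],
     'data': [2, 3, 4, 5, 6, 7, 8],
     'mark': [1, 4, 5, 2, 5, 6, 7]}

# First row of the Solution 1 output: [word, values] pairs.
row = [['data', d['data']], ['mark', d['mark']]]


def column_means(pairs):
    """Average the value lists position by position (hypothetical helper)."""
    values = [vals for _word, vals in pairs]
    if not values:
        return []  # no word in this row matched the dictionary
    # zip(*values) groups the 1st elements together, then the 2nd, and so on.
    return [sum(col) / len(col) for col in zip(*values)]


means = column_means(row)
print(means)  # [1.5, 3.5, 4.5, 3.5, 5.5, 6.5, 7.5]
```

Applied to the whole column, this would presumably be df['data'].apply(column_means) after the check_word step.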

【Solution 2】:

Create the dataframe like this:

df = (pd.DataFrame({'data': [
                 ['data', 'customer', 'mark', 'hello', 'spam', 'life'], 
                 ['from','the', 'word', 'mark', 'data'], 
                 ['hello', 'word', 'mark', 'data', 'the']]}, 
                 index = [0,1,2]))

>>> df
                                       data
0  [data, customer, mark, hello, spam, life]
1              [from, the, word, mark, data]
2             [hello, word, mark, data, the]

Use:

Method 1:

df =  (df['data'].apply(lambda x: 
                [[name] + d[name] for name in x if name in d]))
>>> df
0    [[data, 2, 3, 4, 5, 6, 7, 8], [mark, 1, 4, 5, ...
1    [[word, 0, 1, 2, 3, 4, 5, 6], [mark, 1, 4, 5, ...
2    [[word, 0, 1, 2, 3, 4, 5, 6], [mark, 1, 4, 5, ...

Method 2: If the values in each list are unique, you can use:

df = (df['data'].apply(lambda x: [[name] + d[name] 
                    for name in set(x).intersection(d)]))

This may be a little faster.
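One caveat worth checking before switching to Method 2: a set does not keep the original word order, and it collapses repeated words. The plain-Python sketch below compares the two comprehensions on the second row (the variable names are only illustrative).

```python
d = {'word': [0, 1, 2, 3, 4, 5, 6],
     'data': [2, 3, 4, 5, 6, 7, 8],
     'mark': [1, 4, 5, 2, 5, 6, 7]}
x = ['from', 'the', 'word', 'mark', 'data']

# Method 1: keeps the order in which the words appear in x.
ordered = [[name] + d[name] for name in x if name in d]

# Method 2: same entries, but set iteration order is not guaranteed.
unordered = [[name] + d[name] for name in set(x).intersection(d)]

print([r[0] for r in ordered])  # ['word', 'mark', 'data']
same_items = sorted(ordered) == sorted(unordered)
print(same_items)  # True
```

If the order of the words matters downstream, Method 1 is the safer choice.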

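Method 3: .apply is usually slow, so here is another approach that avoids it. Although it looks like more operations, it will most likely be faster than .apply.

First, let's change the dictionary so that each value list starts with its key.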

d = {k: [k] + v  for k, v in d.items()}
>>> d
{'word': ['word', 0, 1, 2, 3, 4, 5, 6], 'data': ['data', 2, 3, ...

Now explode the dataframe so that each value of a list goes into its own row.

df1 = df.explode(column = 'data')

>>> df1
           data
    0      data
    0  customer
    0      mark
    0     hello
    0      spam
    0      life
    1      from
    1       the
    ...

Now do the mapping, then combine the rows back together using the index.

df1.data = df1.data.map(d)
df1 = df1.dropna()
df1 = df1.groupby(df1.index).agg(lambda x: x.tolist())

>>> df1
                                                data
0  [[data, 2, 3, 4, 5, 6, 7, 8], [mark, 1, 4, 5, ...
1  [[word, 0, 1, 2, 3, 4, 5, 6], [mark, 1, 4, 5, ...
2  [[word, 0, 1, 2, 3, 4, 5, 6], [mark, 1, 4, 5, ...
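As a sanity check, the sketch below runs Method 1 and Method 3 side by side on the sample data and compares the results. It assumes pandas is installed. The names keyed, exploded, method1 and method3 are illustrative, and it uses a separate keyed dictionary instead of overwriting d.

```python
import pandas as pd

d = {'word': [0, 1, 2, 3, 4, 5, 6],
     'data': [2, 3, 4, 5, 6, 7, 8],
     'mark': [1, 4, 5, 2, 5, 6, 7]}
df = pd.DataFrame({'data': [
    ['data', 'customer', 'mark', 'hello', 'spam', 'life'],
    ['from', 'the', 'word', 'mark', 'data'],
    ['hello', 'word', 'mark', 'data', 'the']]})

# Method 1: list comprehension inside .apply
method1 = df['data'].apply(
    lambda x: [[name] + d[name] for name in x if name in d])

# Method 3: explode, map, drop unmatched words, regroup by original index
keyed = {k: [k] + v for k, v in d.items()}
exploded = df.explode('data')
exploded['data'] = exploded['data'].map(keyed)  # unknown words become NaN
exploded = exploded.dropna()
method3 = exploded.groupby(exploded.index)['data'].agg(list)

same = method1.tolist() == method3.tolist()
print(same)  # True
```

One difference to keep in mind: a row in which no word matches the dictionary disappears entirely after dropna in Method 3, whereas Method 1 keeps it as an empty list.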

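【Comments】:

Thanks for the detailed explanation. The first method is similar to the accepted answer and works best. I did not try Method 3 because of the explode function: my original data has 1M rows and 300 columns of values, and I do not know the number of words in each list.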