Iterate through a column to match dict values

【Question Title】: iterate through column to match with dict values [closed]
【Posted】: 2021-04-19 19:26:30
【Question】:

I have a dictionary like this:

dict = {'color':['red', 'blue', 'green'], 'fruits':['apple', 'banana', 'grape'], 'animal':['cat', 'dog']}

and a df with two columns; the text column holds multiple strings:

index   |   text
-------------------------------
a       | house, chair, green
-------------------------------
b       | yellow, banana, wall
--------------------------------
c       | dog, brown, grass
--------------------------------

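If any string in the text column matches one of the dict's values, I want to add an extra column to the df holding the corresponding dict key, so a - color / b - fruits / c - animal.

I tried using isin with lists, but I think using a dict might be more efficient? Any help is appreciated.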

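【Comments】:

Welcome to Stack Overflow! Please take the tour, read what's on-topic here, How to Ask, and the question checklist, and provide a minimal reproducible example. "Implement this feature for me" is off-topic for this site. You have to make an honest attempt, and then ask a specific question about your algorithm or technique.

Tags: python string dictionary isin

【Solution 1】:

The simplest approach is to use apply().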

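def get_type(input_strs):
    # type_dict is the question's dictionary (named `dict` there);
    # input_strs is assumed to be a list of strings.
    for key, val in type_dict.items():
        for input_str in input_strs:
            if input_str in val:
                return key
    return None  # no value matched

df["str_type"] = df["text"].apply(get_type)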
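As a minimal end-to-end sketch of the apply() approach on the question's data: it assumes each cell of the text column is a single comma-separated string (as the table in the question suggests), so the function splits the cell before comparing. The name type_dict stands in for the question's dictionary, renamed so it does not shadow the built-in dict.

```python
import pandas as pd

# Data from the question; `type_dict` replaces the name `dict`.
type_dict = {
    "color": ["red", "blue", "green"],
    "fruits": ["apple", "banana", "grape"],
    "animal": ["cat", "dog"],
}
df = pd.DataFrame(
    {"text": ["house, chair, green", "yellow, banana, wall", "dog, brown, grass"]},
    index=["a", "b", "c"],
)

def get_type(text):
    # Assumption: one comma-separated string per cell, so split it first.
    words = [word.strip() for word in text.split(",")]
    for key, values in type_dict.items():
        if any(word in values for word in words):
            return key
    return None  # no category matched

df["str_type"] = df["text"].apply(get_type)
print(df)
```

If the column already holds lists of strings, the split step can be dropped and the list passed straight through.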

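Keep in mind, however, that apply() is poorly optimized: it is roughly equivalent to applying the function in a for loop.

If performance is a concern, you could invert your dictionary, as in {"red":"color", "blue":"color" ...}, and write a simpler function to apply, something like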

def get_type(input_strs):
    # Here type_dict is the inverted dictionary (value -> key).
    for input_str in input_strs:
        if input_str in type_dict:
            return type_dict[input_str]
    return None  # no value matched
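The inversion itself can be done with a dict comprehension. The sketch below is one way to do it; value_to_type is a hypothetical name for the inverted dictionary (the function above simply calls it type_dict), and it assumes no value is listed under two keys, since a repeated value would keep only the last key seen.

```python
# Original mapping from the question: key -> list of values.
type_dict = {
    "color": ["red", "blue", "green"],
    "fruits": ["apple", "banana", "grape"],
    "animal": ["cat", "dog"],
}

# Inverted mapping: value -> key (hypothetical name `value_to_type`).
value_to_type = {
    value: key for key, values in type_dict.items() for value in values
}

def get_type(input_strs):
    # Each lookup is a hash lookup instead of a scan over every list.
    for input_str in input_strs:
        if input_str in value_to_type:
            return value_to_type[input_str]
    return None  # no value matched

print(get_type(["yellow", "banana", "wall"]))
```

Unlike the first version, this returns the type of the first matching string in the input rather than the first matching key in the dictionary, which only matters when a row matches more than one key.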

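You could also consider one of pandas' optimized functions for a Series of strs, such as Series.str.extract(), assuming df["text"] is a Series of strs rather than a Series of lists of strs. pandas has no optimized functions for a Series of lists, and keeping lists in a DataFrame is generally a bad idea if performance is a priority.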
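A rough sketch of the extract() idea, assuming the text column holds plain strings: build one regular expression out of all known values, pull the first match from each row with Series.str.extract(), then translate it with Series.map(). The names value_to_type, pattern, and matched are illustrative and not from the original answer.

```python
import re
import pandas as pd

type_dict = {
    "color": ["red", "blue", "green"],
    "fruits": ["apple", "banana", "grape"],
    "animal": ["cat", "dog"],
}
# Inverted mapping: value -> key.
value_to_type = {v: k for k, vals in type_dict.items() for v in vals}

df = pd.DataFrame(
    {"text": ["house, chair, green", "yellow, banana, wall", "dog, brown, grass"]},
    index=["a", "b", "c"],
)

# One alternation of every known value; \b keeps "cat" from matching
# inside a longer word such as "category".
pattern = r"\b(" + "|".join(re.escape(v) for v in value_to_type) + r")\b"

# extract() yields the first match per row (NaN when nothing matches);
# expand=False returns a Series because there is a single capture group.
matched = df["text"].str.extract(pattern, expand=False)
df["str_type"] = matched.map(value_to_type)
print(df)
```

Rows without any known value end up with NaN in str_type, and only the first match in each string is used.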

【Comments】: