Efficient way to add a condition-based column to a df: answers

【Question title】: Efficient way to add a condition-based column to a df
【Posted】: 2021-10-04 10:23:26
【Question】:

I have a large df with two columns:

Label                  Part_id
"Replace Cable"
"Ethernet Cable"       abc123
"Adjust Cable"
"Lubricate screw"

I'd like to add a column "Solution Type" that is "Part" whenever Part_id is not null or Label contains the word "replace" or the marker "[p]", and "Action" otherwise.
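A minimal sketch of the Label condition on its own, using Python's `re` module. It assumes the match should be case-insensitive (the expected output marks "Replace Cable" as Part) and that "[p]" is a literal marker, so the brackets are escaped; unescaped, `[p]` is a character class that matches any single "p". The labels "Fix cable [p]" and "Check pump" are hypothetical examples, not data from the question.

```python
import re

# Assumption: case-insensitive, since "Replace Cable" is expected to be "Part".
# The brackets are escaped so "[p]" matches literally instead of acting as a character class.
PART_HINTS = re.compile(r"\[p\]|replace", flags=re.IGNORECASE)


def label_suggests_part(label: str) -> bool:
    """Return True if the label contains "replace" (any case) or the literal marker "[p]"."""
    return PART_HINTS.search(label) is not None


# "Fix cable [p]" and "Check pump" are hypothetical labels used only for illustration
for label in ["Replace Cable", "Ethernet Cable", "Adjust Cable", "Fix cable [p]", "Check pump"]:
    print(f"{label!r}: {label_suggests_part(label)}")
```

With the unescaped pattern `[p]|replace`, "Check pump" would be flagged because it contains a "p".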

The expected output would look like this:

Label                  Part_id       Solution Type
"Replace Cable"                      Part
"Ethernet Cable"       abc123        Part
"Adjust Cable"                       Action
"Lubricate screw"                    Action
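To reproduce the question, the sample rows can be built as a DataFrame. This sketch assumes the blank Part_id cells are missing values (NaN); if they are empty strings in the real data, `notna()` would treat them as present, so they would need converting to NaN first.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank Part_id cells are assumed to be NaN, not ""
df = pd.DataFrame({
    "Label": ["Replace Cable", "Ethernet Cable", "Adjust Cable", "Lubricate screw"],
    "Part_id": [np.nan, "abc123", np.nan, np.nan],
})

# Desired "Solution Type" values, copied from the expected output above
expected = ["Part", "Part", "Action", "Action"]

print(df)
```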

I came up with the following code:

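import pandas as pd

part_hints = r'\[p\]|replace'  # no capture group, so str.contains does not warn

# Row-wise apply: the lambda runs once per row, which is what makes this slow
df['Solution Type'] = df.apply(
    lambda x: "Part" if pd.notnull(x.Part_id) or x.astype(str).str.contains(part_hints, case=False).any()
    else "Action", axis=1)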

The problem is that it's really slow: on a df with 0.5M rows it can take up to two minutes to run.

I'd appreciate any ideas on how to make this faster.

Thanks!
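The slowdown comes from `apply(axis=1)`: it calls a Python lambda for every row, and `x.astype(str).str.contains(...)` builds a small Series per row. A rough sketch comparing that with a column-wise (vectorized) version is below. The data is hypothetical (the question's four rows repeated), the row count is kept small so it finishes quickly, and no particular timings are claimed; they depend on the machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

PART_HINTS = r"\[p\]|replace"

# Hypothetical synthetic data: the question's four rows repeated
n_repeats = 2_500  # 10,000 rows; increase to approach the question's 0.5M rows
df = pd.DataFrame({
    "Label": ["Replace Cable", "Ethernet Cable", "Adjust Cable", "Lubricate screw"] * n_repeats,
    "Part_id": [np.nan, "abc123", np.nan, np.nan] * n_repeats,
})

# Row-wise version (the question's approach): one Python call and one small Series per row
start = time.perf_counter()
row_wise = df.apply(
    lambda x: "Part"
    if pd.notnull(x.Part_id) or x.astype(str).str.contains(PART_HINTS, case=False).any()
    else "Action",
    axis=1,
)
row_wise_seconds = time.perf_counter() - start

# Vectorized version: each condition is evaluated once over a whole column
start = time.perf_counter()
is_part = df["Label"].str.contains(PART_HINTS, case=False, na=False) | df["Part_id"].notna()
vectorized = pd.Series(np.where(is_part, "Part", "Action"), index=df.index)
vectorized_seconds = time.perf_counter() - start

print(f"row-wise:   {row_wise_seconds:.3f} s")
print(f"vectorized: {vectorized_seconds:.3f} s")
print("same result:", row_wise.tolist() == vectorized.tolist())
```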

【Question discussion】:

Tags: python pandas performance numpy

【Solution 1】:

Try np.where():

import numpy as np

# Vectorized: each condition is evaluated once over a whole column
df["Solution Type"] = np.where(
    df['Label'].str.contains(part_hints, case=False, regex=True, na=False) | df['Part_id'].notna(),
    "Part",
    "Action")
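A self-contained run of the `np.where` approach on the question's rows plus two hypothetical extra rows: one with a missing Label and one with the "[p]" marker. It shows why `na=False` matters: without it, `str.contains` returns NaN for a missing Label instead of False, which leaves a non-boolean value in the mask.

```python
import numpy as np
import pandas as pd

part_hints = r"\[p\]|replace"

# The question's rows, plus two hypothetical rows (missing Label, "[p]" marker)
df = pd.DataFrame({
    "Label": ["Replace Cable", "Ethernet Cable", "Adjust Cable", "Lubricate screw", None, "Check pump [p]"],
    "Part_id": [np.nan, "abc123", np.nan, np.nan, np.nan, np.nan],
})

# na=False treats a missing Label as "no hint" instead of propagating NaN into the mask
is_part = df["Label"].str.contains(part_hints, case=False, regex=True, na=False) | df["Part_id"].notna()
df["Solution Type"] = np.where(is_part, "Part", "Action")

print(df)
```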

【Discussion】:

【Solution 2】:

You can try it like this:

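# Mark the rows that need a part; all other rows get NaN in the new column
df.loc[df.Label.str.contains(part_hints, case=False, na=False) | df.Part_id.notnull(), 'Solution Type'] = 'Part'

# Fill the remaining rows by assignment rather than a chained inplace fillna
df["Solution Type"] = df["Solution Type"].fillna("Action")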
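A pandas-only variant that builds the column in one step: compute the boolean mask once and map True/False to the two labels with `Series.map`. This avoids both the two-step assign-then-fill and the chained in-place `fillna` on a selected column, which recent pandas versions warn about under Copy-on-Write. It is an alternative sketch, not the answerer's code, and assumes the same case-insensitive pattern as above.

```python
import numpy as np
import pandas as pd

part_hints = r"\[p\]|replace"

# The question's sample rows; blank Part_id cells assumed to be NaN
df = pd.DataFrame({
    "Label": ["Replace Cable", "Ethernet Cable", "Adjust Cable", "Lubricate screw"],
    "Part_id": [np.nan, "abc123", np.nan, np.nan],
})

# One boolean mask for both conditions, then a direct True/False -> label mapping
is_part = df["Label"].str.contains(part_hints, case=False, na=False) | df["Part_id"].notna()
df["Solution Type"] = is_part.map({True: "Part", False: "Action"})

print(df)
```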

【Discussion】: