Applying a lambda to a pandas DataFrame based on a condition in another column: answers

[Question title]: Applying lambda to pandas dataframe based on conditional in other column
[Posted]: 2017-10-25 11:28:30
[Question]:

I have a dataframe like this:

product        complaint
Student Loan   words words words
Mortgage       words words words
Credit Card    words words words
Student Loan   words words words

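I'm trying to preprocess the words in each complaint cell, but I want the preprocessing to depend on the product. This line of code applies my preprocessing function to every cell in the 'complaint' column, and it works fine: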

df['complaint'] = df['complaint'].apply(lambda x: pre_process(x))

My preprocessing function basically tokenizes the text, removes stopwords, and lemmatizes the complaint.
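The thread never shows pre_process itself, but every answer calls it with an optional stopword list as a second argument. A minimal hypothetical stand-in with that signature, assuming a simple regex tokenizer and a made-up DEFAULT_STOPWORDS set, and skipping lemmatization entirely:

```python
import re

# Hypothetical default stopword list; the question does not show the real one
DEFAULT_STOPWORDS = {"the", "a", "an", "and", "i", "my", "to", "of"}


def pre_process(text, extra_stopwords=None):
    """Tokenize, drop stopwords, and rejoin the remaining tokens.

    extra_stopwords is an optional per-product list removed in addition
    to DEFAULT_STOPWORDS. Lemmatization is omitted in this sketch.
    """
    stops = DEFAULT_STOPWORDS | set(extra_stopwords or ())
    # Lowercase, then keep runs of letters/digits as tokens
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in stops)


print(pre_process("I paid my Student Loan late"))
print(pre_process("I paid my Student Loan late", ["student", "loan"]))
```

The first call prints `paid student loan late`; passing a product-specific list as well removes `student` and `loan`, printing `paid late`.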

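I want to go a step further and also remove custom stopword lists that depend on the product. Mortgage, Student Loan and Credit Card each have their own stopword list, and I only want each list applied to the matching complaints. If it worked, it would look something like this: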

df['complaint'] = df['complaint'].apply(lambda x: pre_process(x, Student_stopwords) if df['product'] == "Student Loan")
df['complaint'] = df['complaint'].apply(lambda x: pre_process(x, mortgage_stopwords) if df['product'] == "Mortgage")
df['complaint'] = df['complaint'].apply(lambda x: pre_process(x, creditcard_stopwords) if df['product'] == "Credit Card")

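I know this is probably very inefficient, but this is how I picture it; I just don't know how to apply my preprocessing function only to specific cells.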
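One way to make the per-product idea above actually run is to restrict each apply to the matching rows with a boolean mask and `.loc`, then write the result back to those same cells. A minimal sketch: pre_process here is a hypothetical stand-in, and while the names Student_stopwords, mortgage_stopwords and creditcard_stopwords come from the question, their contents are invented.

```python
import pandas as pd


def pre_process(text, extra_stopwords=()):
    # Hypothetical stand-in: lowercase, split on whitespace, drop stopwords
    stops = set(extra_stopwords)
    return " ".join(w for w in text.lower().split() if w not in stops)


# Hypothetical per-product stopword lists (names from the question)
Student_stopwords = ["student", "loan"]
mortgage_stopwords = ["mortgage", "escrow"]
creditcard_stopwords = ["card", "credit"]

df = pd.DataFrame({
    "product": ["Student Loan", "Mortgage", "Credit Card", "Student Loan"],
    "complaint": ["My student loan servicer", "Escrow mortgage issue",
                  "Credit card fee", "Loan forgiveness denied"],
})

for product, stops in [("Student Loan", Student_stopwords),
                       ("Mortgage", mortgage_stopwords),
                       ("Credit Card", creditcard_stopwords)]:
    mask = df["product"] == product  # True only for rows of this product
    # Apply only to the selected cells; args passes the stopword list through
    df.loc[mask, "complaint"] = df.loc[mask, "complaint"].apply(
        pre_process, args=(stops,)
    )

print(df)
```

Each pass touches only the rows of one product, so rows of other products keep their text until their own pass.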

Any help would be greatly appreciated.


Tags: python pandas dataframe lambda

[Answer 1]:

You can define a separate function and then use apply. Something like this:

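def which_preproc(row):
    # Choose the stopword list by product (names taken from the question)
    if row['product'] == 'Student Loan':
        return pre_process(row['complaint'], Student_stopwords)
    # similarly for 'Mortgage' and 'Credit Card'; fall back to defaults otherwise
    return pre_process(row['complaint'])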

Then use apply:

df['complaint'] = df.apply(which_preproc, axis=1)
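With `axis=1`, `DataFrame.apply` calls the function once per row and passes that row as a Series indexed by column name, which is why `row['product']` and `row['complaint']` work inside which_preproc. A complete runnable sketch of this answer, assuming a hypothetical pre_process stand-in and invented stopword lists:

```python
import pandas as pd


def pre_process(text, extra_stopwords=()):
    # Hypothetical stand-in for the asker's function
    stops = set(extra_stopwords)
    return " ".join(w for w in text.lower().split() if w not in stops)


# Hypothetical stopword lists, one per product
Student_stopwords = ["student", "loan"]
mortgage_stopwords = ["mortgage"]
creditcard_stopwords = ["credit", "card"]


def which_preproc(row):
    # With axis=1, row is a Series indexed by the column names
    if row["product"] == "Student Loan":
        return pre_process(row["complaint"], Student_stopwords)
    if row["product"] == "Mortgage":
        return pre_process(row["complaint"], mortgage_stopwords)
    if row["product"] == "Credit Card":
        return pre_process(row["complaint"], creditcard_stopwords)
    return pre_process(row["complaint"])  # any other product: no extra stopwords


df = pd.DataFrame({
    "product": ["Student Loan", "Mortgage", "Credit Card"],
    "complaint": ["Student loan payment lost", "Mortgage rate changed",
                  "Credit card stolen"],
})
df["complaint"] = df.apply(which_preproc, axis=1)
print(df["complaint"].tolist())
```

Because which_preproc returns a single string per row, the result of `apply` is a Series aligned with `df`'s index and can be assigned straight back to the column.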


[Answer 2]:

Try this code:

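# row.iloc[0] is 'product', row.iloc[1] is 'complaint'; assumes pre_process accepts a product as its second argument
df['complaint'] = df.apply(lambda row: pre_process(row.iloc[1], row.iloc[0]), axis=1)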

[Comments]:

How exactly does this code filter my dataframe by product? My product isn't mentioned here at all.
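To the comment's question: this answer passes the row's first column (the product) as the second argument to pre_process, so it only works if pre_process itself uses that product to pick its stopwords. A minimal sketch under that assumption, with a hypothetical STOPWORDS_BY_PRODUCT dictionary and the columns passed by label so it is explicit which value is the product:

```python
import pandas as pd

# Hypothetical mapping from product name to its extra stopwords
STOPWORDS_BY_PRODUCT = {
    "Student Loan": {"student", "loan"},
    "Mortgage": {"mortgage"},
    "Credit Card": {"credit", "card"},
}


def pre_process(text, product):
    # The product only selects which stopword set to remove;
    # products missing from the dict get an empty set (nothing removed)
    stops = STOPWORDS_BY_PRODUCT.get(product, set())
    return " ".join(w for w in text.lower().split() if w not in stops)


df = pd.DataFrame({
    "product": ["Student Loan", "Mortgage", "Auto Loan"],
    "complaint": ["Student loan denied", "Mortgage escrow error", "Loan payment late"],
})

# Pass both columns by label rather than by position
df["complaint"] = df.apply(
    lambda row: pre_process(row["complaint"], row["product"]), axis=1
)
print(df["complaint"].tolist())
```

Nothing is filtered out of the DataFrame; every row is processed, and the product only decides which words are dropped from that row's complaint.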

[Answer 3]:

Try this:

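def pre_process_wrapper(x):
    complaint = x['complaint']
    # Branch on the product column, not on the complaint text
    if x['product'] == "Mortgage":
        complaint = pre_process(complaint, mortgage_stopwords)
    elif x['product'] == "Student Loan":
        complaint = pre_process(complaint, Student_stopwords)
    elif x['product'] == "Credit Card":
        complaint = pre_process(complaint, creditcard_stopwords)
    return complaint

# Apply the wrapper (not pre_process itself) row by row
df['complaint'] = df.apply(pre_process_wrapper, axis=1)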

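I wrote an extra wrapper function that calls your pre_process function and returns the preprocessed complaint. You can then pass that wrapper to the DataFrame-level apply function.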
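The same per-row logic can also be written without `DataFrame.apply`: iterating over the two columns with `zip` avoids building a Series for every row, which is typically faster on large frames (not measured here). A minimal sketch, assuming a hypothetical pre_process stand-in and an invented stopwords_by_product dictionary:

```python
import pandas as pd


def pre_process(text, extra_stopwords=()):
    # Hypothetical stand-in for the asker's function
    stops = set(extra_stopwords)
    return " ".join(w for w in text.lower().split() if w not in stops)


# Hypothetical per-product stopword lists
stopwords_by_product = {
    "Student Loan": ["student", "loan"],
    "Mortgage": ["mortgage"],
    "Credit Card": ["credit", "card"],
}

df = pd.DataFrame({
    "product": ["Credit Card", "Student Loan"],
    "complaint": ["Credit card interest too high", "Student loan transferred"],
})

# Walk both columns in parallel; no per-row Series is built, unlike apply(axis=1)
df["complaint"] = [
    pre_process(text, stopwords_by_product.get(product, ()))
    for product, text in zip(df["product"], df["complaint"])
]
print(df["complaint"].tolist())
```

The list comprehension yields values in row order, so assigning it back to the column keeps each processed complaint on its original row.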
