Apply function on specific rows of a df_python: answers

【Question title】: Apply function on specific rows of a df_python
【Posted】: 2021-09-02 05:23:21
【Question description】:

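I am trying to apply the two functions below, but only to specific rows of my df. For example, a function should run only when the value in column1 is the string "this is a sentence". When that value is found in column1, the function should run on the other two columns, but only on the rows where it was found. Like this example: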

column1                   column2                 column3         output_required
this is a sentence        phrase1                 SENT            SENT phrase1
this is a brand           phrase2                 10              10
no value available        phrase3                 apple           apple
this is a sentence        phrase4                 Fruits          Fruits phrase4
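To pin down the rule, here is a sketch that rebuilds the sample table as a DataFrame and spells out output_required with a plain list comprehension. It assumes every cell is a string (so 10 is the text '10'); TARGET is a name introduced here for illustration only.

```python
import pandas as pd

# Sample data from the table above; all values are assumed to be strings.
df = pd.DataFrame({
    'column1': ['this is a sentence', 'this is a brand',
                'no value available', 'this is a sentence'],
    'column2': ['phrase1', 'phrase2', 'phrase3', 'phrase4'],
    'column3': ['SENT', '10', 'apple', 'Fruits'],
})

TARGET = 'this is a sentence'  # hypothetical constant for the matched value

# The rule row by row: append column2 only where column1 equals TARGET,
# otherwise keep column3 unchanged.
df['output_required'] = [
    f'{c3} {c2}' if c1 == TARGET else c3
    for c1, c2, c3 in zip(df['column1'], df['column2'], df['column3'])
]
print(df['output_required'].tolist())
# ['SENT phrase1', '10', 'apple', 'Fruits phrase4']
```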

My functions are:

    # concatenate with value from column 3
    col_target = 'column3'
    col_restrict = 'column2'

    target = df[col_target].tolist()  # this generates a list of lists
    restrict = df[col_restrict].tolist()  # this also generates a list of lists


    def parse_str_into_list(s):
        # turn a stringified list such as "['a', 'b']" into "a b"
        if str(s).startswith('[') and str(s).endswith(']'):
            return ' '.join(str(s).strip('[]').strip("'").split("', '"))
        return str(s)


    def filter_restrict_words(row):
        # row.iloc[0] is col_target, row.iloc[1] is col_restrict
        targets = parse_str_into_list(row.iloc[0]).split(' ')
        restricts = parse_str_into_list(row.iloc[1]).split(' ')
        words_to_keep = []
        for word in targets:
            # condition to keep eligible words
            if word not in words_to_keep:
                # if word not in restricts and word not in words_to_keep:
                words_to_keep.append(word)
                print('words to keep')
                print(words_to_keep)

        return ' '.join(words_to_keep)


    df['Title'] = df[[col_target, col_restrict]].apply(filter_restrict_words, axis=1)
    df['output_required'] = df['Title'].map(str) + ' ' + df['column3'].map(str)


    # and remove duplicates, keeping the first occurrence
    def uniqueList(row):
        seen = set()
        unique = []
        for w in row.split(" "):
            # compare whole words case-insensitively, not substrings
            if w.lower() not in seen:
                seen.add(w.lower())
                unique.append(w)
        return " ".join(unique)


    df["Correction_Value"] = df["Correction_Value"].apply(uniqueList)

Right now, these functions run on every row of my df.
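If the row-wise functions themselves are to be kept, one option is to call apply only on the rows selected by a boolean mask and write the result back with loc. A minimal sketch, assuming the sample data from the question; combine is a hypothetical stand-in for a function like filter_restrict_words, not the asker's code:

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['this is a sentence', 'this is a brand',
                'no value available', 'this is a sentence'],
    'column2': ['phrase1', 'phrase2', 'phrase3', 'phrase4'],
    'column3': ['SENT', '10', 'apple', 'Fruits'],
})


def combine(row):
    # Hypothetical row-wise function standing in for the question's functions.
    return row['column3'] + ' ' + row['column2']


mask = df['column1'] == 'this is a sentence'

# Default for every row: the plain column3 value.
df['output_required'] = df['column3']
# Run the row-wise function only on the selected rows; loc aligns the result
# on the index, so unselected rows keep their default.
if mask.any():
    df.loc[mask, 'output_required'] = df.loc[mask].apply(combine, axis=1)

print(df['output_required'].tolist())
```

The mask.any() guard is there because apply(axis=1) on an empty frame may return a DataFrame rather than a Series.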

【Question discussion】:

What is the expected output for your example?
@99_m4n The output_required column in the example above.

Tags: python pandas loops iteration

【Solution 1】:

Use Series.mask, which replaces values where the condition is True and keeps the rest:

    df['output_required'] = df['column3'].mask(
        df['column1'] == 'this is a sentence',  # rows to change
        df['column3'] + ' ' + df['column2'],     # replacement values
    )
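One caveat: the replacement argument to mask is computed for every row, not only the masked ones, so if column3 holds numbers (10 stored as an int, say) the string concatenation raises a TypeError even for rows that are never replaced. A sketch assuming such a mixed column, converting it with astype(str) first:

```python
import pandas as pd

# Assumption: column3 mixes strings and an integer, e.g. after reading a file.
df = pd.DataFrame({
    'column1': ['this is a sentence', 'this is a brand',
                'no value available', 'this is a sentence'],
    'column2': ['phrase1', 'phrase2', 'phrase3', 'phrase4'],
    'column3': ['SENT', 10, 'apple', 'Fruits'],
})

cond = df['column1'] == 'this is a sentence'
col3 = df['column3'].astype(str)  # make every value a string before concatenating

# mask() replaces values where cond is True and keeps the others.
df['output_required'] = col3.mask(cond, col3 + ' ' + df['column2'])
print(df['output_required'].tolist())
# ['SENT phrase1', '10', 'apple', 'Fruits phrase4']
```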

【Discussion】:

【Solution 2】:

If I understand correctly:

Try numpy's where() function:

    import numpy as np

    # check whether 'this is a sentence' occurs in column1
    c = df['column1'].str.contains('this is a sentence')
    df['output_required'] = np.where(c, df['column3'] + ' ' + df['column2'], df['column3'])
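Note that str.contains is a substring test and treats its pattern as a regular expression by default (regex=True), while Solution 1 uses ==, an exact match. The two only differ if column1 can hold longer text; a sketch with one assumed extra value, using regex=False for a literal search:

```python
import pandas as pd

# Assumed values; the second one is not in the question's data.
col1 = pd.Series(['this is a sentence', 'this is a sentence, too',
                  'this is a brand'])

# Literal substring match (regex=False): also hits the second row.
contains = col1.str.contains('this is a sentence', regex=False)
# Exact match: only rows whose whole value equals the string.
exact = col1 == 'this is a sentence'

print(contains.tolist())  # [True, True, False]
print(exact.tolist())     # [True, False, False]
```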

or

via the Series.where() method:

    # check whether 'this is a sentence' occurs in column1
    c = df['column1'].str.contains('this is a sentence')
    df['output_required'] = df['column3'].where(~c, df['column3'] + ' ' + df['column2'])
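Series.where and Series.mask are mirror images: where keeps values where the condition is True and substitutes the rest, mask replaces values where it is True. So where(~c, other) gives the same result as mask(c, other). A small check on the sample values, with c assumed to be the outcome of the contains test:

```python
import pandas as pd

col2 = pd.Series(['phrase1', 'phrase2', 'phrase3', 'phrase4'])
col3 = pd.Series(['SENT', '10', 'apple', 'Fruits'])
c = pd.Series([True, False, False, True])  # assumed: rows where column1 matched

other = col3 + ' ' + col2
via_where = col3.where(~c, other)  # keep where ~c is True, else take other
via_mask = col3.mask(c, other)     # replace where c is True

print(via_where.equals(via_mask))  # True
```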

or

via a boolean mask and the loc accessor:

    # check whether 'this is a sentence' occurs in column1
    c = df['column1'].str.contains('this is a sentence')
    # create the output_required column
    df['output_required'] = None
    # where condition c holds, assign column3 + ' ' + column2
    df.loc[c, 'output_required'] = df.loc[c, 'column3'] + ' ' + df.loc[c, 'column2']
    # where condition c fails, assign column3 unchanged
    df.loc[~c, 'output_required'] = df.loc[~c, 'column3']

Output of df:

    column1             column2     column3     output_required
0   this is a sentence  phrase1     SENT        SENT phrase1
1   this is a brand     phrase2     10          10
2   no value available  phrase3     apple       apple
3   this is a sentence  phrase4     Fruits      Fruits phrase4

【Discussion】: