【Question title】: Apply function on specific rows of a df (Python)
【Posted】: 2021-09-02 05:23:21
【Question description】:

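I am trying to apply the two functions below, but only to specific rows of my df. For example, the function should run only when the value in column1 is the string "this is a sentence". In that case, the function should operate on the other two columns, but only on the rows where column1 holds that value. As in this example: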

column1                   column2                 column3         output_required
this is a sentence        phrase1                 SENT            SENT phrase1
this is a brand           phrase2                 10              10
no value available        phrase3                 apple           apple
this is a sentence        phrase4                 Fruits          Fruits phrase4

My functions are:

# concatenate with value from column 3
col_target = 'column3'
col_restrict = 'column2'

target = df[col_target].tolist()  # this generates a list of lists
restrict = df[col_restrict].tolist()  # this also generates a list of lists


def parse_str_into_list(s):
    if str(s).startswith('[') and str(s).endswith(']'):
        return ' '.join(str(s).strip('[]').strip("'").split("', '"))
    return str(s)
def filter_restrict_words(row):
    targets = parse_str_into_list(row[0]).split(' ', -1)
    restricts = parse_str_into_list(row[1]).split(' ', -1)
    words_to_keep = []
    for word in targets:
        # condition to keep eligible words
        if word not in words_to_keep:
            # if word not in restricts and word not in words_to_keep:
            words_to_keep.append(word)
            print('words to keep')
            print(words_to_keep)


    return ' '.join(words_to_keep)

df['Title'] = df[[col_target, col_restrict]].apply(lambda x: filter_restrict_words(x), axis=1)
df['output_required'] = df['Title'].map(str) + ' ' + df['column3'].map(str)


#and remove duplicates keeping the first occurrence
def uniqueList(row):
    words = row.split(" ")
    unique = words[0]
    for w in words:
        if w.lower() not in unique.lower():
            unique = unique + " " + w
    return unique
df["Correction_Value"] = df["Correction_Value"].apply(uniqueList)

At the moment, these functions run on all rows of my df.

【Question comments】:

  • What is the expected output for your example?
  • @99_m4n The output_required column in the example above.

Tags: python pandas loops iteration


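【Solution 1】:

Use .mask, which replaces the values of column3 wherever the condition is True and leaves the other rows unchanged: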

df['output_required'] = df['column3'].mask(df['column1'] == 'this is a sentence', df['column3'] + ' ' + df['column2'])
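
As a minimal sketch, the line above can be checked against the sample table from the question. The DataFrame construction is an assumption: the question only shows the table, and column3 is taken here to hold strings (including "10"). If column3 held mixed types, it would likely need `.astype(str)` before concatenation.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed: all values are strings).
df = pd.DataFrame({
    "column1": ["this is a sentence", "this is a brand",
                "no value available", "this is a sentence"],
    "column2": ["phrase1", "phrase2", "phrase3", "phrase4"],
    "column3": ["SENT", "10", "apple", "Fruits"],
})

# Boolean Series: True only on the rows that should be modified.
is_sentence = df["column1"] == "this is a sentence"

# mask() replaces values where the condition is True and keeps the rest.
df["output_required"] = df["column3"].mask(
    is_sentence, df["column3"] + " " + df["column2"]
)

print(df["output_required"].tolist())
# ['SENT phrase1', '10', 'apple', 'Fruits phrase4']
```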

【Comments】:

    【Solution 2】:

    If I understand correctly:

    Try NumPy's where() function:

    import numpy as np

    # check whether 'this is a sentence' occurs in column1
    c = df['column1'].str.contains('this is a sentence')
    df['output_required'] = np.where(c, df['column3'] + ' ' + df['column2'], df['column3'])
    

    Via the Series.where() method (it keeps values where the condition is True, hence the negated condition ~c):

    c=df['column1'].str.contains('this is a sentence')
    #checking if 'this is a sentence' exists in column1
    df['output_required']=df['column3'].where(~c, df['column3'] + ' ' + df['column2'])
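
`Series.where` and `Series.mask` are inverses of each other, so `where(~c, ...)` and `mask(c, ...)` should give the same result. The sketch below illustrates this on sample data rebuilt from the question (an assumption, with all values as strings). The `regex=False` and `na=False` arguments are optional additions, not part of the original answer: they make `str.contains` treat the pattern literally and return False for missing values.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed: all values are strings).
df = pd.DataFrame({
    "column1": ["this is a sentence", "this is a brand",
                "no value available", "this is a sentence"],
    "column2": ["phrase1", "phrase2", "phrase3", "phrase4"],
    "column3": ["SENT", "10", "apple", "Fruits"],
})

# Literal substring match; rows with missing values count as "no match".
c = df["column1"].str.contains("this is a sentence", regex=False, na=False)
combined = df["column3"] + " " + df["column2"]

via_where = df["column3"].where(~c, combined)  # keep where ~c is True, else replace
via_mask = df["column3"].mask(c, combined)     # replace where c is True, else keep

print(via_where.equals(via_mask))
# True
```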
    

    Via a boolean mask and the loc accessor:

    c=df['column1'].str.contains('this is a sentence')
    #checking if 'this is a sentence' exists in column1
    df['output_required']=None
    #creating column output_required
    df.loc[c,'output_required']=df.loc[c,'column3']+' '+df.loc[c,'column2']
    #getting and assigning the value of column3+' '+column2 where condition c satisfies
    df.loc[~c,'output_required']=df.loc[~c,'column3']
    #getting and assigning the value of column3 where condition c failed
    

    Output of df:

        column1             column2     column3     output_required
    0   this is a sentence  phrase1     SENT        SENT phrase1
    1   this is a brand     phrase2     10          10
    2   no value available  phrase3     apple       apple
    3   this is a sentence  phrase4     Fruits      Fruits phrase4
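
When the per-row logic cannot be written as a simple column expression, the same boolean mask can restrict `apply` to the matching rows. The following is only a sketch under stated assumptions: `combine` is a hypothetical stand-in for the question's own row-wise functions, and the sample data is rebuilt from the question's table with all values as strings.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed: all values are strings).
df = pd.DataFrame({
    "column1": ["this is a sentence", "this is a brand",
                "no value available", "this is a sentence"],
    "column2": ["phrase1", "phrase2", "phrase3", "phrase4"],
    "column3": ["SENT", "10", "apple", "Fruits"],
})


def combine(row):
    # Hypothetical stand-in for the question's row-wise function.
    return f"{row['column3']} {row['column2']}"


mask = df["column1"] == "this is a sentence"

# Default: copy column3 unchanged, then overwrite only the selected rows.
df["output_required"] = df["column3"]
df.loc[mask, "output_required"] = df.loc[mask].apply(combine, axis=1)

print(df["output_required"].tolist())
# ['SENT phrase1', '10', 'apple', 'Fruits phrase4']
```

Because `apply` only sees `df.loc[mask]`, the function never runs on the other rows. Note that this sketch assumes at least one row matches; with an empty selection, `apply(..., axis=1)` may not return a Series, so that case would need a guard.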
    

    【Comments】:
