Use lambda, apply, and join function on a pandas dataframe (answers)

【Question title】: Use lambda, apply, and join function on a pandas dataframe
【Posted】: 2019-11-23 03:36:08
【Question description】:

Goal

Apply the deid_notes function to df.

Background

I have a df similar to this sample df:

import pandas as pd
df = pd.DataFrame({'Text' : ['there are many different types of crayons', 
                                   'i like a lot of sports cars', 
                                   'the middle east has many camels '], 

                      'P_ID': [1,2,3], 
                      'Word' : ['crayons', 'cars', 'camels'],
                      'P_Name' : ['John', 'Mary', 'Jacob'],
                      'N_ID' : ['A1', 'A2', 'A3']

                     })

#rearrange columns
df = df[['Text','N_ID', 'P_ID', 'P_Name', 'Word']]
df

    Text                  N_ID P_ID P_Name  Word
0   many types of crayons   A1  1    John   crayons
1   i like sports cars      A2  2    Mary   cars
2   has many camels         A3  3    Jacob  camels

I use the following function, which relies on NeuroNER (http://neuroner.com/), to de-identify certain words in the Text column:

def deid_notes(text):

    #use predict function from NeuroNER to tag words to be deidentified 
    ner_list = n1.predict(text)      

    #n1.predict won't work in this toy example because the NeuroNER package needs to be installed (and installation is difficult) 
    #but the output resembles this: [{'start': 1, 'end': 11, 'id': 1, 'tagged word': 'crayon'}]

    #use start and end position of tagged words to deidentify and replace with **BLOCK**
    if len(ner_list) > 0:
        parts_to_take = [(0, ner_list[0]['start'])] + [(first["end"]+1, second["start"]) for first, second in zip(ner_list, ner_list[1:])] + [(ner_list[-1]['end'], len(text)-1)] 
        parts = [text[start:end] for start, end in parts_to_take] 
        deid = '**BLOCK**'.join(parts)

    #if n1.predict does not identify any words to be deidentified, place NaN 
    else:
        deid='NaN'

    return pd.Series(deid, index=['Deid'])
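
Because NeuroNER is hard to install, the masking step can be exercised on its own with hand-written spans. The following is a minimal sketch, not the function above: `mask_spans` is a hypothetical helper, and it assumes each tag is a dict with 'start' and 'end' character offsets, sorted, non-overlapping, and with an exclusive 'end'. NeuroNER's real offset convention may differ, so the offsets should be checked against actual `n1.predict` output.

```python
def mask_spans(text, spans, token="**BLOCK**"):
    """Replace each [start, end) character span in `text` with `token`.

    Assumption: spans are dicts with 'start' and 'end' keys, sorted by
    position, non-overlapping, with an exclusive 'end'.
    """
    pieces = []
    cursor = 0
    for span in spans:
        pieces.append(text[cursor:span["start"]])  # text before the span
        pieces.append(token)                       # the masked span itself
        cursor = span["end"]
    pieces.append(text[cursor:])                   # whatever follows the last span
    return "".join(pieces)


text = "there are many different types of crayons"
start = text.index("crayons")
spans = [{"start": start, "end": start + len("crayons")}]  # hand-written tag

masked = mask_spans(text, spans)
print(masked)
```

Unlike deid_notes, this helper returns the text unchanged when no spans are given; returning 'NaN' in that case would be a separate decision.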

Problem

I use the following code to apply the deid_notes function to my df:

fx = lambda x: deid_notes(x.Text,axis=1)
df.join(df.apply(fx))

But I get the following error:

AttributeError: ("'Series' object has no attribute 'Text'", 'occurred at index Text')

Question

How do I get the deid_notes function to work on my df?

【Comments】:

What is n1 in this case?
n1=neuromodel.NeuroNER(train_model=False, use_pretrained_model=True, dataset_text_folder="./data/example_unannotated_texts", pretrained_model_folder="./trained_models/mimic_glove_stanford_bioes")
Try df.join(df.apply(fx, axis=1))
I get an error: TypeError: ("deid_notes() got an unexpected keyword argument 'axis'", 'occurred at index 0')
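
Both errors in this thread come down to where axis=1 is placed. The sketch below reproduces them on a toy frame; `deid_stub` is a hypothetical stand-in for deid_notes (it does not call NeuroNER) and only illustrates the calling pattern.

```python
import pandas as pd

df = pd.DataFrame({"Text": ["a b", "c d"], "P_ID": [1, 2]})


def deid_stub(text):
    # Hypothetical stand-in for deid_notes; accepts only the text.
    return text.upper()


# 1) No axis given: apply defaults to axis=0 and passes each *column*.
#    A column's labels are the row index (0, 1), so x.Text does not exist.
try:
    df.apply(lambda x: deid_stub(x.Text))
    first_error = None
except AttributeError as exc:
    first_error = type(exc).__name__

# 2) axis=1 given to apply, but also forwarded to the function,
#    which has no 'axis' parameter.
try:
    df.apply(lambda x: deid_stub(x.Text, axis=1), axis=1)
    second_error = None
except TypeError as exc:
    second_error = type(exc).__name__

# 3) axis=1 only on apply: each x is a row, so x.Text is that row's text.
result = df.apply(lambda x: deid_stub(x.Text), axis=1)

print(first_error, second_error, result.tolist())
```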

Tags: python-3.x pandas join lambda apply

【Solution 1】:

Assuming you want deid_notes to return a pandas Series and to take text as its only input argument, pass axis=1 to apply rather than to deid_notes. For example:

# Dummy function
def deid_notes(text):
    deid = 'prediction to: ' + text
    return pd.Series(deid, index = ['Deid'])

fx = lambda x: deid_notes(x.Text)
df.join(df.apply(fx, axis=1))
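
Since the function only needs the Text value, an alternative worth considering (not part of the answer above) is to apply it to the column itself and assign the result, which avoids both the row-wise lambda and the join. A minimal sketch, assuming a hypothetical `deid_stub` that returns a plain string in place of deid_notes:

```python
import pandas as pd

df = pd.DataFrame({
    "Text": ["there are many different types of crayons",
             "i like a lot of sports cars"],
    "P_ID": [1, 2],
})


def deid_stub(text):
    # Hypothetical stand-in for deid_notes; returns a plain string.
    return "deid: " + text


# Option 1 (the pattern from the answer): row-wise apply that returns a
# one-element Series per row, then join the resulting frame back onto df.
out_join = df.join(
    df.apply(lambda row: pd.Series(deid_stub(row.Text), index=["Deid"]), axis=1)
)

# Option 2: apply on the Text column only and attach it as a new column.
out_assign = df.assign(Deid=df["Text"].apply(deid_stub))

print(out_assign)
```

Both options produce the same Deid column here; the second tends to be easier to read when only one input column is involved.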
