How can I predict for each row in the dataframe by iterating through the rows? (Answers)

【Question title】: How can I predict for each row in the dataframe by iterating through the rows?
【Posted】: 2021-07-24 19:57:15
【Question】:

I built a BERT model, and I now have a block of code that correctly classifies the rows of the text column one at a time. The pandas DataFrame looks like this:

    text
0   working add oil
1   @KristianaNKOTB you're welcome
2   is going to bed, work in the morning boo but t...
3   @sparky_habbo - uni &amp; assignments happened...
4   Can't wait to have chinese food! Still disappo...

The code that classifies a single row of the text column is:

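import numpy as np
import pandas as pd

# tokenizer, model and data come from the BERT setup and are not shown here.
# Wrap the single text in a list so the map() calls see one document, not its characters.
text = [df.text[0]]

# Tokenize, then add the BERT start and end markers.
pred_tokens = map(tokenizer.tokenize, text)
pred_tokens = map(lambda tok: ["[CLS]"] + tok + ["[SEP]"], pred_tokens)
pred_token_ids = list(map(tokenizer.convert_tokens_to_ids, pred_tokens))

# Zero-pad every id sequence to data.max_seq_len.
pred_token_ids = map(lambda tids: tids + [0] * (data.max_seq_len - len(tids)), pred_token_ids)
pred_token_ids = np.array(list(pred_token_ids))

# Pick the highest-scoring class for each input.
predictions = model.predict(pred_token_ids).argmax(axis=-1)

# Show the result as a one-column DataFrame (this rebinds df).
df = pd.DataFrame(predictions, columns=['emotion'])
df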

For example, to classify df.text[0], i.e. 'working add oil', as 1 or 0, I run this code and get:

    emotion
0   1

But how can I now predict for every row in the DataFrame by iterating through the rows?

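【Question comments】:

Tags: python pandas loops iterator iteration

【Solution 1】:

The code below demonstrates one way to run the prediction on every text in a DataFrame and store the result.

Input data: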

df=pd.DataFrame({"text":['working add oil',"@KristianaNKOTB you're welcome","is going to bed, work in the morning boo but t..."]})

Define a function. You can adapt it to your program: comment out my placeholder code and uncomment yours.

import random

def predict_emotion(input_text):
    # Wrap the string in a list: map() over a bare string would iterate over its characters.
    text = [input_text]

    ''' uncomment this and remove my return statement
    pred_tokens = map(tokenizer.tokenize, text)
    pred_tokens = map(lambda tok: ["[CLS]"] + tok + ["[SEP]"], pred_tokens)
    pred_token_ids = list(map(tokenizer.convert_tokens_to_ids, pred_tokens))

    pred_token_ids = map(lambda tids: tids + [0] * (data.max_seq_len - len(tids)), pred_token_ids)
    pred_token_ids = np.array(list(pred_token_ids))

    predictions = model.predict(pred_token_ids).argmax(axis=-1)
    # One text in, one label out.
    return predictions[0]
    '''
    # Placeholder prediction so the example runs without a model.
    return_int = random.randint(1, 8)
    print(f"text:{input_text},emotion:{return_int}")
    return return_int

Call the function for each row of input text.

df['emotion']=df['text'].apply(predict_emotion)
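Series.apply calls the function once per row, so the model is invoked once per text. A rough sketch of a batched alternative follows, which encodes every text first and calls predict() once. `ToyTokenizer`, `ToyModel`, `MAX_SEQ_LEN`, `encode` and `predict_emotions` are hypothetical names introduced here so the block runs on its own; they are not from the question. The truncation to `max_seq_len` is also an added assumption, since the original padding code does not handle overlong texts.

```python
import numpy as np
import pandas as pd

MAX_SEQ_LEN = 16  # hypothetical; stands in for data.max_seq_len


class ToyTokenizer:
    """Hypothetical stand-in for the question's BERT tokenizer."""

    def tokenize(self, text):
        return text.lower().split()

    def convert_tokens_to_ids(self, tokens):
        # Map every token to a small positive id; 0 is reserved for padding.
        return [len(tok) % 50 + 1 for tok in tokens]


class ToyModel:
    """Hypothetical stand-in for the trained model: two class scores per row."""

    def predict(self, token_ids):
        parity = token_ids.sum(axis=1) % 2
        return np.stack([1 - parity, parity], axis=1).astype(float)


def encode(texts, tokenizer, max_seq_len):
    """Tokenize, add [CLS]/[SEP], convert to ids, and zero-pad to max_seq_len."""
    rows = []
    for text in texts:
        # Keep room for the two special tokens (assumed truncation policy).
        body = tokenizer.tokenize(text)[: max_seq_len - 2]
        ids = tokenizer.convert_tokens_to_ids(["[CLS]"] + body + ["[SEP]"])
        rows.append(ids + [0] * (max_seq_len - len(ids)))
    return np.array(rows)


def predict_emotions(texts, tokenizer, model, max_seq_len=MAX_SEQ_LEN):
    """Return one predicted class per input text, using a single predict() call."""
    token_ids = encode(list(texts), tokenizer, max_seq_len)
    return model.predict(token_ids).argmax(axis=-1)


df = pd.DataFrame({"text": ["working add oil",
                            "@KristianaNKOTB you're welcome",
                            "is going to bed, work in the morning boo but t..."]})
df["emotion"] = predict_emotions(df["text"], ToyTokenizer(), ToyModel())
print(df)
```

With a real model, pass the question's `tokenizer` and `model` in place of the stand-ins, and `data.max_seq_len` as `max_seq_len`.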


【Discussion】:

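emotion must be 1 or 0; this is a classification task. I tested this code on just 3 random rows, and the output looks like this, with lots of 1s and 0s:

    text                                                emotion
1   you're welcome                                      [0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
5   Prepping for auditions this afternoon. From wh...   [0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, ...
7   off to face my exam now                             [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, ...

Image link: drive.google.com/file/d/1Q8uB5vZyjEWEVfyIrRircY5S4jbVb781/…
These are the steps I have already completed: drive.google.com/file/d/1Osh1rKnjLlZ60yHzwAMQgA7XS1YJ9EUJ/…
That is a problem with the model: it has 14 outputs, so it is definitely not a binary classification model. This answer is about running the predictions, and that part works fine. Please ask a new question about the model.
I have posted one: stackoverflow.com/questions/67356712/…
Thanks, I will take a look at it too. Please accept this answer. Thanks.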
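One possible explanation for the 14-element output, not confirmed anywhere in the thread: "you're welcome" is exactly 14 characters long, which is what you would see if the text reached the map() calls as a bare string instead of a one-element list. map() over a string iterates over its characters, so the model would receive one padded sequence per character. A minimal sketch of the effect, using a hypothetical whitespace tokenizer (`toy_tokenize`) in place of the real one:

```python
def toy_tokenize(text):
    """Hypothetical stand-in for tokenizer.tokenize: split on whitespace."""
    return text.split()


text = "you're welcome"

# Bare string: map() iterates over characters, giving one "sequence" per character.
per_char = list(map(toy_tokenize, text))

# Wrapped in a list: map() sees a single text and yields a single token sequence.
per_text = list(map(toy_tokenize, [text]))

print(len(per_char))  # 14
print(len(per_text))  # 1
print(per_text[0])    # ["you're", 'welcome']
```

If this is the cause, wrapping the input as `[input_text]` before tokenizing and returning the first element of the prediction array should give a single label per row.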