【Question title】: BERT model training won't stop
【Posted】: 2020-11-30 12:43:28
【Question】:

I am using this code to train BERT as a Turkish text classification model with 2 labels. But when I run the following code:

import numpy as np
import pandas as pd

df = pd.read_excel (r'preparedDataNoId.xlsx')
df = df.sample(frac = 1)

from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(df, test_size=0.10)

print('train shape: ',train_df.shape)
print('test shape: ',test_df.shape)

train_df["text"]=train_df["text"].apply(lambda r: str(r))
train_df['label']=train_df['label'].astype(int)
from simpletransformers.classification import ClassificationModel

model = ClassificationModel('bert', 'dbmdz/bert-base-turkish-uncased', use_cuda=False,num_labels=2,
                            args={'reprocess_input_data': True, 'overwrite_output_dir': True, 'num_train_epochs': 3, "train_batch_size": 64 , "fp16":False, "output_dir": "bert_model"})

model.train_model(train_df) 

it takes a very long time, never stops, and the screen keeps showing:

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

【Comments】:

    Tags: python machine-learning bert-language-model


    【Solution 1】:

    As the error message suggests, you should wrap your code in an if __name__ == '__main__': block.

    So your code becomes:

    import numpy as np
    import pandas as pd
    
    if __name__ == '__main__':
        df = pd.read_excel(r'preparedDataNoId.xlsx')
        df = df.sample(frac=1)
    
        from sklearn.model_selection import train_test_split
    
        train_df, test_df = train_test_split(df, test_size=0.10)
    
        print('train shape: ', train_df.shape)
        print('test shape: ', test_df.shape)
    
        train_df["text"] = train_df["text"].apply(lambda r: str(r))
        train_df['label'] = train_df['label'].astype(int)
        from simpletransformers.classification import ClassificationModel
    
        model = ClassificationModel('bert', 'dbmdz/bert-base-turkish-uncased', use_cuda=False, num_labels=2,
                                    args={'reprocess_input_data': True, 'overwrite_output_dir': True, 'num_train_epochs': 3,
                                          "train_batch_size": 64, "fp16": False, "output_dir": "bert_model"})
    
        model.train_model(train_df)
    

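    Why does this happen?

    On Windows, the subprocesses will import (i.e. execute) the main module at start. You need to insert an if __name__ == '__main__': guard in the main module to avoid creating subprocesses recursively.

    Quoted from: https://stackoverflow.com/a/18205006/6025629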
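
The same effect can be seen without BERT. Below is a minimal sketch using only the standard library. The names square and main and the list of numbers are hypothetical and chosen for illustration; this is not how simpletransformers starts its worker processes internally, only an example of the guard pattern itself.

```python
import multiprocessing as mp


def square(x):
    """Work done in a child process; defined at module level so children can import it."""
    return x * x


def main():
    # Hypothetical workload: square a few numbers in two worker processes.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3, 4])


if __name__ == "__main__":
    # Only the parent process enters this block. A spawned child re-imports
    # this file under a different module name, so it skips the block and
    # does not try to start a pool of its own.
    print(main())
```

If the call to main() were moved out of the guard to module level, each child started on Windows would run it again while importing the file, which is the situation the error message in the question describes.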

    【Discussion】:
