BERT pre-training - masked_lm_accuracy is always zero

【Question title】: BERT pre-training - masked_lm_accuracy is always zero
【Posted】: 2022-12-10 20:04:53
【Question description】:

I am trying to train BERT from scratch on a domain-specific dataset using the official TensorFlow GitHub repository.

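I used this section of the documentation to adapt the scripts to my use case, but I ran into a problem. First, I used the create_pretraining_data.py script to convert the .txt files into .tfrecord files. That step went smoothly, but when I run the train.py script to start training the BERT model, next_sentence_accuracy increases after a few steps while masked_lm_accuracy always stays at 0.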

This is the config.yaml file passed to the train.py script:

task:
  init_checkpoint: ''
  model:
    cls_heads: [{activation: tanh, cls_token_idx: 0, dropout_rate: 0.1, inner_dim: 768, name: next_sentence, num_classes: 2}]
    encoder:
      type: bert
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        hidden_activation: gelu
        hidden_size: 768
        initializer_range: 0.02
        intermediate_size: 3072
        max_position_embeddings: 512
        num_attention_heads: 12
        num_layers: 12
        type_vocab_size: 2
        vocab_size: 50000
  train_data:
    drop_remainder: true
    global_batch_size: 32
    input_path: 'test_clean_tfrecord/2014/*'
    is_training: true
    max_predictions_per_seq: 20
    seq_length: 128
    use_next_sentence_label: true
    use_position_id: false
    use_v2_feature_names: false
  validation_data:
    drop_remainder: false
    global_batch_size: 32
    input_path: 'test_clean_tfrecord/2014/*'
    is_training: false
    max_predictions_per_seq: 20
    seq_length: 128
    use_next_sentence_label: true
    use_position_id: false
    use_v2_feature_names: false
trainer:
  checkpoint_interval: 5
  max_to_keep: 5
  optimizer_config:
    learning_rate:
      polynomial:
        cycle: false
        decay_steps: 1000000
        end_learning_rate: 0.0
        initial_learning_rate: 0.0001
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 10000
      type: polynomial
  steps_per_loop: 1
  summary_interval: 1
  train_steps: 200
  validation_interval: 5
  validation_steps: 64
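
With warmup_steps: 10000 and train_steps: 200, this run ends while the learning rate is still in warm-up. The following is a minimal sketch, not the Model Garden implementation: it assumes the warm-up ramps from 0 toward the polynomial-decay rate evaluated at warmup_steps. That assumption reproduces the learning_rate values printed in the training output (about 1.98e-08 at step 2). The function names are hypothetical.

```python
def polynomial_decay(step, initial_lr=1e-4, end_lr=0.0,
                     decay_steps=1_000_000, power=1.0):
    """Polynomial decay without cycling, using the values from config.yaml."""
    step = min(step, decay_steps)
    return (initial_lr - end_lr) * (1.0 - step / decay_steps) ** power + end_lr


def warmup_lr(step, warmup_steps=10_000, power=1.0):
    """Assumed warm-up: ramp toward the decayed rate at the end of warm-up."""
    if step >= warmup_steps:
        return polynomial_decay(step)
    target = polynomial_decay(warmup_steps)  # about 9.9e-05 with these settings
    return target * (step / warmup_steps) ** power


for step in (2, 5, 200):
    print(step, warmup_lr(step))
```

Under this assumption the rate at step 200 is about 1.98e-06, roughly 2% of the configured peak, so the weights have likely barely moved by the end of such a short run.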

This is the output of train.py after 5 training steps:

2022-12-10 13:21:48.184678: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
C:\Users\Iulian\AppData\Roaming\Python\Python39\site-packages\keras\engine\functional.py:637:
UserWarning: Input dict contained keys ['masked_lm_positions',
'masked_lm_ids', 'masked_lm_weights', 'next_sentence_labels']
which did not match any model input. They will be ignored by the model.
  inputs = self._flatten_to_reference_inputs(inputs)
WARNING:tensorflow:Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
W1210 13:21:52.408583 13512 utils.py:82] Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
W1210 13:21:58.768023 19348 utils.py:82] Gradients do not exist for variables ['pooler_transform/kernel:0', 'pooler_transform/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
train | step:      2 | steps/sec:    0.0 | output:
    {'learning_rate': 1.9799998e-08,
     'lm_example_loss': 10.961581,
     'masked_lm_accuracy': 0.0,
     'next_sentence_accuracy': 0.5625,
     'next_sentence_loss': 0.73979986,
     'training_loss': 11.701381}
train | step:      3 | steps/sec:    0.0 | output:
    {'learning_rate': 2.97e-08,
     'lm_example_loss': 10.981846,
     'masked_lm_accuracy': 0.0,
     'next_sentence_accuracy': 0.5,
     'next_sentence_loss': 0.75065744,
     'training_loss': 11.732503}
train | step:      4 | steps/sec:    0.0 | output:
    {'learning_rate': 3.9599996e-08,
     'lm_example_loss': 10.988701,
     'masked_lm_accuracy': 0.0,
     'next_sentence_accuracy': 0.5625,
     'next_sentence_loss': 0.69400764,
     'training_loss': 11.682709}
train | step:      5 | steps/sec:    0.0 | output:
    {'learning_rate': 4.9500002e-08,
     'lm_example_loss': 11.004994,
     'masked_lm_accuracy': 0.0,
     'next_sentence_accuracy': 0.75,
     'next_sentence_loss': 0.5528765,
     'training_loss': 11.557871}
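
As a reference point for reading these numbers, the sketch below computes what a model guessing uniformly over the vocabulary would score. It only uses values from config.yaml (vocab_size: 50000, global_batch_size: 32, max_predictions_per_seq: 20). Treating every prediction slot as a real masked token is a simplifying assumption, so the per-batch count is an upper bound.

```python
import math

vocab_size = 50_000       # encoder.bert.vocab_size
batch_size = 32           # train_data.global_batch_size
max_predictions = 20      # train_data.max_predictions_per_seq

chance_loss = math.log(vocab_size)       # cross-entropy of a uniform guess
chance_accuracy = 1.0 / vocab_size
slots_per_batch = batch_size * max_predictions   # upper bound on masked tokens
expected_hits = slots_per_batch * chance_accuracy

print(f"chance-level loss:     {chance_loss:.2f}")
print(f"chance-level accuracy: {chance_accuracy:.6f}")
print(f"expected correct predictions per batch: {expected_hits:.4f}")
```

ln(50000) is about 10.82, close to the logged lm_example_loss of roughly 10.96 to 11.00, and a uniform guesser would get about 0.013 masked tokens right per batch. A masked_lm_accuracy of 0.0 during the first handful of steps is therefore what an untrained model would be expected to report, and on its own it may not indicate a bug.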

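I tried looking through the source code to find where masked_lm_accuracy is used (I thought a special flag might be needed to enable it), and I found that this accuracy is added to the model's list of metrics by default: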

  def build_metrics(self, training=None):
    del training
    metrics = [
        tf.keras.metrics.SparseCategoricalAccuracy(name='masked_lm_accuracy'),
        tf.keras.metrics.Mean(name='lm_example_loss')
    ]
    # TODO(hongkuny): rethink how to manage metrics creation with heads.
    if self.task_config.train_data.use_next_sentence_label:
      metrics.append(
          tf.keras.metrics.SparseCategoricalAccuracy(
              name='next_sentence_accuracy'))
      metrics.append(tf.keras.metrics.Mean(name='next_sentence_loss'))
    return metrics

  def process_metrics(self, metrics, labels, model_outputs):
    with tf.name_scope('MaskedLMTask/process_metrics'):
      metrics = dict([(metric.name, metric) for metric in metrics])
      if 'masked_lm_accuracy' in metrics:
        metrics['masked_lm_accuracy'].update_state(
            labels['masked_lm_ids'], model_outputs['mlm_logits'],
            labels['masked_lm_weights'])
      if 'next_sentence_accuracy' in metrics:
        metrics['next_sentence_accuracy'].update_state(
            labels['next_sentence_labels'], model_outputs['next_sentence'])
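
To make the masked_lm_accuracy update above concrete, here is a pure-Python approximation of a weighted sparse categorical accuracy: take the argmax of each logit row, compare it with the label, and average the matches using the weights. It is a sketch of the idea, not the Keras implementation. The function name and the toy inputs are made up, and returning 0.0 when all weights are zero is an assumption about how an empty weighted mean is handled.

```python
def weighted_sparse_accuracy(labels, logits, weights):
    """Weighted fraction of positions where argmax(logits) equals the label."""
    total = 0.0
    weight_sum = 0.0
    for label, row, weight in zip(labels, logits, weights):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        total += weight * (1.0 if predicted == label else 0.0)
        weight_sum += weight
    # Assumption: an all-zero weight vector yields 0.0 rather than an error.
    return total / weight_sum if weight_sum else 0.0


# Toy example: 3 prediction slots over a 4-token vocabulary.
labels = [2, 0, 1]
logits = [
    [0.1, 0.2, 3.0, 0.0],   # argmax 2, matches label 2
    [0.5, 1.5, 0.0, 0.0],   # argmax 1, does not match label 0
    [9.0, 0.0, 0.0, 0.0],   # padding slot, ignored via weight 0
]
weights = [1.0, 1.0, 0.0]

print(weighted_sparse_accuracy(labels, logits, weights))
```

In the task code, labels['masked_lm_weights'] plays the role of weights here, so slots with weight 0 do not count toward masked_lm_accuracy.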

【Comments】:

Tags: tensorflow nlp bert-language-model pre-trained-model tensorflow-model-garden

【Solution 1】:

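It looks like you are trying to train a BERT model on a domain-specific dataset using the TensorFlow BERT code. The problem you are seeing is that masked_lm_accuracy is always 0, which may indicate that the model is not learning the masked-token task.

There are several possible reasons for this. One is that your dataset is not large enough to support training a BERT model from scratch. BERT is a large, complex model that needs a lot of data to train effectively. If your dataset is small or not varied enough, the model may not learn well.

Another possible reason is that your model configuration is not a good fit for your dataset. BERT is highly configurable, and different configurations can work better for different datasets. The configuration you are using may not suit your domain-specific data.

Finally, there may be a mistake in the code or in the data-processing steps. For example, if the .tfrecord files you are using are incorrect, the model may be unable to learn from them.

To resolve this, I suggest the following steps: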

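Make sure you have a large and diverse dataset. As mentioned above, BERT needs a lot of data to train effectively. If your dataset is small or not diverse, the model may not learn.
Try adjusting your model configuration. You can experiment with different configurations and see whether they improve the model's results. For example, you can try changing the number of layers, the number of attention heads, or the hidden size.
Check your data-processing steps. Make sure the .txt files you are using are correct and that the .tfrecord files are generated from them correctly. You could also try feeding the data in a different format, such as CSV files, to see whether that improves the model's performance.
Check the code for errors. If you are using the TensorFlow BERT code, make sure you are on the latest version and are following the instructions correctly. If you are using a modified version of the code, make sure your changes do not introduce any bugs.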
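
One way to act on the data-processing check is to decode a few records and validate their masked-LM fields against the training config. The sketch below assumes a record has already been parsed into a plain dict of Python lists. The helper name check_pretraining_example is hypothetical, input_ids is an assumed feature name, and the other keys are taken from the warning in the training log.

```python
def check_pretraining_example(example, vocab_size=50_000, seq_length=128,
                              max_predictions=20):
    """Return a list of human-readable problems found in one decoded record."""
    problems = []
    ids = example["masked_lm_ids"]
    positions = example["masked_lm_positions"]
    weights = example["masked_lm_weights"]

    if not (len(ids) == len(positions) == len(weights) == max_predictions):
        problems.append("masked_lm_* fields do not all have max_predictions entries")
    if len(example["input_ids"]) != seq_length:
        problems.append("input_ids length differs from seq_length")
    if any(t < 0 or t >= vocab_size for t in list(ids) + list(example["input_ids"])):
        problems.append("token id outside [0, vocab_size)")
    if any(p < 0 or p >= seq_length for p in positions):
        problems.append("masked position outside the sequence")
    if not any(w > 0 for w in weights):
        problems.append("all masked_lm_weights are zero, so nothing is scored")
    return problems


# Hypothetical toy record: 8 tokens, 2 prediction slots, 10-token vocabulary.
example = {
    "input_ids": [2, 5, 4, 7, 3, 0, 0, 0],
    "masked_lm_positions": [2, 0],
    "masked_lm_ids": [6, 0],
    "masked_lm_weights": [1.0, 0.0],
}
print(check_pretraining_example(example, vocab_size=10, seq_length=8,
                                max_predictions=2))
```

A vocab_size in config.yaml that is smaller than the vocabulary used when running create_pretraining_data.py would show up in this check as out-of-range token ids.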

【Comments】: