【Question Title】:Finetuning BERT on custom data
【Posted】:2019-09-22 14:11:20
【Question Description】:

I want to train a 21-class text classification model using BERT, but I have very little training data. So I downloaded a similar dataset with 5 classes and 2 million samples, fine-tuned on it starting from the uncased pretrained model that BERT provides, and reached about 98% validation accuracy. Now I want to use this fine-tuned model as the pretrained model for my small custom dataset. However, I get a "shape mismatch with tensor output_bias from checkpoint reader" error, because the checkpoint model has 5 classes while my custom data has 21 classes.

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*** Features ***
INFO:tensorflow:  name = input_ids, shape = (32, 128)
INFO:tensorflow:  name = input_mask, shape = (32, 128)
INFO:tensorflow:  name = is_real_example, shape = (32,)
INFO:tensorflow:  name = label_ids, shape = (32, 21)
INFO:tensorflow:  name = segment_ids, shape = (32, 128)
Tensor("IteratorGetNext:3", shape=(32, 21), dtype=int32)
WARNING:tensorflow:From /home/user/Spine_NLP/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /home/user/Spine_NLP/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
INFO:tensorflow:num_labels:21;logits:Tensor("loss/BiasAdd:0", shape=(32, 21), dtype=float32);labels:Tensor("loss/Cast:0", shape=(32, 21), dtype=float32)
INFO:tensorflow:Error recorded from training_loop: Shape of variable output_bias:0 ((21,)) doesn't match with shape of tensor output_bias ([5]) from checkpoint reader.
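
For context, the mismatched shapes come from the classification head that bert's run_classifier.py builds on top of the pooled output: both output_weights and output_bias are sized by num_labels, so a checkpoint saved with num_labels=5 cannot be restored into a graph built with num_labels=21. Roughly, the relevant lines in create_model look like this (a sketch of the public repo's code, not my exact script):

    hidden_size = output_layer.shape[-1].value

    # Both head variables are shaped by num_labels,
    # hence the (21,) vs. [5] clash at restore time.
    output_weights = tf.get_variable(
        "output_weights", [num_labels, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=0.02))
    output_bias = tf.get_variable(
        "output_bias", [num_labels], initializer=tf.zeros_initializer())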

【Question Discussion】:

    Tags: tensorflow deep-learning nlp text-classification bert-language-model


    【Solution 1】:

    If you want to fine-tune your own model starting from the pretrained 5-class model, you may want to add one more layer that projects the 5 classes onto your 21 classes.

    The error you are seeing is because you probably did not define a new pair of "output_weights" and "output_bias" for your new 21-class labels, but reused the 5-class ones instead. Below, I have prefixed the intermediate tensors for your new labels with "final_".

    The code should look something like this:

    # These are the logits for the 5 classes. Keep them as is.
    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    
    # You want to create one more layer
    final_output_weights = tf.get_variable(
      "final_output_weights", [21, 5],
      initializer=tf.truncated_normal_initializer(stddev=0.02))
    final_output_bias = tf.get_variable(
      "final_output_bias", [21], initializer=tf.zeros_initializer())
    
    final_logits = tf.matmul(logits, final_output_weights, transpose_b=True)
    final_logits = tf.nn.bias_add(final_logits, final_output_bias)
    
    # Below is for evaluating the classification.
    final_probabilities = tf.nn.softmax(final_logits, axis=-1)
    final_log_probs = tf.nn.log_softmax(final_logits, axis=-1)
    
    # Note: `labels` below must be integer class ids in [0, 21). If your
    # label_ids are already one-hot (shape (32, 21) in the log above),
    # use them directly as final_one_hot_labels instead of tf.one_hot.
    final_one_hot_labels = tf.one_hot(labels, depth=21, dtype=tf.float32)
    final_per_example_loss = -tf.reduce_sum(final_one_hot_labels * final_log_probs, axis=-1)
    final_loss = tf.reduce_mean(final_per_example_loss)
    
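    An alternative, closer to the usual transfer-learning recipe (and to the comment below), is to keep a single 21-class head and simply not restore the old 5-class "output_weights"/"output_bias" from the checkpoint, so that only the shared BERT weights are loaded and the head is freshly initialized. A minimal sketch, assuming the init_from_checkpoint wiring used in bert's run_classifier.py; the filtering step is my addition, not code from the repo:

    tvars = tf.trainable_variables()
    (assignment_map, initialized_variable_names
     ) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)

    # Drop the old 5-class head so it is re-initialized with the new
    # 21-class shapes instead of being restored (and shape-checked)
    # from the checkpoint.
    assignment_map = {
        name: value for name, value in assignment_map.items()
        if "output_weights" not in name and "output_bias" not in name
    }

    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)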

    【Discussion】:

    • Shouldn't I remove the last layer of the pretrained model rather than add one more layer? That is how transfer learning works, isn't it?