【Title】: How to use a pre-trained BERT model as an embedding layer in my model?
【Posted】: 2021-04-20 08:48:10
【Description】:

I'm using a pre-trained model via bert-for-tf2 inside my functional model, like this:

import os

import bert  # pip install bert-for-tf2
import numpy as np
import tensorflow as tf

def build_model(model_dir, batch_size, max_seq_num, max_seq_len):
    bert_params = bert.params_from_pretrained_ckpt(model_dir)
    l_bert = bert.BertModelLayer.from_params(bert_params, name="bert", trainable=False)

    input_ids = tf.keras.layers.Input(shape=(max_seq_num, max_seq_len), dtype='int32', name='input_ids')
    reshaped_input_ids = tf.reshape(input_ids, (batch_size * max_seq_num, max_seq_len))

    token_type_ids = tf.keras.layers.Input(shape=(max_seq_num, max_seq_len), dtype='int32', name='token_type')
    reshaped_token_type_ids = tf.reshape(token_type_ids, (batch_size * max_seq_num, max_seq_len))

    mask_ids = tf.keras.layers.Input(shape=(max_seq_num, max_seq_len), dtype='int32', name='mask_ids')
    reshaped_mask_ids = tf.reshape(mask_ids, (batch_size * max_seq_num, max_seq_len))

    # provide a custom token_type/segment id as a layer input
    bert_embedd = l_bert([reshaped_input_ids, reshaped_token_type_ids], mask=reshaped_mask_ids)  # [batch_size*max_seq_num, max_seq_len, hidden_size]
    model = tf.keras.models.Model(inputs=[input_ids, token_type_ids, mask_ids], outputs=bert_embedd)
    model.build(input_shape=[(batch_size, max_seq_num, max_seq_len),
                             (batch_size, max_seq_num, max_seq_len),
                             (batch_size, max_seq_num, max_seq_len)])
    bert.load_bert_weights(l_bert, os.path.join(model_dir, "bert_model.ckpt"))  # should be called after model.build()
    model.summary()
    tf.keras.utils.plot_model(model, show_shapes=True)
    learning_rate = 1e-2
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss=tf.keras.losses.MeanSquaredError(), metrics=['mse'])
    return model
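The reshape step inside `build_model` folds the per-document axis into the batch axis so that BERT sees a flat batch of sequences. A quick NumPy sketch of that shape bookkeeping (the sizes here are illustrative, matching the 16/16/16 example below):

```python
import numpy as np

batch_size, max_seq_num, max_seq_len = 16, 16, 16
input_ids = np.random.randint(0, 10000, size=(batch_size, max_seq_num, max_seq_len))

# Fold the sequence-group axis into the batch axis, as tf.reshape does in build_model
reshaped = input_ids.reshape(batch_size * max_seq_num, max_seq_len)
print(reshaped.shape)  # (256, 16)
```

The reshape is lossless: `reshaped.reshape(batch_size, max_seq_num, max_seq_len)` recovers the original tensor, which is how the BERT output `[batch_size*max_seq_num, max_seq_len, hidden_size]` can later be unfolded per document if needed.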

I can build the model successfully, but when I feed data into the model:

model = build_model(path, 16, 16, 16)
x_input = np.random.randint(0, 10000, size=[16, 16, 16])
x_token_type = [[[i] * 16 for i in range(16)] for _ in range(16)]
x_mask = np.ones(shape=[16, 16, 16])
y_predict = model(x_input, x_token_type, x_mask)

I get this error:

ValueError: Layer model expects 2 input(s), but it received 1 input tensors. Inputs received: ...

【Comments】:

    Tags: tensorflow bert-language-model


    【Solution 1】:

    In the last statement,

    y_predict = model(x_input, x_token_type, x_mask)
    

    you pass the three arrays as separate positional arguments. A Keras model with multiple inputs expects them packed into a single list as its first argument, so Keras treats only `x_input` as the input and complains about the number of input tensors it received.

    You have three inputs: `x_input`, `x_token_type`, and `x_mask`. Pass them together as one list:

    y_predict = model([x_input, x_token_type, x_mask])
    

    The same applies to training, where the targets come as a separate argument, e.g. `model.fit([x_input, x_token_type, x_mask], y_true)`.
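    A minimal sketch of this calling convention on a toy two-input functional model (the layer names and shapes here are illustrative, not from the question; assumes TensorFlow 2):

```python
import numpy as np
import tensorflow as tf

# Toy functional model with two inputs
a = tf.keras.layers.Input(shape=(4,), name="a")
b = tf.keras.layers.Input(shape=(4,), name="b")
out = tf.keras.layers.Add()([a, b])
toy = tf.keras.Model(inputs=[a, b], outputs=out)

x1 = np.ones((2, 4), dtype="float32")
x2 = np.ones((2, 4), dtype="float32")

# Correct: all inputs packed into one list
y = toy([x1, x2])
print(y.shape)  # (2, 4)

# Incorrect: toy(x1, x2) would bind x2 to the call's `training`
# argument instead of treating it as a second input, and fail.
```

    The same list form works for `predict` and `fit`: `toy.predict([x1, x2])`, or `toy.fit([x1, x2], y_true)` with targets.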
    

    【Discussion】:
