【Question title】: Fine-tuning the last x layers of BERT
【Posted】: 2019-05-07 18:21:47
【Question】:

I'm trying to fine-tune BERT on only certain final layers (say, the last 3). I want to use Google Colab for TPU training. I load BERT with hub.Module, fine-tune it, and then use the fine-tuned output for my classification task.

bert_module = hub.Module(BERT_MODEL_HUB, tags=tags, trainable=True)

hub.Module has an option to make the model trainable or not, but no option for partial training (only specific layers).

Does anyone know how I can train only the last 1, 2, or 3 layers of BERT using hub.Module?
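For context, the usual TF1 workaround is to filter the module's variables by name and hand only that subset to the optimizer via var_list. Below is a minimal pure-Python sketch of the name filter; the variable names are hypothetical stand-ins for what the BERT hub module actually produces, assuming BERT-base style names like `.../encoder/layer_7/...`:

```python
def last_n_layer_vars(var_names, n, num_layers=12):
    """Keep only variables belonging to the last n transformer layers
    (assumes BERT-base style names like '.../encoder/layer_7/...')."""
    keep = {f"encoder/layer_{i}/" for i in range(num_layers - n, num_layers)}
    return [name for name in var_names if any(k in name for k in keep)]

# Hypothetical variable names, shaped like those from the BERT hub module
names = [f"bert/encoder/layer_{i}/attention/output/dense/kernel"
         for i in range(12)]
print(last_n_layer_vars(names, 2))
# In TF1 you would then pass the matching tf.Variable objects:
#   train_op = optimizer.minimize(loss, var_list=selected_variables)
```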

Thanks

【Discussion】:

    Tags: tensorflow module google-colaboratory embedding


    【Solution 1】:

    You can set this manually in the list of trainable variables. Below is a BERT layer I implemented in tensorflow-keras:

    import tensorflow as tf
    import tensorflow_hub as hub
    from tensorflow.keras import backend as K

    class BertLayer(tf.keras.layers.Layer):
        def __init__(
            self,
            n_fine_tune_layers=10,
            pooling="first",
            bert_path="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1",
            **kwargs,
        ):
            self.n_fine_tune_layers = n_fine_tune_layers
            self.trainable = True
            self.output_size = 768
            self.pooling = pooling
            self.bert_path = bert_path
            if self.pooling not in ["first", "mean"]:
                raise NameError(
                    f"Undefined pooling type (must be either first or mean, but is {self.pooling})"
                )

            super(BertLayer, self).__init__(**kwargs)

        def build(self, input_shape):
            self.bert = hub.Module(
                self.bert_path, trainable=self.trainable, name=f"{self.name}_module"
            )

            # Remove unused layers
            trainable_vars = self.bert.variables
            if self.pooling == "first":
                trainable_vars = [var for var in trainable_vars if "/cls/" not in var.name]
                trainable_layers = ["pooler/dense"]

            elif self.pooling == "mean":
                trainable_vars = [
                    var
                    for var in trainable_vars
                    if "/cls/" not in var.name and "/pooler/" not in var.name
                ]
                trainable_layers = []
            else:
                raise NameError(
                    f"Undefined pooling type (must be either first or mean, but is {self.pooling})"
                )

            # Select how many layers to fine tune (BERT-base has 12 layers, indexed 0-11)
            for i in range(self.n_fine_tune_layers):
                trainable_layers.append(f"encoder/layer_{11 - i}")

            # Update trainable vars to contain only the specified layers
            trainable_vars = [
                var
                for var in trainable_vars
                if any(l in var.name for l in trainable_layers)
            ]

            # Add to trainable weights
            for var in trainable_vars:
                self._trainable_weights.append(var)

            for var in self.bert.variables:
                if var not in self._trainable_weights:
                    self._non_trainable_weights.append(var)

            super(BertLayer, self).build(input_shape)

        def call(self, inputs):
            inputs = [K.cast(x, dtype="int32") for x in inputs]
            input_ids, input_mask, segment_ids = inputs
            bert_inputs = dict(
                input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids
            )
            if self.pooling == "first":
                pooled = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
                    "pooled_output"
                ]
            elif self.pooling == "mean":
                result = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
                    "sequence_output"
                ]

                # Mean over the sequence dimension, ignoring padded positions
                mul_mask = lambda x, m: x * tf.expand_dims(m, axis=-1)
                masked_reduce_mean = lambda x, m: tf.reduce_sum(mul_mask(x, m), axis=1) / (
                        tf.reduce_sum(m, axis=1, keepdims=True) + 1e-10)
                input_mask = tf.cast(input_mask, tf.float32)
                pooled = masked_reduce_mean(result, input_mask)
            else:
                raise NameError(
                    f"Undefined pooling type (must be either first or mean, but is {self.pooling})"
                )

            return pooled

        def compute_output_shape(self, input_shape):
            return (input_shape[0], self.output_size)
    

    Note the following line in the code above:

    trainable_layers.append(f"encoder/layer_{11 - i}")
    

    You can set the default of the n_fine_tune_layers parameter to 1/2/3, or pass it when declaring the layer:

    def __init__(self, n_fine_tune_layers=2, **kwargs):
    
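As a sanity check, the layer-name prefixes generated for n_fine_tune_layers=3 with "first" pooling look like this (assuming the 12-layer BERT-base naming used above):

```python
n_fine_tune_layers = 3
trainable_layers = ["pooler/dense"]  # "first" pooling also tunes the pooler
for i in range(n_fine_tune_layers):
    trainable_layers.append(f"encoder/layer_{11 - i}")
print(trainable_layers)
# → ['pooler/dense', 'encoder/layer_11', 'encoder/layer_10', 'encoder/layer_9']
```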

    【Discussion】:

    • -1 This answer is incorrect. It just makes the last N variables trainable, but 1) each BERT layer consists of multiple variables, and 2) the variables are sorted lexicographically. (Side note: please credit the source of the code)
    【Solution 2】:

    The following code, taken from this post (https://towardsdatascience.com/bert-in-keras-with-tensorflow-hub-76bcbc9417b), is incorrect:

    trainable_vars = self.bert.variables
    
    trainable_vars = trainable_vars[-self.n_fine_tune_layers:]
    

    It returns the variables in alphabetical order, not in actual layer order. So it returns layer 11 before layer 4, and so on. That is not what you want.

    I haven't yet figured out how to get the actual order of the layers in the implementation, but I'll update this answer when I do!
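The ordering problem is easy to reproduce in plain Python: sorting BERT-base's 12 layer names lexicographically does not match the numeric layer order, so a `[-n:]` slice picks the wrong layers:

```python
# BERT-base has 12 encoder layers; the hub module lists variables by name
layer_names = sorted(f"layer_{i}" for i in range(12))
print(layer_names[:5])   # ['layer_0', 'layer_1', 'layer_10', 'layer_11', 'layer_2']
# A [-3:] slice therefore misses the real last layers:
print(layer_names[-3:])  # ['layer_7', 'layer_8', 'layer_9'] -- layers 10 and 11 are skipped
```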

    【Discussion】:

    【Solution 3】:

    Modifying the code from the blog post, we can select the correct layers. This is also addressed in the repo linked from the blog post, although in a less performant way.

    link to pull request

    def build(self, input_shape):
        self.bert = hub.Module(
            bert_path,
            trainable=self.trainable,
            name="{}_module".format(self.name)
        )
    
        trainable_vars = self.bert.variables
    
        # Remove unused layers
        trainable_vars = [var for var in trainable_vars if not "/cls/" in var.name]
    
        # ===========Replace incorrect line with:====================
        # Select how many layers to fine tune. note: this is wrong in the original code
        import re
        def layer_number(var):
            '''Get which layer a variable is in'''
            m = re.search(r'/layer_(\d+)/', var.name)
            if m:
                return int(m.group(1))
            else:
                return None
    
        layer_numbers = list(map(layer_number, trainable_vars))
        n_layers = max(n for n in layer_numbers if n is not None) + 1 # layers are zero-indexed
        trainable_vars = [var for n, var in zip(layer_numbers, trainable_vars) 
                          if n is not None and n >= n_layers - self.n_fine_tune_layers]
    
        # ========== Until here ====================
    
        # Add to trainable weights
        self._trainable_weights.extend(trainable_vars)
    
        # Add non-trainable weights
        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)
    
        super(BertLayer, self).build(input_shape)
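To see what the regex-based selection does, here is the layer_number helper run against hypothetical variable names shaped like the hub module's (pure Python, no TensorFlow needed):

```python
import re

def layer_number(name):
    """Extract the encoder layer index from a variable name, or None."""
    m = re.search(r"/layer_(\d+)/", name)
    return int(m.group(1)) if m else None

# Hypothetical names mimicking the BERT hub module's variables
names = ([f"bert/encoder/layer_{i}/output/dense/kernel" for i in range(12)]
         + ["bert/pooler/dense/kernel"])
numbers = [layer_number(n) for n in names]
n_layers = max(n for n in numbers if n is not None) + 1  # layers are zero-indexed
n_fine_tune_layers = 3
selected = [n for num, n in zip(numbers, names)
            if num is not None and num >= n_layers - n_fine_tune_layers]
print(selected)  # layers 9, 10, 11 only, regardless of lexicographic order
```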
    

    【Discussion】:
