【Posted】: 2020-03-08 15:59:18
【Problem Description】:
I am trying to run the TensorFlow Hub version of Albert on multiple GPUs on the same machine. The model runs perfectly on a single GPU.
Here is my code structure:
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))  # it prints 2 .. correct

if __name__ == "__main__":
    with strategy.scope():
        run()
Inside the run() function I read the data, build the model, and fit it. Roughly, it looks like the sketch below.
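(A simplified sketch; load_data() is a placeholder name, while build_model and bert_max_seq_length appear in the traceback below:)

def run():
    train_x, train_y = load_data()  # placeholder for the data-reading step
    model = build_model(bert_max_seq_length)  # builds and compiles the Keras model
    model.fit(train_x, train_y, epochs=3, batch_size=32)  # assumed hyperparameters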
I get this error:
Traceback (most recent call last):
  File "Albert.py", line 130, in <module>
    run()
  File "Albert.py", line 88, in run
    model = build_model(bert_max_seq_length)
  File "Albert.py", line 55, in build_model
    model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
  File "/home/****/py_transformers/lib/python3.5/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/bighanem/py_transformers/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/training.py", line 471, in compile
    ' model.compile(...)'% (v, strategy))
ValueError: Variable (<tf.Variable 'bert/embeddings/word_embeddings:0' shape=(30000, 128) dtype=float32>) was not created in the distribution strategy scope of (<tensorflow.python.distribute.mirrored_strategy.MirroredStrategy object at 0x7f62e399df60>). It is most likely due to not all layers or the model or optimizer being created outside the distribution strategy scope. Try to make sure your code looks similar to the following.
with strategy.scope():
  model=_create_model()
  model.compile(...)
Could it be because the Albert model was prepared (built and compiled) beforehand by the TensorFlow team?
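For reference, my reading of what the error message asks for, as a minimal sketch: the model (including the hub layer it contains) is both created and compiled inside the scope. optimizer="adam" is a placeholder, not my actual optimizer:

with strategy.scope():
    model = build_model(bert_max_seq_length)  # hub.KerasLayer instantiated in here
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])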
Edited:
To be precise, the TensorFlow version is 2.1.
Also, this is how I load the Albert pretrained model:
import tensorflow_hub as hub

features = {"input_ids": in_id, "input_mask": in_mask, "segment_ids": in_segment}
albert = hub.KerasLayer(
    "https://tfhub.dev/google/albert_xxlarge/3",
    trainable=False, signature="tokens", output_key="pooled_output",
)
x = albert(features)
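(in_id, in_mask, and in_segment are not shown above; a hypothetical definition, assuming they are Keras Input tensors of length bert_max_seq_length:)

import tensorflow as tf

bert_max_seq_length = 128  # assumed value

in_id = tf.keras.layers.Input(shape=(bert_max_seq_length,), dtype=tf.int32, name="input_ids")
in_mask = tf.keras.layers.Input(shape=(bert_max_seq_length,), dtype=tf.int32, name="input_mask")
in_segment = tf.keras.layers.Input(shape=(bert_max_seq_length,), dtype=tf.int32, name="segment_ids")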
Tags: tensorflow tf.keras multi-gpu pre-trained-model tensorflow-hub