Use TPU in Google Colab

【Question title】: Use TPU in Google Colab
【Posted】: 2021-02-17 21:46:52
【Question】:

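I am currently training a neural network on a TPU. I changed the runtime type and initialized the TPU, but training does not seem to be any faster. I followed https://www.tensorflow.org/guide/tpu. Am I doing something wrong?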

import os

import tensorflow as tf

# TPU initialization
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
# This is the TPU initialization code that has to be at the beginning.
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))

.
.
.
# experimental_steps_per_execution = 50
model.compile(optimizer=Adam(lr=learning_rate), loss='binary_crossentropy', metrics=['accuracy'], experimental_steps_per_execution=50)

My model summary

Is there anything else I need to consider or adjust?

【Question comments】:

Tags: python tensorflow google-colaboratory tpu

【Solution 1】:

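You need to create a TPU strategy:

strategy = tf.distribute.TPUStrategy(resolver)

Then build and compile the model inside the strategy's scope, so that its variables are created on the TPU: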

with strategy.scope():
  model = create_model()
  model.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])
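Putting the question's initialization and the two steps above together, a minimal end-to-end sketch might look like the following. It keeps the question's loss and metric. Several parts are assumptions made only for illustration: the helper names `get_strategy` and `build_compiled_model`, the tiny stand-in `create_model`, the default learning rate, and the fallback to the default strategy when `COLAB_TPU_ADDR` is not set (so the code can also run without a TPU).

```python
import os

import tensorflow as tf


def get_strategy():
    """Return a TPUStrategy on a Colab TPU runtime, else the default strategy."""
    addr = os.environ.get("COLAB_TPU_ADDR")
    if addr is None:
        # Assumption: no TPU runtime attached, so fall back to CPU/GPU.
        return tf.distribute.get_strategy()
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="grpc://" + addr)
    tf.config.experimental_connect_to_cluster(resolver)
    # TPU initialization has to happen before any model is built.
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


def create_model():
    """Hypothetical stand-in for the real network, which the question does not show."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(16,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])


def build_compiled_model(strategy, learning_rate=1e-3):
    """Create and compile the model inside the strategy scope."""
    # Variables created inside the scope are placed according to the strategy.
    with strategy.scope():
        model = create_model()
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
            loss="binary_crossentropy",
            metrics=["accuracy"],
        )
    return model
```

A typical call would be `strategy = get_strategy()` followed by `model = build_compiled_model(strategy)` and then `model.fit(...)` as usual; `strategy.num_replicas_in_sync` reports how many replicas are actually in use.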

【Comments】:

Thank you very much for your answer! How do I create the TPU strategy? Could you provide a code snippet?
How do you handle this error: ResourceExhaustedError: 9 root error(s) found. (0) Resource exhausted: {{function_node __inference_train_function_14917}} Compilation failure: Ran out of memory in memory space hbm. Used 8.29G of 7.48G hbm. Exceeded hbm capacity by 825.64M.?
Your model is very large. Try reducing batch_size to 8.
Sorry to bother you. I tried batch_size = 8. Unfortunately, the error keeps occurring.
Try batch_size = 1.
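To put the numbers from that error message in perspective, here is a small sketch in plain Python. It parses the `Used ... of ... hbm` part of the message quoted above and reports what fraction of the requested memory would have to go before compilation could fit. The helper names `hbm_overshoot` and `per_replica_batch` are made up for this illustration, and the default of 8 replicas is an assumption (the usual core count of a Colab TPU) that should be checked with `strategy.num_replicas_in_sync`.

```python
import re

# The relevant part of the error message quoted in the comments above.
ERROR = ("Ran out of memory in memory space hbm. Used 8.29G of 7.48G hbm. "
         "Exceeded hbm capacity by 825.64M.")


def hbm_overshoot(message):
    """Return (used, capacity, fraction of the request that must be cut)."""
    match = re.search(r"Used ([\d.]+)G of ([\d.]+)G hbm", message)
    if match is None:
        raise ValueError("message does not contain an HBM usage report")
    used = float(match.group(1))
    capacity = float(match.group(2))
    return used, capacity, 1.0 - capacity / used


def per_replica_batch(global_batch_size, num_replicas=8):
    """Average number of examples each replica receives per step."""
    return global_batch_size / num_replicas


used, capacity, fraction = hbm_overshoot(ERROR)
print(f"requested {used}G, available {capacity}G, must cut {fraction:.1%}")
print("examples per replica at batch_size=8:", per_replica_batch(8))
```

Under `TPUStrategy`, the batch size given to Keras is the global batch and is divided across replicas, so a global batch of 8 on 8 cores is already about one example per core. If the error persists at that point, the weights and optimizer state, which do not shrink with the batch size, are a likely culprit. This is an inference from the numbers, not something the thread confirms.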