【Question title】: Process finished with exit code -1073741571 (0xC00000FD) Tensorflow
【Posted】: 2021-01-16 00:08:57
【Question】:

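I know this question has been asked many times, but my case is a bit odd. I just bought an RTX 3080 and tried to install TensorFlow by following a tutorial I found on Reddit. I did everything as described there: install Anaconda --> Python 3.8 --> TF-nightly v. 2.5.0 --> Visual Studio C++ --> CUDA 11.1.0 --> cuDNN 8.0.4 --> add the paths --> restart the PC. At first everything seemed to work. I tried the following commands: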

import tensorflow as tf
tf.config.list_physical_devices()

As you can see in the output, this works fine:

C:\Users\loose\.conda\envs\tf2\python.exe C:/Users/loose/PycharmProjects/GenerateAutomatedEMail/python/test.py
2021-01-16 00:40:45.043205: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:40:46.676446: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-16 00:40:46.699117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:40:46.699285: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:40:46.713523: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:40:46.713626: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:40:46.717017: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-16 00:40:46.718013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-16 00:40:46.725508: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-16 00:40:46.728010: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-16 00:40:46.728534: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:40:46.728660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0

Process finished with exit code 0

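I am currently trying to train the Seq2Seq model from the TF tutorials. The code is almost identical, except that I use PyCharm instead of Jupyter and have put everything into a class. My full code is available on GitHub. When I try to train the model, I get the error "Process finished with exit code -1073741571 (0xC00000FD)". No actual error message is shown; the program simply ends with this exit code: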

C:\Users\loose\.conda\envs\tf2\python.exe C:/Users/loose/PycharmProjects/GenerateAutomatedEMail/python/train_model.py
2021-01-16 00:50:34.337791: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:50:36.873698: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-16 00:50:36.894834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:50:36.895004: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:50:36.909453: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:50:36.909542: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:50:36.912954: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-16 00:50:36.914024: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-16 00:50:36.921476: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-16 00:50:36.924059: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-16 00:50:36.924660: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:50:36.924807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-01-16 00:50:36.925280: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-16 00:50:36.926213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:50:36.926418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-01-16 00:50:37.388811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-16 00:50:37.388901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306]      0 
2021-01-16 00:50:37.388947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1319] 0:   N 
2021-01-16 00:50:37.389134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1446] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7447 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:2d:00.0, compute capability: 8.6)
2021-01-16 00:50:38.006971: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:50:38.586194: I tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Loaded cuDNN version 8004
2021-01-16 00:50:38.709516: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:50:39.312210: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:50:39.313013: I tensorflow/stream_executor/cuda/cuda_blas.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

Process finished with exit code -1073741571 (0xC00000FD)

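So I tried to find the line where the program crashes. I found that it crashes right after the "BahdanauAttention" class is initialized, as shown in the picture.

After several hours of testing, I can assume/confirm a few things:

  • I can run normal (non-TensorFlow) code in this venv without this error
  • I am not running out of memory (at most 17 GB of the 32 GB of RAM is in use)
  • I have no programs open that could cause a conflict (e.g. NVIDIA Broadcast or Jupyter Lab)

What I have tried in order to fix the problem:

  • Reinstalled Conda
  • Created a new venv
  • Reinstalled TF and all NVIDIA drivers
  • Tried a different Python version (3.7 instead of 3.8)
  • Restarted my PC

At this point I am at a loss. Does anyone know how to fix this?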
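
For reference, the two numbers in the exit code are the same 32-bit value: PyCharm prints it once as a signed integer and once in hex. A minimal sketch of the conversion follows. The helper names `to_ntstatus` and `describe_exit_code` are hypothetical, and the lookup table contains only the one Windows status code relevant here (0xC00000FD is `STATUS_STACK_OVERFLOW`). This names the kind of crash only; it does not show which call overflowed the stack.

```python
# Hypothetical helpers for illustration; not part of Python, PyCharm, or TensorFlow.

# Only the status code seen in this question is listed.
KNOWN_NTSTATUS = {
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit process exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code: int) -> str:
    """Format the exit code as hex and append its name if it is in the table."""
    status = to_ntstatus(exit_code)
    name = KNOWN_NTSTATUS.get(status, "unknown")
    return f"0x{status:08X} ({name})"


print(describe_exit_code(-1073741571))
```

A stack overflow in the native code of the process is consistent with the program ending without a Python traceback, as described above.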

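【Comments on the question】:

  • Can you try a different, stable TensorFlow version? The current one is 2.4.
  • No, unfortunately that is not possible, because the RTX 30 series does not work with any stable version

Tags: python tensorflow pycharm anaconda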


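【Solution 1】:

You can upgrade TensorFlow to the latest stable release: TensorFlow 2.4 supports NVIDIA's new Ampere architecture used by the RTX 30 series and also provides CUDA 11 support.
You can check the details in this chart and follow the guide to install it.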
https://www.tensorflow.org/install/source_windows#tested_build_configurations

As for GPU memory usage, you can always enable memory growth at the start of your code, as mentioned here
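
The memory-growth setting can be sketched roughly as follows, assuming TensorFlow 2.x. `enable_memory_growth` is a hypothetical helper name used only for illustration; the TensorFlow calls it wraps are `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth`. The setting only changes how GPU memory is allocated, so it may not affect the exit code from the question.

```python
try:
    import tensorflow as tf
except ImportError:  # allows the helper to be defined on machines without TensorFlow
    tf = None


def enable_memory_growth(tf_config):
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it up front.

    `tf_config` is expected to be the `tf.config` module.
    Returns the number of GPUs that were configured.
    """
    gpus = tf_config.list_physical_devices("GPU")
    for gpu in gpus:
        # Has to run before the GPU is initialized; TensorFlow raises
        # RuntimeError if the device is already in use.
        tf_config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if tf is not None:
    print("GPUs with memory growth enabled:", enable_memory_growth(tf.config))
```

Place the call at the very top of the training script, before any model or tensor is created.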

【Comments】:
