[Title]: model_main.py faster-rcnn CUDA_ERROR_OUT_OF_MEMORY
[Posted]: 2020-06-29 03:01:30
[Description]:

Background:

I can train a faster-rcnn model with legacy/train.py, but when I try to train with model_main.py using the same config settings, I run into the errors below. Image resolution: 1920x1080

tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
.\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592

tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (256):     Total Chunks: 4753, Chunks in use: 4753. 1.16MiB allocated for chunks. 1.16MiB in use in bin. 144.3KiB client-requested in use in bin.

tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0000000203800000 next 1 of size 256
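For scale, the failed allocation in the first log line is exactly 8 GiB of pinned host memory, and at 1920x1080 the input queues fill memory quickly. A rough back-of-envelope sketch (float32 decoding and 3 channels are assumptions; the queue capacities are taken from the config further down):

```python
# Size of the failed pinned-host allocation reported in the log.
failed_alloc = 8589934592
print(failed_alloc / 2**30)  # 8.0 (GiB)

# One decoded 1920x1080 RGB image as float32 (assumed 4 bytes per value).
image_bytes = 1920 * 1080 * 3 * 4
print(image_bytes / 2**20)  # ~23.7 MiB per image

# With batch_queue_capacity 60 + prefetch_queue_capacity 40 (from the config),
# the input queues alone can hold ~100 decoded images at once.
queued = (60 + 40) * image_bytes
print(queued / 2**30)  # ~2.3 GiB just for queued images
```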

What I have tried:

  1. Setting the batch size to 1
  2. Enabling GPU memory growth

import tensorflow as tf

# Let the allocator grow GPU memory on demand instead of grabbing it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

# The same option passed through to the Estimator that model_main.py builds.
session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True
config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir,
                                session_config=session_config,
                                log_step_count_steps=10,
                                save_summary_steps=20,
                                keep_checkpoint_max=20,
                                save_checkpoints_steps=100)

  3. Not allocating the entire GPU memory

# Cap this process at 60% of total GPU memory.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
session = tf.Session(config=config)

session_config = tf.ConfigProto()
session_config.gpu_options.per_process_gpu_memory_fraction = 0.6
config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir,
                                session_config=session_config,
                                log_step_count_steps=10,
                                save_summary_steps=20,
                                keep_checkpoint_max=20,
                                save_checkpoints_steps=100)

TensorFlow CUDA_ERROR_OUT_OF_MEMORY

  4. Tuning queue_capacity, min_after_dequeue, num_readers, batch_queue_capacity, num_batch_queue_threads, and prefetch_queue_capacity

Out Of Memory when training on Big Images

  5. Lowering min_dimension and max_dimension to 270 and 480
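In pipeline.config terms, attempt 5 amounts to changing the image_resizer block, e.g.:

```
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 270
    max_dimension: 480
  }
}
```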

None of these worked for me.

Environment:

  • OS platform and distribution: Windows 10 Pro, version 1909
  • TensorFlow installed from: pip (tensorflow-gpu)
  • TensorFlow version: 1.14
  • Object Detection API: 0.1; CUDA/cuDNN versions: CUDA 10.0, cuDNN 10.0
  • GPU model and memory: NVIDIA GeForce RTX 2070 SUPER, 8 GB
  • System RAM: 32 GB

My config:

# Faster R-CNN with Inception v2, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 2
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 1080
        max_dimension: 1920
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: ""
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  batch_queue_capacity: 60
  num_batch_queue_threads: 30
  prefetch_queue_capacity: 40
}


train_input_reader: {
  tf_record_input_reader {
    input_path: "D:\\object_detection\\train_data\\train.record"
  }
  label_map_path: "D:\\object_detection\\pascal_label_map.pbtxt"
  queue_capacity: 2
  min_after_dequeue: 1
  num_readers: 1
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1101
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "D:\\object_detection\\eval_data\\eval.record"
  }
  label_map_path: "D:\\object_detection\\pascal_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

Any other solutions would be greatly appreciated.

[Question discussion]:

    Tags: tensorflow out-of-memory object-detection object-detection-api faster-rcnn


    [Solution 1]:

    Object detection models consume a lot of memory. This comes from the way they work and from the huge number of anchors they generate to find the boxes.

    You are doing everything right, but your GPU may simply not be big enough to train this kind of model. Things you can try:

    • Downscale the images, for example to 720x512
    • Use plain SGD as the optimizer instead of something like Adam. Adam keeps two extra moment tensors per weight, so it consumes roughly 3x the memory of plain SGD.

    Also worth mentioning: you are doing well with a mini-batch of 1 instance. If I remember correctly, Faster R-CNN is typically trained with only 2 images per batch.
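    To put a rough number on the optimizer point: the extra memory is the per-parameter state the optimizer keeps alongside the weights. A minimal sketch (the ~13M parameter count for an Inception v2 based detector is an assumption for illustration only):

```python
def optimizer_state_bytes(num_params, slots_per_param, bytes_per_value=4):
    """Extra float32 memory an optimizer keeps on top of the weights."""
    return num_params * slots_per_param * bytes_per_value

params = 13_000_000  # assumed parameter count, for illustration only

# Plain SGD keeps no extra state; momentum keeps one slot per weight;
# Adam keeps two (first- and second-moment estimates).
plain_sgd = optimizer_state_bytes(params, 0)
momentum = optimizer_state_bytes(params, 1)
adam = optimizer_state_bytes(params, 2)

print(plain_sgd)           # 0 extra bytes
print(momentum / 2**20)    # ~50 MiB extra
print(adam / 2**20)        # ~100 MiB extra
```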

    [Comments]:

    • Thanks for the reply. I also suspected it was because faster-rcnn consumes a lot of memory, but I can train the model with legacy/train.py (with the same config settings, e.g. batch size), so I think my GPU should be able to train this model. Another possibility is that model_main.py runs training and evaluation at the same time, so it consumes more memory than legacy/train.py, but I would have to dig into the code to check. I forgot to mention that I also tried shrinking the images to 480x270, but I still hit OOM after a few steps. I will try SGD later, cheers.
    [Solution 2]:

    I just found that if I set batch_size to 3, training works fine. When I set batch_size back to 1, I hit the OOM problem again.

    This is strange and I still don't know why, since a smaller batch size should always use less memory.

    If you run into the same situation, you can try increasing the batch size a bit, though I can't guarantee it will work.
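    In pipeline.config terms, the change this answer describes is just:

```
train_config: {
  batch_size: 3   # was 1
  ...
}
```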

    [Comments]:
