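【Posted】: 2020-04-23 08:16:09
【Question】:
I am new to AI and TensorFlow, and I am trying to use the TensorFlow Object Detection API on Windows.
My current goal is real-time human detection in a video stream.
To do this, I modified a Python example from the TensorFlow Model Garden (https://github.com/tensorflow/models).
At the moment it detects all objects (not just humans) and displays the bounding boxes using OpenCV.
It works fine when I disable the GPU (os.environ["CUDA_VISIBLE_DEVICES"] = "-1").
But when I enable the GPU and start the script, it hangs on the first frame.
Output: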
2020-04-22 16:00:53.597492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:56.942141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-22 16:00:56.976635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-04-22 16:00:56.989129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:57.000622: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:00:57.012247: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-22 16:00:57.020575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-22 16:00:57.031536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-22 16:00:57.042564: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-22 16:00:57.066289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:00:57.075760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-04-22 16:00:59.239211: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-22 16:00:59.256577: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f3f73cd670 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-22 16:00:59.264241: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-04-22 16:00:59.272280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-04-22 16:00:59.281409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:59.288204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:00:59.293112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-22 16:00:59.298222: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-22 16:00:59.305446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-22 16:00:59.310590: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-22 16:00:59.316250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:00:59.324588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-04-22 16:01:00.831569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-22 16:01:00.839147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-04-22 16:01:00.842279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-04-22 16:01:00.846140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1024 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-04-22 16:01:00.865546: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f39174cba0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-22 16:01:00.873656: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
2020-04-22 16:01:10.876733: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:01:11.814909: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-04-22 16:01:11.852909: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:01:12.149312: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.179484: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.209036: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.237205: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.266147: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.295182: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.325645: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.357550: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.405332: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.436336: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
This is the code I am using:
#!/usr/bin/env python
# coding: utf-8
import os
import pathlib
# Move up until the working directory is outside the "models" checkout.
if "models" in pathlib.Path.cwd().parts:
    while "models" in pathlib.Path.cwd().parts:
        os.chdir('..')
import numpy as np
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from PIL import Image
from IPython.display import display
import cv2
cap = cv2.VideoCapture(1)
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
# patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1
# Patch the location of gfile
tf.gfile = tf.io.gfile
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
def load_model(model_name):
    base_url = 'http://download.tensorflow.org/models/object_detection/'
    model_file = model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=base_url + model_file,
        untar=True)
    model_dir = pathlib.Path(model_dir)/"saved_model"
    model = tf.saved_model.load(str(model_dir))
    model = model.signatures['serving_default']
    return model
# List of the strings that are used to add the correct label for each box.
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
# model_name= 'faster_rcnn_inception_v2_coco_2017_11_08';
detection_model = load_model(model_name)
print(detection_model.inputs)
detection_model.output_dtypes
detection_model.output_shapes
def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis,...]
    # Run inference (it hangs here)
    output_dict = model(input_tensor)
    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key:value[0, :num_detections].numpy()
                   for key,value in output_dict.items()}
    output_dict['num_detections'] = num_detections
    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(output_dict['detection_masks'], output_dict['detection_boxes'],image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()
    return output_dict
def show_inference(model):
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    ret, image_np = cap.read()
    # percent by which the image is resized
    #scale_percent = 30
    # calculate the scaled dimensions from the original ones
    #width = int(image_np.shape[1] * scale_percent / 100)
    #height = int(image_np.shape[0] * scale_percent / 100)
    # dsize
    #dsize = (width, height)
    # resize image
    #image_np = cv2.resize(image_np, dsize)
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks_reframed', None),
        use_normalized_coordinates=True,
        line_thickness=8)
    cv2.imshow('object detection', cv2.resize(image_np, (800,600)))
while True:
    show_inference(detection_model)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
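Since the goal is to detect only people, one possible way to narrow the results is to filter the dictionary returned by run_inference_for_single_image before it is drawn. The following is a minimal sketch and not part of the original script: the helper name keep_only_class and the min_score threshold are hypothetical, and it assumes that class id 1 is "person", as in mscoco_label_map.pbtxt. It uses synthetic arrays so that it runs without a model.

```python
import numpy as np

PERSON_CLASS_ID = 1  # assumption: "person" has id 1 in mscoco_label_map.pbtxt


def keep_only_class(output_dict, class_id=PERSON_CLASS_ID, min_score=0.5):
    """Return a new dict holding only detections of class_id with score >= min_score.

    Hypothetical helper; expects the per-image arrays produced by
    run_inference_for_single_image (batch dimension already removed).
    """
    classes = np.asarray(output_dict['detection_classes'])
    scores = np.asarray(output_dict['detection_scores'])
    keep = (classes == class_id) & (scores >= min_score)

    filtered = {
        'detection_boxes': np.asarray(output_dict['detection_boxes'])[keep],
        'detection_classes': classes[keep],
        'detection_scores': scores[keep],
        'num_detections': int(keep.sum()),
    }
    # Masks are only present for mask-producing models.
    if 'detection_masks_reframed' in output_dict:
        masks = np.asarray(output_dict['detection_masks_reframed'])
        filtered['detection_masks_reframed'] = masks[keep]
    return filtered


# Synthetic example: two "person" boxes (one with a low score) and one other class.
example = {
    'detection_boxes': np.array([[0.1, 0.1, 0.5, 0.5],
                                 [0.2, 0.2, 0.6, 0.6],
                                 [0.3, 0.3, 0.7, 0.7]]),
    'detection_classes': np.array([1, 3, 1], dtype=np.int64),
    'detection_scores': np.array([0.9, 0.8, 0.3]),
    'num_detections': 3,
}
person_only = keep_only_class(example)
print(person_only['num_detections'])
```

In show_inference, the filtered dictionary could then be passed to vis_util.visualize_boxes_and_labels_on_image_array in place of output_dict.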
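I have the following versions installed:
Python: 3.7 64-bit
TensorFlow: 2.2.0-rc3
CUDA: 10.1
cuDNN: 7.6.5.32
I tried this on two different machines:
Machine 1:
- CPU: i7-6700HQ
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 960M
Machine 2:
- CPU: i5-6400
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 960
I am not sure how to debug this. I tried the same code on two different machines, with almost the same result.
The only difference is the time until it hangs. Machine 1 hangs immediately; Machine 2 takes about 30 seconds.
Machine 2 is able to process the video and detect objects until it hangs.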
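One generic way to see where a Python program is blocked is the standard-library faulthandler module, which can dump the tracebacks of all threads if a call has not returned after a timeout. The sketch below is an illustration only: the wrapper name run_with_hang_dump and the timeout values are hypothetical, and a trivial function stands in for the model call.

```python
import faulthandler
import sys


def run_with_hang_dump(fn, *args, timeout_s=30.0, **kwargs):
    """Call fn(*args, **kwargs); if it is still running after timeout_s seconds,
    dump the Python tracebacks of all threads, then keep waiting for the call.

    Hypothetical helper built on faulthandler.dump_traceback_later.
    """
    # sys.__stderr__ is the interpreter's original stderr stream, which normally
    # has a real file descriptor even when sys.stderr has been replaced.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=sys.__stderr__)
    try:
        return fn(*args, **kwargs)
    finally:
        # Cancel the pending dump once the call has returned (or raised).
        faulthandler.cancel_dump_traceback_later()


# Stand-in for the real call, e.g. run_with_hang_dump(model, input_tensor).
result = run_with_hang_dump(lambda x: x * 2, 21, timeout_s=5.0)
print(result)
```

The dump only shows Python frames, so for a hang inside a GPU kernel it would point at the line that called into TensorFlow rather than at the cause inside CUDA.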
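I looked into the "Allocator (GPU_0_bfc) ran out of memory" warnings.
I tried some options that limit the amount of GPU memory available to TensorFlow, but that did not help.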
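For reference, one commonly used option of this kind is on-demand GPU memory growth, which TensorFlow can pick up from the TF_FORCE_GPU_ALLOW_GROWTH environment variable. The sketch below shows how such settings might be applied before TensorFlow is imported; it is not necessarily one of the options tried above, and the helper name configure_tensorflow_env is hypothetical.

```python
import os


def configure_tensorflow_env(use_gpu=True, allow_growth=True):
    """Set environment variables that TensorFlow reads when it initializes.

    Hypothetical helper; it has to run before `import tensorflow`.
    """
    if not use_gpu:
        # Hide all CUDA devices, as in the commented-out line of the script.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    if allow_growth:
        # Ask TensorFlow to allocate GPU memory on demand instead of up front.
        os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    names = ("CUDA_VISIBLE_DEVICES", "TF_FORCE_GPU_ALLOW_GROWTH")
    return {name: os.environ.get(name) for name in names}


settings = configure_tensorflow_env(use_gpu=True, allow_growth=True)
print(settings)
# import tensorflow as tf  # import only after the environment is configured
```

The same growth behavior can also be requested in code with tf.config.experimental.set_memory_growth for each device returned by tf.config.experimental.list_physical_devices('GPU'), provided this happens before the GPU is first used.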
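Several posts also suggest reducing the batch size.
My understanding is that this is only useful when training your own model.
Since I am using a pre-trained model, it does not apply here.
I also tried different models: ssd_mobilenet_v1_coco_2017_11_17 and faster_rcnn_inception_v2_coco_2017_11_08. Both models give the same result.
The last thing I tried was reducing the image size before processing. That did not help either.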
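The downscaling step corresponds to the commented-out lines in show_inference. As a small sketch of the arithmetic involved, the hypothetical helper scaled_size below computes the (width, height) tuple that cv2.resize expects from a frame shape and a percentage; the 640x480 frame size is an assumed example value.

```python
def scaled_size(shape, scale_percent):
    """Return (width, height) scaled to scale_percent of the original.

    `shape` is an image shape in (height, width, channels) order, as in
    image_np.shape; the result is in the (width, height) order used by
    cv2.resize. Hypothetical helper for illustration.
    """
    height, width = shape[0], shape[1]
    new_width = max(1, int(width * scale_percent / 100))
    new_height = max(1, int(height * scale_percent / 100))
    return (new_width, new_height)


# Assumed example: a 640x480 camera frame reduced to 30 percent.
dsize = scaled_size((480, 640, 3), 30)
print(dsize)
```

In the script this would be used as image_np = cv2.resize(image_np, scaled_size(image_np.shape, 30)).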
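Any help would be greatly appreciated.
Update
I also tried it on an RTX 2070 SUPER GPU. There were no warnings about memory allocation, but it still could not complete a single inference.
For completeness, here is the console output [the text "inference start" is printed before running inference; if inference completes, it prints "inference end"]: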
2020-04-24 11:30:16.579805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.916146: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-24 11:30:18.941805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-24 11:30:18.946134: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.951172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-24 11:30:18.954809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-24 11:30:18.957258: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-24 11:30:18.961662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-24 11:30:18.965553: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-24 11:30:18.978671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:18.980998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-24 11:30:18.982226: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-24 11:30:18.984167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-24 11:30:18.987291: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.988809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-24 11:30:18.990303: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-24 11:30:18.991792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-24 11:30:18.993320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-24 11:30:18.996960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-24 11:30:18.998497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:19.000191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-24 11:30:19.430864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-24 11:30:19.433076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-04-24 11:30:19.434566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-04-24 11:30:19.436400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6281 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
inference start
2020-04-24 11:30:24.728554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:25.608426: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-24 11:30:25.625904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
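Update 2
With eager mode disabled, everything runs fine (even on the GPU), but then I cannot retrieve the detected objects.
The next thing I tried was running it with a session (as in TensorFlow 1, I believe). Here the session.run() function blocks indefinitely on the GPU. Again, it runs fine on the CPU.
【Comments】:
-
Hi, just a suggestion. Maybe you could try your code on Google Colab with the device set to GPU (and perhaps upload a single image to Colab). They offer free access to a K40 with at least 12 GB of memory, which should be enough for inference. If it works there, you know it is a memory problem.
-
Hi, thanks for the reply. I will try to get it working there.
Tags: python tensorflow object-detection-api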