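【Posted】:2017-12-13 21:49:41
【Question】:
I am trying to train on a custom dataset using the new Object Detection API in TensorFlow 1.2 with the sample faster-rcnn config. The error I get relates to tensor shapes, but it seems to occur at random points during training, and the exact shapes vary as well.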
INFO:tensorflow:global step 132: loss = 63.3741 (0.262 sec/step)
INFO:tensorflow:global step 133: loss = 33.7362 (0.292 sec/step)
INFO:tensorflow:global step 134: loss = 18.0165 (0.264 sec/step)
INFO:tensorflow:global step 135: loss = 40.5577 (0.266 sec/step)
INFO:tensorflow:global step 136: loss = 24.1086 (0.266 sec/step)
2017-07-10 10:23:49.066345: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
2017-07-10 10:23:49.066475: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
2017-07-10 10:23:49.066509: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
[[Node: gradients/FirstStageFeatureExtractor/resnet_v1_50/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/convolution_grad/tuple/control_dependency_1/_2621 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_13108_gradients/FirstStageFeatureExtractor/resnet_v1_50/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/convolution_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
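As you can see, training runs correctly for a variable number of steps and then fails with Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]. I do not understand what triggers this error or where the incompatible shapes come from, since they also change from run to run.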
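For readers unfamiliar with the message itself: the failing node is the gradient of a `sub` op inside `BoxClassifierLoss`, so the two operands of that subtraction had shapes that cannot be broadcast against each other. The sketch below only illustrates the broadcasting rule (which TensorFlow shares with NumPy) in plain Python; `broadcastable` is a hypothetical helper name, and the sketch says nothing about why the operands ended up with 60 and 64 rows.

```python
def broadcastable(shape_a, shape_b):
    """Return True if two shapes are compatible under NumPy-style broadcasting."""
    # Walk the dimensions from the last one backwards; each pair must be
    # equal, or one of the two must be 1. Missing leading dims are allowed.
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


print(broadcastable((1, 60, 4), (1, 64, 4)))  # False: 60 vs. 64 cannot be reconciled
print(broadcastable((1, 64, 4), (1, 64, 4)))  # True: identical shapes
print(broadcastable((1, 1, 4), (1, 64, 4)))   # True: a dimension of 1 stretches
```

The middle dimension is the one that disagrees in the log above, which is why the op rejects the pair.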
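I am not sure whether the problem lies in how I converted my dataset to the TF format. However, I have successfully trained on the same dataset for several days with their ssd implementation, so I think it is safe to say the data is formatted correctly.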
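One way to gain more confidence in the converted data is to sanity-check the bounding boxes before writing them out. The following is a minimal sketch, assuming the boxes are available as `(xmin, ymin, xmax, ymax)` tuples normalized to [0, 1], which is how the Object Detection API stores them in its TFRecord examples. `find_bad_boxes` is a hypothetical helper, and malformed boxes are not a confirmed cause of this error, only something that is cheap to rule out.

```python
def find_bad_boxes(boxes):
    """Return indices of boxes that are degenerate or fall outside [0, 1]."""
    bad = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        in_range = all(0.0 <= v <= 1.0 for v in (xmin, ymin, xmax, ymax))
        # A box needs positive width and height to be usable.
        if not in_range or xmin >= xmax or ymin >= ymax:
            bad.append(i)
    return bad


# Made-up example boxes, for illustration only.
boxes = [
    (0.10, 0.20, 0.50, 0.60),  # fine
    (0.70, 0.20, 0.30, 0.60),  # xmin > xmax
    (0.10, 0.20, 1.20, 0.60),  # xmax lies outside the image
]
print(find_bad_boxes(boxes))  # [1, 2]
```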
Edit: the label map file is here. I want to point out again that the same dataset works perfectly with ssd.
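Since the label map comes up in the comments as a possible trigger, here is a rough sketch of a quick check on its contents. It assumes the usual `item { id: ... name: ... }` text layout and, as far as I know, that the API reserves id 0 for the background class, so item ids should start at 1. `check_label_map_text` is a hypothetical helper that uses a regular expression rather than the API's own protobuf parsing, so treat it as a rough filter, not a replacement for the real loader.

```python
import re


def check_label_map_text(text):
    """Return a list of problems found in label map text (empty if none)."""
    # Pull every "id: N" entry; this is a simplification of real pbtxt parsing.
    ids = [int(m) for m in re.findall(r"\bid\s*:\s*(-?\d+)", text)]
    problems = []
    if not ids:
        problems.append("no item ids found")
    if any(i < 1 for i in ids):
        problems.append("ids below 1 present")
    if len(ids) != len(set(ids)):
        problems.append("duplicate ids present")
    return problems


# Made-up label maps, for illustration only.
good = "item {\n  id: 1\n  name: 'cat'\n}\nitem {\n  id: 2\n  name: 'dog'\n}\n"
bad = "item {\n  id: 0\n  name: 'cat'\n}\nitem {\n  id: 0\n  name: 'dog'\n}\n"

print(check_label_map_text(good))  # []
print(check_label_map_text(bad))   # ['ids below 1 present', 'duplicate ids present']
```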
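【Comments】:
-
I sometimes see this error when the label map is invalid. Could you copy your label map into the question?
-
Updated with the label map.
-
I am facing the same problem. The dataset trains with SSD, but with f_rcnn the error appears after 136 global steps. My label map ids start from 1. Any idea how to get past this?
Tags: python tensorflow object-detection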