[Question Title]: Something wrong when TensorFlow is running in a thread
[Posted]: 2023-08-27 18:35:01
[Question]:

I am writing a multi-threaded face recognition program that uses Keras as the high-level API with TensorFlow as the backend. The code is as follows:

class FaceRecognizerTrainThread(QThread):

    def run(self):
        print("[INFO] Loading images...")
        images, org_labels, face_classes = FaceRecognizer.load_train_file(self.train_file)

        print("[INFO] Compiling Model...")
        opt = SGD(lr=0.01)
        face_recognizer = LeNet.build(width=Const.FACE_SIZE[0], height=Const.FACE_SIZE[1], depth=Const.FACE_IMAGE_DEPTH,
                                      classes=face_classes, weightsPath=None)
        face_recognizer.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

        images = np.array(images)[:, np.newaxis, :, :] / 255.0
        labels = np_utils.to_categorical(org_labels, face_classes)

        print("[INFO] Training model...")
        try:
            face_recognizer.fit(images, labels, epochs=50, verbose=2, batch_size=10)
        except Exception as e:
            print(e)
        print("[INFO] Training model done...")

        save_name = "data/CNN_" + time.strftime("%Y%m%d%H%M%S", time.localtime()) + ".hdf5"
        if save_name:
            face_recognizer.save(save_name)

        self.signal_train_end.emit(save_name)

Everything works fine when I run it normally, but when I run it in a QThread, as soon as it reaches

face_recognizer.fit(images, labels, epochs=50, verbose=2, batch_size=10)

it gives me this error:

Cannot interpret feed_dict key as Tensor: Tensor Tensor("conv2d_1_input:0", shape=(?, 1, 30, 30), dtype=float32) is not an element of this graph.

How can I fix this? Any suggestions are welcome. Thanks a lot!

[Discussion]:

    Tags: multithreading python-3.x tensorflow keras qthread


    [Solution 1]:

    TensorFlow lets you define a tf.Graph() and then create a tf.Session() for that graph to run the operations defined in it. When you do it this way, each QThread ends up working against its own TF graph, which is why you get the not an element of this graph error. I don't see your feed_dict code, so I assume it runs in the main thread, where your other threads cannot see it. Building the feed_dict inside each thread may make it work, but it is hard to say for sure without seeing your complete code.
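    The underlying mechanism is that TF 1.x keeps its "default graph" stack in thread-local storage, so a graph registered in the main thread is simply not the default inside a worker thread. A minimal sketch with plain `threading.local()` (no TensorFlow needed; the `graph` attribute is just an illustrative stand-in) shows the effect:

```python
import threading

# Stand-in for TF 1.x's per-thread default-graph stack.
state = threading.local()
state.graph = "graph-built-in-main-thread"  # set in the main thread, like Keras building its model

seen_in_worker = []

def worker():
    # This thread has its own thread-local storage; 'graph' was never set here.
    seen_in_worker.append(getattr(state, "graph", None))

t = threading.Thread(target=worker)
t.start()
t.join()

print(state.graph)        # the main thread still sees its value
print(seen_in_worker[0])  # None: the worker does not see the main thread's default
```

    In 1.x-era Keras the usual remedies are therefore either to build and compile the model inside run() itself, so that model and fit() share one thread, or to capture the graph in the main thread with graph = tf.get_default_graph() and wrap the fit() call in with graph.as_default():.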

    Replicating models in Keras and Tensorflow for a multi-threaded setting may also help you.

    To work around your problem, you can use something like this post. Code copied from that post:

    # Thread body: loop until the coordinator indicates a stop was requested.
    # If some condition becomes true, ask the coordinator to stop.
    def MyLoop(coord):
      while not coord.should_stop():
        ...do something...
        if ...some condition...:
          coord.request_stop()
    
    # Main thread: create a coordinator.
    coord = tf.train.Coordinator()
    
    # Create 10 threads that run 'MyLoop()'
    threads = [threading.Thread(target=MyLoop, args=(coord,)) for i in range(10)]  # range, not Python 2's xrange
    
    # Start the threads and wait for all of them to stop.
    for t in threads:
      t.start()
    coord.join(threads)
    
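    tf.train.Coordinator is essentially a shared stop flag plus a join helper, so the same shutdown pattern can be sketched with the Python 3 standard library alone (the names and the stop condition below are illustrative, not from your code):

```python
import threading

stop = threading.Event()  # plays the role of coord.should_stop() / coord.request_stop()
results = []
lock = threading.Lock()

def my_loop(stop_event, worker_id):
    # Loop until some thread asks everyone to stop.
    while not stop_event.is_set():
        with lock:
            results.append(worker_id)  # ...do something...
            if len(results) >= 20:     # ...some condition...
                stop_event.set()       # ask the other workers to stop

threads = [threading.Thread(target=my_loop, args=(stop, i)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:  # equivalent of coord.join(threads)
    t.join()

print(stop.is_set())  # True: every worker observed the shared stop flag and exited
```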

    The discussion of inter_op and intra_op parallelism here is also worth reading.
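    For reference, in TF 1.x those two knobs are set through the session config. A sketch, assuming the 1.x ConfigProto/Session API (under TF 2.x the same classes live in tf.compat.v1); the thread counts are arbitrary:

```python
import tensorflow as tf

# TF 1.x-style session configuration.
config = tf.ConfigProto(
    intra_op_parallelism_threads=2,  # threads used inside a single op (e.g. a matmul)
    inter_op_parallelism_threads=2,  # how many independent ops may run concurrently
)
sess = tf.Session(config=config)
```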

    [Comments]:
