Question title: Can a Tensorflow variable be trained using the Tensorflow Keras functional API model? Can a Tensorflow operation be used in the functional API Model?
Posted: 2025-12-25 03:00:12
Question:

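I would like to know whether a Keras model built with the functional API will compile/train variables defined with tf.get_variable. Can Keras training also include TensorFlow operations?

So basically I want to define a Keras model with Tensorflow variables and operations, and then use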

model = tf.keras.Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=optimizer, loss=loss)
model.fit(data, labels, batch_size=batch_size, epochs=epochs)

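to train the model. The reason is that Google's TPUs require either the Keras or the TF.Estimator API, and Keras is the more recommended of the two, so I want to see how easily I can convert my model.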
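A minimal sketch of that compile/fit flow, assuming TensorFlow 2.x: the raw TF op (`tf.nn.softplus`) and the trainable weight live inside a small custom layer, so `fit()` sees the weight through `add_weight` and backpropagates through the op. `ScaledSoftplus` and the random data are hypothetical, made up only for illustration.

```python
import numpy as np
import tensorflow as tf


class ScaledSoftplus(tf.keras.layers.Layer):
    """Hypothetical layer: a trainable weight plus raw TensorFlow ops."""

    def build(self, input_shape):
        # add_weight registers the variable, so compile()/fit() will train it.
        self.scale = self.add_weight(
            name="scale", shape=(int(input_shape[-1]),),
            initializer="ones", trainable=True)

    def call(self, inputs):
        # Plain TF ops; gradients flow back through them during fit().
        return tf.nn.softplus(inputs * self.scale)


inputs = tf.keras.Input(shape=(4,))
scaled = ScaledSoftplus()
x = scaled(inputs)
predictions = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs=inputs, outputs=predictions)
model.compile(optimizer="sgd", loss="mse")

# Random stand-in data (hypothetical).
data = np.random.rand(32, 4).astype("float32")
labels = np.random.rand(32, 1).astype("float32")

before = scaled.scale.numpy().copy()
model.fit(data, labels, batch_size=8, epochs=2, verbose=0)
after = scaled.scale.numpy()
```

Wrapping TF ops in a `Layer` (or in `tf.keras.layers.Lambda` for ops without weights) is the more portable choice: depending on the Keras version, calling raw TF ops directly on symbolic `tf.keras.Input` tensors is either auto-wrapped or rejected with an error.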

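Background

Since Tensorflow is the backend, it seems there should be a way to mix Keras and Tensorflow variables. This blog post shows how to train Keras variables with a Tensorflow graph/session: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html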

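# TF 1.x + multi-backend Keras 2 code from the blog post, with its missing imports filled in
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense, Dropout
from keras.losses import categorical_crossentropy
from keras.metrics import categorical_accuracy as accuracy
from tensorflow.examples.tutorials.mnist import input_data

# Register a TF session with Keras so both share the same graph
sess = tf.Session()
K.set_session(sess)
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)

img = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))

# Keras layers called directly on TF tensors
x = Dense(128, activation='relu')(img)
x = Dropout(0.5)(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
preds = Dense(10, activation='softmax')(x)

loss = tf.reduce_mean(categorical_crossentropy(labels, preds))

# A plain TF optimizer updates the Keras layers' variables
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
sess.run(tf.global_variables_initializer())
with sess.as_default():
    for i in range(100):
        batch = mnist_data.train.next_batch(50)
        # K.learning_phase(): 1 enables dropout while training
        train_step.run(feed_dict={img: batch[0],
                                  labels: batch[1],
                                  K.learning_phase(): 1})

# categorical_accuracy is per example; average it for a single number
acc_value = tf.reduce_mean(accuracy(labels, preds))
with sess.as_default():
    # K.learning_phase(): 0 disables dropout for evaluation
    print(acc_value.eval(feed_dict={img: mnist_data.test.images,
                                    labels: mnist_data.test.labels,
                                    K.learning_phase(): 0}))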
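For comparison, a rough TF 2.x equivalent of the session loop above, assuming eager execution: the Keras layers are called like functions, and a `tf.GradientTape` plus a Keras optimizer take the place of `minimize()` and `feed_dict`. Random tensors stand in for the MNIST batches; this is a sketch, not the blog's code.

```python
import tensorflow as tf

# Keras layers used as plain callables, as in the blog snippet.
dense1 = tf.keras.layers.Dense(128, activation="relu")
dropout = tf.keras.layers.Dropout(0.5)
dense2 = tf.keras.layers.Dense(10)  # logits; the softmax is folded into the loss


def forward(x, training):
    # `training` plays the role of K.learning_phase() in the TF1 code.
    return dense2(dropout(dense1(x), training=training))


optimizer = tf.keras.optimizers.SGD(learning_rate=0.5)
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Random stand-ins for one MNIST batch (hypothetical data).
images = tf.random.uniform([50, 784])
labels = tf.one_hot(tf.random.uniform([50], 0, 10, dtype=tf.int32), depth=10)

losses = []
for step in range(10):
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, forward(images, training=True))
    variables = dense1.trainable_variables + dense2.trainable_variables
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    losses.append(float(loss))

logits = forward(images, training=False)
acc = float(tf.reduce_mean(tf.cast(
    tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1)), tf.float32)))
```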

This question also shows that a Tensorflow tensor can be used as the input of a Keras model:

How to set the input of a Keras layer of a functional model, with a Tensorflow tensor?

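import tensorflow as tf
from keras.models import Sequential
from keras.layers import InputLayer, Embedding

tf_embedding_input = ...    # pre-processing output tensor

# Keras model: InputLayer (a layer) rather than Input (a tensor) is what Sequential.add() takes
model = Sequential()
model.add(InputLayer(input_tensor=tf_embedding_input))
model.add(Embedding(max_features, 128, input_length=maxlen))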
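In TF 2.x, one common alternative to feeding a pre-processing tensor into the model's input is to run the TF pre-processing ops in a `tf.data` pipeline and pass the dataset to `fit()`. A sketch under that assumption; `max_features`, `maxlen`, the random data and the `floormod` step are illustrative stand-ins for the question's real pre-processing.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes and raw data standing in for the real pre-processing input.
max_features, maxlen = 1000, 20
raw_ids = np.random.randint(0, 5000, size=(64, maxlen))
targets = np.random.randint(0, 2, size=(64,)).astype("float32")


def preprocess(ids, y):
    # Arbitrary TF ops run here, in the input pipeline, not in the model.
    ids = tf.math.floormod(tf.cast(ids, tf.int32), max_features)
    return ids, y


dataset = (tf.data.Dataset.from_tensor_slices((raw_ids, targets))
           .map(preprocess)
           .batch(16))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(maxlen,), dtype="int32"),
    tf.keras.layers.Embedding(max_features, 128),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
history = model.fit(dataset, epochs=1, verbose=0)
```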

So I would like to know whether Keras can train Tensorflow variables.

Example

I want to train the embedding and softmax variables in the following Tensorflow architecture:

import math
import numpy as np
import tensorflow as tf  # TF 1.x API (tf.get_variable, tf.segment_mean)

# vocabulary_size, embedding_size, batch_size, num_inputs, num_sampled,
# train_dataset and train_labels are defined elsewhere in the original code.

embeddings = tf.get_variable('embeddings',
    initializer=tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

softmax_weights = tf.get_variable('softmax_weights',
    initializer=tf.truncated_normal([vocabulary_size, embedding_size],
                                    stddev=1.0 / math.sqrt(embedding_size)))

softmax_biases = tf.get_variable('softmax_biases',
    initializer=tf.zeros([vocabulary_size]), trainable=False)

embed = tf.nn.embedding_lookup(embeddings, train_dataset)  #train data set is

embed_reshaped = tf.reshape(embed, [batch_size*num_inputs, embedding_size])

# Segment ids 0,0,...,1,1,...: num_inputs consecutive rows per example
segments = np.arange(batch_size).repeat(num_inputs)

averaged_embeds = tf.segment_mean(embed_reshaped, segments, name=None)

loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=averaged_embeds,
                               labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))
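A sketch of how this architecture might be expressed as Keras layers in TF 2.x, so that compile/fit trains the embedding and softmax weights. Assumptions: `train_dataset` holds `[batch, num_inputs]` context ids, so averaging over axis 1 matches the `segment_mean` over consecutive `num_inputs` rows; the model outputs the per-example sampled-softmax loss and the compiled loss just averages it; the layer names, hyperparameter values and random data are hypothetical.

```python
import math
import numpy as np
import tensorflow as tf

# Hypothetical hyperparameters standing in for the question's values.
vocabulary_size, embedding_size = 500, 32
num_inputs, num_sampled, batch_size = 4, 16, 8


class AveragedEmbedding(tf.keras.layers.Layer):
    """Lookup plus average of num_inputs context ids per example."""

    def build(self, input_shape):
        self.embeddings = self.add_weight(
            name="embeddings", shape=(vocabulary_size, embedding_size),
            initializer=tf.keras.initializers.RandomUniform(-1.0, 1.0))

    def call(self, ids):
        # tf.gather on a single table is what tf.nn.embedding_lookup does.
        embed = tf.gather(tf.convert_to_tensor(self.embeddings), ids)
        # ids are [batch, num_inputs], so a mean over axis 1 replaces
        # tf.segment_mean over consecutive num_inputs rows.
        return tf.reduce_mean(embed, axis=1)


class SampledSoftmaxLoss(tf.keras.layers.Layer):
    """Outputs the per-example sampled softmax loss."""

    def build(self, input_shape):
        self.softmax_weights = self.add_weight(
            name="softmax_weights", shape=(vocabulary_size, embedding_size),
            initializer=tf.keras.initializers.TruncatedNormal(
                stddev=1.0 / math.sqrt(embedding_size)))
        self.softmax_biases = self.add_weight(
            name="softmax_biases", shape=(vocabulary_size,),
            initializer="zeros", trainable=False)

    def call(self, inputs):
        averaged_embeds, labels = inputs
        return tf.nn.sampled_softmax_loss(
            weights=tf.convert_to_tensor(self.softmax_weights),
            biases=tf.convert_to_tensor(self.softmax_biases),
            labels=tf.cast(labels, tf.int64), inputs=averaged_embeds,
            num_sampled=num_sampled, num_classes=vocabulary_size)


context = tf.keras.Input(shape=(num_inputs,), dtype="int32")
target = tf.keras.Input(shape=(1,), dtype="int64")
averaged_embeds = AveragedEmbedding()(context)
per_example_loss = SampledSoftmaxLoss()([averaged_embeds, target])
model = tf.keras.Model(inputs=[context, target], outputs=per_example_loss)

# The model output already is the loss; the compiled loss only averages it.
# jit_compile=False assumes the candidate sampler is not XLA-compilable.
model.compile(optimizer="adagrad", loss=lambda y_true, y_pred: y_pred,
              jit_compile=False)

# Random stand-in data; y is a dummy because labels enter as a model input.
train_dataset = np.random.randint(0, vocabulary_size, (64, num_inputs)).astype("int32")
train_labels = np.random.randint(0, vocabulary_size, (64, 1)).astype("int64")
dummy_y = np.zeros((64,), dtype="float32")
history = model.fit([train_dataset, train_labels], dummy_y,
                    batch_size=batch_size, epochs=1, verbose=0)
```

Because sampled softmax needs the labels inside the graph, they are fed as a second model input and `y` is a dummy array. Sampled softmax is a training-only loss; at inference time the full logits would be `tf.matmul(averaged_embeds, softmax_weights, transpose_b=True) + softmax_biases`.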

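Since Tensorflow Keras uses a Tensorflow backend, I would guess that it is somehow possible to use and train Tensorflow variables, and to use Tensorflow operations, during training.

Why do I want to do this?

Google's TPUs require your architecture to be implemented with either the Estimator API or the Keras API. Since the Keras API is the more recommended one, there is likely interest in converting a regular Tensorflow Graph/Session program to the Keras API while changing as little of its code as possible.

Knowing how to incorporate Tensorflow operations, and how to train Tensorflow variables with Keras model compile/fit, would help a great deal with this.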

Question comments:

Tags: python tensorflow keras

Answer 1:

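A little background:

As is well known, Keras is a model-level library that provides high-level building blocks for developing deep learning models.

Most importantly, the Keras API does not handle tensor operations itself. It relies on a specialized, well-optimized tensor library to do so, which serves as Keras's "backend engine".

The multi-backend Keras releases offer three backend engines: the TensorFlow backend (Google), the Theano backend, and the CNTK backend (Microsoft).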


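The only thing you should ask yourself is: what is the difference between a Keras variable and a regular Tensorflow variable?

It turns out that Keras variables carry extra metadata. So, to use a TensorFlow variable in Keras, you need to convert it.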
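In tf.keras 2.x and later, the idiomatic "conversion" is usually to let a layer own the variable through `add_weight`, which attaches the Keras bookkeeping (tracking, trainable flag), and to copy in the values of an existing TF variable if you have one. A sketch under that assumption; `EmbeddingTable` and `legacy_embeddings` are hypothetical names.

```python
import numpy as np
import tensorflow as tf

# Hypothetical pre-existing TF variable, e.g. left over from graph-style code.
legacy_embeddings = tf.Variable(tf.random.uniform([100, 16], -1.0, 1.0))


class EmbeddingTable(tf.keras.layers.Layer):
    """Hypothetical layer that owns a Keras-tracked copy of a table."""

    def __init__(self, init_values, **kwargs):
        super().__init__(**kwargs)
        # Keep plain NumPy values, not the tf.Variable itself, so only the
        # add_weight variable below is tracked by the layer.
        self.init_values = np.asarray(init_values, dtype="float32")

    def build(self, input_shape):
        self.table = self.add_weight(
            name="table", shape=self.init_values.shape,
            initializer="zeros", trainable=True)
        self.table.assign(self.init_values)  # copy the existing values in

    def call(self, ids):
        return tf.gather(tf.convert_to_tensor(self.table), ids)


layer = EmbeddingTable(legacy_embeddings.numpy())
out = layer(tf.constant([[1, 2], [3, 4]]))
```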

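Note: TensorFlow variable scopes have no effect on Keras layers or models.

Finally, variable sharing is done by reusing a Keras layer (or model) instance: create it once and call it on several inputs.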
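A small sketch of variable sharing by reusing one layer instance, assuming TF 2.x `tf.keras`: the same `Dense` is called on two inputs, so the model ends up with a single kernel and bias.

```python
import tensorflow as tf

shared = tf.keras.layers.Dense(8)  # one instance means one kernel and one bias

left = tf.keras.Input(shape=(4,))
right = tf.keras.Input(shape=(4,))
# Calling the same instance twice reuses its variables.
merged = tf.keras.layers.Concatenate()([shared(left), shared(right)])
model = tf.keras.Model(inputs=[left, right], outputs=merged)
```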

Comments:

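So would it be something like tf.keras.backend.variable(tf.get_variable(...), ...)? Also, once the variable has been converted to a Keras variable, can Tensorflow operations be applied to it? And during Keras training, will gradients flow through all of those operations back to the variable?
Mixing Tensorflow and Keras operations is risky, because Keras builds its own graph, and inserting arbitrary Tensorflow code into it often does not work.
Thanks! I'll look into that. Besides "keras graph building", which keywords would you recommend searching for?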
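To check whether gradients flow through TF operations applied to Keras variables, a sketch with `tf.GradientTape` can help (TF 2.x assumed): differentiable ops such as `tf.reshape` and `tf.math.segment_mean` pass gradients back to the layer's weights, while a non-differentiable op such as `tf.argmax` cuts the path and the tape returns `None`.

```python
import tensorflow as tf

dense = tf.keras.layers.Dense(3)
x = tf.random.normal([5, 4])

# Differentiable TF ops between the Keras weights and the loss.
with tf.GradientTape() as tape:
    h = tf.reshape(dense(x), [-1])             # shape [15]
    segments = tf.repeat(tf.range(5), 3)       # [0, 0, 0, 1, 1, 1, ...]
    h = tf.math.segment_mean(h, segments)      # shape [5]
    loss = tf.reduce_mean(tf.square(h))
grads = tape.gradient(loss, dense.trainable_variables)

# A non-differentiable op (argmax) cuts the gradient path.
with tf.GradientTape() as tape2:
    idx = tf.argmax(dense(x), axis=1)
    blocked = tf.reduce_sum(tf.cast(idx, tf.float32))
blocked_grads = tape2.gradient(blocked, dense.trainable_variables)
```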

Answer 2:

Would this solution help?

keras add external trainable variable to graph

You can add the embedding and softmax layers to the Keras model with

model.add()

and then register those variables as trainable with

model.layers[-1].trainable_weights.extend()
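One caveat, assuming TF 2.x `tf.keras` or Keras 3: there, `trainable_weights` is a property that builds a new list on each access, so calling `.extend()` on it does not register anything. A sketch of the pitfall, followed by a custom layer that owns the extra weight via `add_weight` instead; `DenseWithExtraWeight` and `W` are hypothetical.

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(2)
layer.build((None, 3))
W = tf.Variable(tf.ones([3, 2]), name="W")  # hypothetical external variable

# trainable_weights returns a fresh list, so this extend() is lost.
layer.trainable_weights.extend([W])
n_after_extend = len(layer.trainable_weights)


class DenseWithExtraWeight(tf.keras.layers.Layer):
    """Hypothetical layer that owns a Dense sublayer plus an extra weight."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(units)

    def build(self, input_shape):
        # Created here, the extra weight is tracked and trained by Keras.
        self.W = self.add_weight(
            name="W", shape=(int(input_shape[-1]), self.dense.units),
            initializer="ones", trainable=True)

    def call(self, inputs):
        # Differentiable TF ops on self.W pass gradients back to it.
        return self.dense(inputs) + tf.matmul(inputs, self.W)


fixed = DenseWithExtraWeight(2)
out = fixed(tf.ones([1, 3]))
```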

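Comments:

Thanks! The post you linked shows "model.layers[-1].trainable_weights.extend([W, b])", so would W be something like W = tf.get_variable(...)? Also, if Tensorflow operations are applied to the variable W, will gradients flow through those operations during Keras training?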