【Posted】:2018-06-28 05:20:03
【Question】:
I want to compute a loss that compares the model's predictions against the validation outputs.
My code:
def _build_net(self):
    self.n_actions = 3
    with tf.name_scope('inputs'):
        self.tf_obs = tf.placeholder(tf.float32, shape=(None, MAX_NUM, NUM_FEATURES), name="observations")
        self.tf_acts = tf.placeholder(tf.int32, shape=(None,),
                                      name="actions_num")
        self.tf_vt = tf.placeholder(tf.float32, shape=(None,),
                                    name="actions_value")
    flattened_frames = tf.reshape(self.tf_obs, [-1, NUM_FEATURES])
    init_layers = tf.random_normal_initializer(mean=0, stddev=0.3)
    # fc1
    f1_layer = tf.layers.dense(
        inputs=flattened_frames,
        units=12,
        activation=tf.nn.tanh,  # tanh activation
        kernel_initializer=init_layers,
        bias_initializer=tf.constant_initializer(0.1),
        name='fc1'
    )
    # fc2
    f2_layer = tf.layers.dense(
        inputs=f1_layer,
        units=6,
        activation=tf.nn.tanh,  # tanh activation
        kernel_initializer=init_layers,
        bias_initializer=tf.constant_initializer(0.1),
        name='fc2'
    )
    # fc3
    all_act = tf.layers.dense(
        inputs=f2_layer,
        units=self.n_actions,
        activation=None,
        kernel_initializer=init_layers,
        bias_initializer=tf.constant_initializer(0.1),
        name='fc3'
    )
    logits = tf.reshape(all_act, [-1, MAX_NUM])
    self.all_act_prob = tf.nn.softmax(logits, name='act_prob')
    with tf.name_scope('loss'):
        neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=all_act,
            labels=self.tf_acts
        )
        self._loss = tf.reduce_mean(neg_log_prob * self.tf_vt)
    with tf.name_scope('train'):
        self.train_op = tf.train.AdamOptimizer(self.lr).minimize(self._loss)
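For reference, the quantity stored in self._loss can be reproduced in plain Python. The following is only a sketch of the arithmetic (mean of the negative log-probability of the chosen action, weighted by the value fed as tf_vt), not the TensorFlow implementation; the function name weighted_neg_log_prob_loss and the toy numbers are made up for illustration.

```python
import math


def weighted_neg_log_prob_loss(logits, actions, returns):
    """Mean over samples of -log(softmax(logits)[action]) * return.

    Hypothetical helper: mirrors sparse softmax cross-entropy
    multiplied by a per-sample weight, then averaged.
    """
    total = 0.0
    for row, action, ret in zip(logits, actions, returns):
        # Numerically stable log-sum-exp of the logits in this row.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # Cross-entropy with an integer label is -log softmax(row)[action].
        neg_log_prob = log_sum_exp - row[action]
        total += neg_log_prob * ret
    return total / len(actions)


# Toy data: two samples, three actions each (n_actions = 3).
loss = weighted_neg_log_prob_loss(
    logits=[[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]],
    actions=[1, 2],
    returns=[1.0, 0.5],
)
print(loss)
```

The sketch makes the dependency visible: the value cannot be computed without one action index and one weight per row of logits, which is why both tf_acts and tf_vt have to be supplied whenever self._loss is evaluated.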
How I compute the loss:
def compute_loss(self, input_data, expected_output_data):
    """
    Compute loss on the input data.
    :param input_data: numpy array of shape (number of frames, MAX_NUM, NUM_FEATURES)
    :param expected_output_data: numpy array of shape (number of frames, MAX_NUM)
    :return: training loss on the input data
    """
    return self._session.run(self._loss,
                             feed_dict={self.tf_obs: input_data,
                                        self._target_distribution: expected_output_data})
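Problem: _build_net works, but when I run compute_loss I get this error:
You must feed a value for placeholder tensor 'inputs/actions_value' with dtype float and shape [?]
[[Node: inputs/actions_value = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
I know that I need to feed something for self.tf_acts and self.tf_vt, but what if I don't know their values? What can I do to fix this?
Also, is this the right way to compute the loss of a reinforcement learning model (for validating inputs/outputs)?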
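A note on the error itself: self._loss is built from self.tf_acts and self.tf_vt as well as self.tf_obs, so evaluating it needs a value for all three, and self._target_distribution is not created anywhere in _build_net. Below is a minimal sketch of assembling such a feed, assuming recorded actions and returns exist for the validation data. The helper name build_loss_feed is hypothetical, and plain strings stand in for the placeholder tensors so that the snippet runs without TensorFlow.

```python
def build_loss_feed(obs_ph, acts_ph, vt_ph, observations, actions, returns):
    """Map every placeholder the loss depends on to a value.

    Hypothetical helper: obs_ph, acts_ph and vt_ph stand for
    self.tf_obs, self.tf_acts and self.tf_vt.
    """
    # The loss multiplies the per-sample cross-entropy by the returns,
    # so both sequences must have one entry per sample.
    if len(actions) != len(returns):
        raise ValueError("actions and returns must have the same length")
    return {obs_ph: observations, acts_ph: actions, vt_ph: returns}


# Strings are used as stand-ins for the real placeholder tensors.
feed = build_loss_feed(
    "observations", "actions_num", "actions_value",
    observations=[[[0.1, 0.2]], [[0.3, 0.4]]],  # toy (frames, MAX_NUM, NUM_FEATURES)
    actions=[0, 2],
    returns=[1.0, -0.5],
)
print(sorted(feed))
```

Inside the class, the same dictionary would presumably be passed as feed_dict to self._session.run(self._loss, ...) with the real placeholders as keys. If no actions or returns are available for the validation data, this particular loss cannot be evaluated on it as defined, and a separate tensor that depends only on the observations and the expected outputs would be needed instead.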
【Discussion】:
Tags: python tensorflow reinforcement-learning