【Question Title】: Why does the training result of my TensorFlow example code keep increasing?
【Posted】: 2017-02-01 08:29:30
【Question Description】:

Hello, I am learning TensorFlow. Here is my code, a simple multi-variable TensorFlow example. The environment is Python 3.5.3, TensorFlow 0.12.1, Windows 7.

import tensorflow as tf

# Input data & output data
x1_data = [1.0, 0.0, 3.0, 0.0, 5.0]
x2_data = [0.0, 2.0, 0.0, 4.0, 5.0]
y_data =  [1.0, 2.0, 3.0, 4.0, 5.0]

# W1, W2, b random generation
# W1 = 1, W2 = 1, b = 0 is the intended fit (though the fifth sample,
# x1 = 5, x2 = 5 -> y = 5, means no parameters can fit the data exactly)
W1 = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
W2 = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.random_uniform([1], -1.0, 1.0))

# Our hypothesis
hypothesis = W1 * x1_data + W2 * x2_data + b
# Simplified cost function
cost = tf.reduce_mean(tf.square(hypothesis - y_data))

# Minimize
a = tf.Variable(0.1) # Learning Rate
optimizer = tf.train.GradientDescentOptimizer(a)
train = optimizer.minimize(cost)

# Initialise
init = tf.global_variables_initializer()

# Launch
sess = tf.Session()
sess.run(init)

# Train loop
for step in range(10):
    sess.run(train)
    print(step, sess.run(cost), sess.run(W1), sess.run(W2), sess.run(b))

I expected the cost to decrease over the training loop.

Instead it grows without bound.

The same code with a single variable works fine; there the cost decreases.

I don't understand why the two-variable version increases...

0 52.0504 [ 1.47101164] [ 2.24049234] [ 0.86718893]
1 157.129 [-1.74108529] [-1.84496927] [-0.22162986]
2 478.055 [ 4.02118969] [ 5.11457825] [ 1.86127353]
3 1457.33 [-5.99311352] [-7.13181305] [-1.60902405]
4 4445.18 [ 11.50830746] [ 14.20653534] [ 4.60829926]
5 13561.2 [-19.06884766] [-23.10119247] [-6.10722733]
6 41374.3 [ 34.32733154] [ 42.03698349] [ 12.74352837]
7 126232.0 [-58.95558929] [-71.76408386] [-20.05929375]
8 385134.0 [ 103.96767426] [ 126.9929657] [ 37.3527832]
9 1.17505e+06 [-180.62704468] [-220.19728088] [-62.82305145]
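For reference (an editor's addition, not part of the original question): the same batch gradient descent can be reproduced in plain NumPy, which confirms the divergence is a learning-rate effect rather than anything TensorFlow-specific. The sketch below assumes the same data and the same mean-squared-error cost; at a rate of 0.1 the cost grows step over step, while at 0.01 it shrinks.

```python
import numpy as np

# Same data as the question: two input features, one target
X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 0.0],
              [0.0, 4.0],
              [5.0, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def run_gd(lr, steps=10, seed=0):
    """Batch gradient descent on mean squared error; returns cost per step."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, size=2)   # W1, W2, as in the question
    b = rng.uniform(-1.0, 1.0)
    costs = []
    for _ in range(steps):
        err = X @ w + b - y
        costs.append(np.mean(err ** 2))
        # gradients of mean(err^2) with respect to w and b
        w -= lr * (2.0 * X.T @ err / len(y))
        b -= lr * (2.0 * err.mean())
    return costs

print(run_gd(0.1)[-1])   # diverges, as in the question's output
print(run_gd(0.01)[-1])  # decreases
```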

【Comments】:

    Tags: python python-3.x tensorflow


    【Solution 1】:

    The first fix I found is to lower the learning rate to 0.01. The steps seem to change your parameters too drastically. This might not happen if you used some regularization technique such as L2.
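To see why 0.01 works where 0.1 does not (a sketch added here, not part of the original answer): for a quadratic cost such as this mean squared error, plain gradient descent converges only when the learning rate is below 2/λ_max, where λ_max is the largest eigenvalue of the cost's Hessian. For this data the bound falls between 0.01 and 0.1, which is why one rate diverges and the other does not:

```python
import numpy as np

# Design matrix with a bias column, built from the question's data
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [3.0, 0.0, 1.0],
              [0.0, 4.0, 1.0],
              [5.0, 5.0, 1.0]])

# Hessian of mean((A @ theta - y)^2) is (2/n) * A^T A, independent of theta
H = 2.0 * A.T @ A / A.shape[0]
lam_max = np.linalg.eigvalsh(H).max()

print("largest Hessian eigenvalue:", lam_max)
print("stable learning-rate bound 2/lam_max:", 2.0 / lam_max)
```

So 0.1 is above the stability bound and every step overshoots the minimum by a growing amount, while 0.01 is safely below it.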

    Second, your code can be tidied up: use TensorFlow matrix operations and initialize the bias to zero. Oddly, the matrix version originally appeared to work even with a 0.1 learning rate, but that was most likely because input_X was declared as a trainable tf.Variable, so minimize() was adjusting the inputs as well as the weights; with the inputs held fixed you still need the smaller learning rate.

    import tensorflow as tf
    import numpy as np
    
    # Input data & output data
    x1_data = [1.0, 0.0, 3.0, 0.0, 5.0]
    x2_data = [0.0, 2.0, 0.0, 4.0, 5.0]
    y_data =  [1.0, 2.0, 3.0, 4.0, 5.0]
    
    # Inputs are data, not parameters: use a constant so minimize() does not update them
    input_X = tf.constant(np.vstack((x1_data, x2_data)).astype(np.float32))
    W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
    b = tf.Variable(tf.zeros([1, 1]))
    
    # Our hypothesis
    hypothesis = tf.add(tf.matmul(W, input_X), b)
    # Simplified cost function
    cost = tf.reduce_mean(tf.square(hypothesis - y_data))
    
    # Minimize
    a = 0.01  # learning rate; with the inputs held constant, 0.1 still overshoots
    optimizer = tf.train.GradientDescentOptimizer(a)
    train = optimizer.minimize(cost)
    
    # Initialise
    init = tf.global_variables_initializer()
    
    # Launch
    sess = tf.Session()
    sess.run(init)
    
    # Train loop
    for step in range(10):
        sess.run(train)
        print(step, sess.run(cost), sess.run(W), sess.run(b))
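One further check (an editor's addition, not part of the original answer): the question's comment treats W1 = 1, W2 = 1, b = 0 as the ideal parameters, but the fifth sample (x1 = 5, x2 = 5 → y = 5) contradicts that, since those weights would predict 10. No parameter setting fits this data exactly, and the true least-squares optimum can be found in closed form:

```python
import numpy as np

# Design matrix with a bias column, from the question's data
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [3.0, 0.0, 1.0],
              [0.0, 4.0, 1.0],
              [5.0, 5.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Solve min ||A @ theta - y||^2 exactly
theta, residual, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print("least-squares [W1, W2, b]:", theta)
print("residual sum of squares:", residual)
```

The residual is strictly positive and the optimum is not (1, 1, 0), so gradient descent here can only approach this compromise solution, never zero cost.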
    

    【Discussion】:
