Using a neural network to learn a square wave function: answers

【Question title】: Use neural network to learn a square wave function
【Posted】: 2019-01-17 05:33:03
【Question description】:

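Out of curiosity, I am trying to build a simple fully connected NN with tensorflow to learn a square wave function such as the following one:

So the input is a 1D array of x values (the horizontal axis), and the output is a binary scalar value. I used tf.nn.sparse_softmax_cross_entropy_with_logits as the loss function and tf.nn.relu as the activation function. There are 3 hidden layers (100*100*100), plus a single input node and output node. The input data are generated to match the waveform above, so data size is not a problem.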
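
For readers without the original figure, the target can be sketched as follows. This is a minimal illustration, assuming ten bins of width 0.1 on [0, 1) whose labels alternate 0, 1, 0, 1, ... (the pattern produced by the data-generation loop in the full script further down). The helper name `square_wave_labels` is hypothetical and does not appear in the original code.

```python
import numpy as np

def square_wave_labels(x, n_bins=10):
    """Return 0 for even-numbered bins and 1 for odd-numbered bins on [0, 1)."""
    bin_index = np.floor(np.asarray(x) * n_bins).astype(int)
    return bin_index % 2

x = np.array([0.05, 0.15, 0.25, 0.95])
print(square_wave_labels(x))  # [0 1 0 1]
```

The label flips every 0.1 along the x axis, so a classifier has to place nine decision boundaries on a single scalar input.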

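However, the trained model seems to fail completely: it always predicts the negative class.

So I am trying to figure out why this happens. Is the NN configuration suboptimal, or is it due to some mathematical limitation of NNs beneath the surface (although I think a NN should be able to mimic any function)?

Thanks.

As suggested in the comments, here is the full code. One thing I got wrong earlier is that there are actually 2 output nodes (because there are 2 output classes):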

"""
    See if neural net can find piecewise linear correlation in the data
"""

import time
import os
import tensorflow as tf
import numpy as np

def generate_placeholder(batch_size):
    x_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 1))
    y_placeholder = tf.placeholder(tf.float32, shape=(batch_size))
    return x_placeholder, y_placeholder

def feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, loop):
    x_selected = [[None]] * batch_size
    y_selected = [None] * batch_size
    for i in range(batch_size):
        x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
        y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
    feed_dict = {x_placeholder: x_selected,
                 y_placeholder: y_selected}
    return feed_dict

def inference(input_x, H1_units, H2_units, H3_units):

    with tf.name_scope('H1'):
        weights = tf.Variable(tf.truncated_normal([1, H1_units], stddev=1.0/2), name='weights') 
        biases = tf.Variable(tf.zeros([H1_units]), name='biases')
        a1 = tf.nn.relu(tf.matmul(input_x, weights) + biases)

    with tf.name_scope('H2'):
        weights = tf.Variable(tf.truncated_normal([H1_units, H2_units], stddev=1.0/H1_units), name='weights') 
        biases = tf.Variable(tf.zeros([H2_units]), name='biases')
        a2 = tf.nn.relu(tf.matmul(a1, weights) + biases)

    with tf.name_scope('H3'):
        weights = tf.Variable(tf.truncated_normal([H2_units, H3_units], stddev=1.0/H2_units), name='weights') 
        biases = tf.Variable(tf.zeros([H3_units]), name='biases')
        a3 = tf.nn.relu(tf.matmul(a2, weights) + biases)

    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(tf.truncated_normal([H3_units, 2], stddev=1.0/np.sqrt(H3_units)), name='weights') 
        biases = tf.Variable(tf.zeros([2]), name='biases')
        logits = tf.matmul(a3, weights) + biases

    return logits

def loss(logits, labels):
    labels = tf.to_int32(labels)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')
    return tf.reduce_mean(cross_entropy, name='xentropy_mean')

def inspect_y(labels):
    return tf.reduce_sum(tf.cast(labels, tf.int32))

def training(loss, learning_rate):
    tf.summary.scalar('lost', loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op

def evaluation(logits, labels):
    labels = tf.to_int32(labels)
    correct = tf.nn.in_top_k(logits, labels, 1)
    return tf.reduce_sum(tf.cast(correct, tf.int32))

def run_training(x, y, batch_size):
    with tf.Graph().as_default():
        x_placeholder, y_placeholder = generate_placeholder(batch_size)
        logits = inference(x_placeholder, 100, 100, 100)
        Loss = loss(logits, y_placeholder)
        y_sum = inspect_y(y_placeholder)
        train_op = training(Loss, 0.01)
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        max_steps = 10000
        for step in range(max_steps):
            start_time = time.time()
            feed_dict = feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, step)
            _, loss_val = sess.run([train_op, Loss], feed_dict = feed_dict)
            duration = time.time() - start_time
            if step % 100 == 0:
                print('Step {}: loss = {:.2f} {:.3f}sec'.format(step, loss_val, duration))
    x_test = np.array(range(1000)) * 0.001
    x_test = np.reshape(x_test, (1000, 1))
    _ = sess.run(logits, feed_dict={x_placeholder: x_test})
    print(min(_[:, 0]), max(_[:, 0]), min(_[:, 1]), max(_[:, 1]))
    print(_)

if __name__ == '__main__':

    population = 10000

    input_x = np.random.rand(population)
    input_y = np.copy(input_x)

    for bin in range(10):
        print(bin, bin/10, 0.5 - 0.5*(-1)**bin)
        input_y[input_x >= bin/10] = 0.5 - 0.5*(-1)**bin

    batch_size = 1000

    input_x = np.reshape(input_x, (population, 1))

    run_training(input_x, input_y, batch_size)

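The sample output shows that the model always prefers the first class over the second, as indicated by min(_[:, 0]) > max(_[:, 1]): the minimum logit for the first class is higher than the maximum logit for the second class, across a sample of size population.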
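
The check described here can be illustrated with a small NumPy sketch. The logit values below are made up for illustration and are not taken from the actual run. The point is only that when the smallest class-0 logit exceeds the largest class-1 logit, `argmax` picks class 0 for every sample.

```python
import numpy as np

# Hypothetical logits for 4 samples and 2 classes (illustrative values only).
logits = np.array([[1.2, -0.3],
                   [0.9, 0.1],
                   [1.5, -0.8],
                   [1.1, 0.4]])

always_class_0 = logits[:, 0].min() > logits[:, 1].max()
predicted = logits.argmax(axis=1)

print(always_class_0)  # True, since 0.9 > 0.4
print(predicted)       # [0 0 0 0]
```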

My mistake. The problem was in the following lines:

for i in range(batch_size):
    x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
    y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]

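Python was changing the entire x_selected list to the same value: [[None]] * batch_size repeats a reference to one inner list, so every assignment to x_selected[i][0] overwrote the same shared element. This code issue is now resolved. The fix is: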

x_selected = np.zeros((batch_size, 1))
y_selected = np.zeros((batch_size,))
for i in range(batch_size):
    x_selected[i, 0] = x[(loop*batch_size + i) % x.shape[0], 0]
    y_selected[i] = y[(loop*batch_size + i) % y.shape[0]]
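
The aliasing behind this bug can be reproduced in isolation. The sketch below uses only built-in lists: `[[None]] * n` repeats a reference to one inner list n times, so writing through any row changes all of them, whereas a list comprehension builds independent rows.

```python
n = 3

# Multiplying a list of lists copies references, not the inner list itself.
aliased = [[None]] * n
for i in range(n):
    aliased[i][0] = i
print(aliased)      # [[2], [2], [2]]: every row is the same object

# A comprehension creates a fresh inner list on each iteration.
independent = [[None] for _ in range(n)]
for i in range(n):
    independent[i][0] = i
print(independent)  # [[0], [1], [2]]
```

Note that `[None] * batch_size` for y_selected is harmless, because each slot is rebound by plain assignment. Only the nested list for x_selected was affected, which means every batch contained one x value repeated batch_size times.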

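After this fix, the model shows more variation. It currently changes its output class around x = 0.5. But this is still far from ideal.

So, after changing the network configuration to 100 nodes * 4 layers and running 1 million training steps (batch size = 100, sample size = 10 million), the model performs very well, showing errors only at the edges where y flips. Therefore, this question is closed.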

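【Comments on the question】:

What exactly do you mean by "always predicts the negative class"? Do you mean your output is always negative? Why not use the whole line shape (say, one period) as input? For example, 100 points as input, and you try to get 100 points as output?
Can you post your network architecture code?

Tags: machine-learning tensorflow neural-network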

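【Solution 1】:

You are essentially trying to learn a periodic function, and that function is highly nonlinear and non-smooth. So it is not as simple as it looks. In short, a better representation of the input features helps.

Suppose you have a period T = 2, so that f(x) = f(x+2). For the reduced problem where the input and output are integers, your function is f(x) = 1 if x is odd else -1. In that case, your problem reduces to this discussion, in which a neural network is trained to distinguish odd numbers from even numbers.

I guess the second bullet point in that post should help (even for the general case where the inputs are floats).

Try representing the numbers in binary using a fixed-length precision.

In the reduced problem above, it is easy to see that the output can be determined as soon as the least significant bit is known.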

decimal  binary  -> output
1:       0 0 1   -> 1
2:       0 1 0   -> -1
3:       0 1 1   -> 1
...
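
A minimal sketch of the fixed-length binary representation for the integer case, assuming 3-bit inputs as in the table above. The helper name `to_bits` is hypothetical and is not taken from the linked discussion.

```python
import numpy as np

def to_bits(values, width=3):
    """Encode non-negative integers as fixed-width binary rows, most significant bit first."""
    values = np.asarray(values, dtype=np.int64)
    shifts = np.arange(width - 1, -1, -1)
    return (values[:, None] >> shifts) & 1

x = np.array([1, 2, 3])
bits = to_bits(x)
print(bits)
# [[0 0 1]
#  [0 1 0]
#  [0 1 1]]

# The last column (least significant bit) alone determines the target.
target = np.where(bits[:, -1] == 1, 1, -1)
print(target)  # [ 1 -1  1]
```

For float inputs, one option (an assumption, not something stated in this answer) is to quantize first, for example with `np.floor(10 * x)` for the wave in the question, and then encode the resulting integer the same way.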

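【Discussion】:

When your wave function crosses 0, is your target value -1, 0, or 1? Do you see the paradox here?
I think we should make a choice for the general problem when the input is a float, for example: i.stack.imgur.com/u2eJL.png
So for an unknown problem, how should I generalize this approach so that the NN can detect such a pattern? For example, a geometric series where x = {1, 2, 4, 8, ...} and y = {1 if x >= 2^n & x
I don't know of a general approach. Fortunately, in practice, neural networks work well. Don't your updated code and results also show that the neural network works fine as long as you define a finite horizon?
I tried a deeper network with 4 layers of 100 nodes each. The result now looks better and shows some nonlinear behavior. I believe that with more data and more time (everything comes down to time), the performance will become decent. Thanks!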

【Solution 2】:

I created the model and structure for the odd/even number recognition problem in here.

If you abstract away the fact that:

decimal  binary  -> output
1:       0 0 1   -> 1
2:       0 1 0   -> -1
3:       0 1 1   -> 1

is almost equivalent to:

decimal  binary  -> output
1:       0 0 1   -> 1
2:       0 1 0   -> 0
3:       0 1 1   -> 1

then you can update the code to fit your needs.
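
The relabeling between the two tables above is a one-line affine map. A small sketch, assuming targets in {-1, 1} that need to become class indices in {0, 1} (for instance, for a sparse cross-entropy loss):

```python
import numpy as np

y_signed = np.array([1, -1, 1])     # targets as written in the first table
y_class = (y_signed + 1) // 2       # -1 -> 0, 1 -> 1
print(y_class)                      # [1 0 1]

# The inverse map recovers the signed targets.
print(2 * y_class - 1)              # [ 1 -1  1]
```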

【Discussion】: