Binary activation function for an autoencoder: answers

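【Question title】: Binary activation function for autoencoder
【Posted】: 2019-01-29 21:02:09
【Question】:

I have an autoencoder with two outputs (decoded, pred_w): one is the reconstructed input image and the other is a reconstructed binary image. I used a sigmoid activation in the last layer, but its outputs are floats, and I need the network to label every pixel as 0 or 1. My code is attached below. Could you tell me how to solve this? Thanks.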

from keras.layers import Input, Concatenate, GaussianNoise,Dropout
from keras.layers import Conv2D
from keras.models import Model
from keras.datasets import mnist
from keras.callbacks import TensorBoard
from keras import backend as K
from keras import layers
import matplotlib.pyplot as plt
import tensorflow as tf
import keras as Kr
import numpy as np
import pylab as pl
import matplotlib.cm as cm
import keract
from tensorflow.python.keras.layers import Lambda

#-----------------building w train---------------------------------------------
w_main = np.random.randint(2,size=(1,4,4,1))
w_main=w_main.astype(np.float32)
w_expand=np.zeros((1,28,28,1),dtype='float32')
w_expand[:,0:4,0:4]=w_main
w_expand.reshape(1,28,28,1)
w_expand=np.repeat(w_expand,49999,0)

#-----------------building w validation---------------------------------------------
w_valid = np.random.randint(2,size=(1,4,4,1))
w_valid=w_valid.astype(np.float32)
wv_expand=np.zeros((1,28,28,1),dtype='float32')
wv_expand[:,0:4,0:4]=w_valid
wv_expand.reshape(1,28,28,1)
wv_expand=np.repeat(wv_expand,9999,0)

#-----------------building w test---------------------------------------------
w_test = np.random.randint(2,size=(1,4,4,1))
w_test=w_test.astype(np.float32)
wt_expand=np.zeros((1,28,28,1),dtype='float32')
wt_expand[:,0:4,0:4]=w_test
wt_expand.reshape(1,28,28,1)
#wt_expand=np.repeat(wt_expand,10000,0)

#-----------------------encoder------------------------------------------------
#------------------------------------------------------------------------------
wtm=Input((28,28,1))
image = Input((28, 28, 1))
conv1 = Conv2D(16, (3, 3), activation='relu', padding='same', name='convl1e')(image)
conv2 = Conv2D(32, (3, 3), activation='relu', padding='same', name='convl2e')(conv1)
conv3 = Conv2D(8, (3, 3), activation='relu', padding='same', name='convl3e')(conv2)
DrO1=Dropout(0.25)(conv3)
encoded =  Conv2D(1, (3, 3), activation='relu', padding='same',name='reconstructed_I')(DrO1)


#-----------------------adding w---------------------------------------
#add_const = Kr.layers.Lambda(lambda x: x + Kr.backend.constant(w_expand))
#encoded_merged=Kr.layers.Add()([encoded,wtm])

add_const = Kr.layers.Lambda(lambda x: x + wtm)
encoded_merged = add_const(encoded)
encoder=Model(inputs=image, outputs= encoded_merged)
encoder.summary()

#-----------------------decoder------------------------------------------------
#------------------------------------------------------------------------------

#encoded_merged = Input((28, 28, 2))
deconv1 = Conv2D(16, (3, 3), activation='relu', padding='same', name='convl1d')(encoded_merged)
deconv2 = Conv2D(32, (3, 3), activation='relu', padding='same', name='convl2d')(deconv1)
deconv3 = Conv2D(8, (3, 3), activation='relu',padding='same', name='convl3d')(deconv2)
DrO2=Dropout(0.25)(deconv3)
decoded = Conv2D(1, (3, 3), activation='relu', padding='same', name='decoder_output')(DrO2) 

#decoder=Model(inputs=encoded_merged, outputs=decoded)
#decoder.summary()
model=Model(inputs=image,outputs=decoded)
#----------------------w extraction------------------------------------
convw1 = Conv2D(16, (3,3), activation='relu', padding='same', name='conl1w')(decoded)
convw2 = Conv2D(32, (3, 3), activation='relu', padding='same', name='convl2w')(convw1)
convw3 = Conv2D(8, (3, 3), activation='relu', padding='same', name='conl3w')(convw2)
DrO3=Dropout(0.25)(convw3)
pred_w = Conv2D(1, (1, 1), activation='sigmoid', padding='same', name='reconstructed_W')(DrO3)  
# reconsider activation (is W positive?)
# should be filter=1 to match W
w_extraction=Model(inputs=[image,wtm],outputs=[decoded,pred_w])


#----------------------training the model--------------------------------------
#------------------------------------------------------------------------------
#----------------------Data preparation----------------------------------------

(x_train, _), (x_test, _) = mnist.load_data()
x_validation=x_train[1:10000,:,:]
x_train=x_train[10001:60000,:,:]
#
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_validation = x_validation.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_validation = np.reshape(x_validation, (len(x_validation), 28, 28, 1))

#---------------------compile and train the model------------------------------
# is accuracy sensible metric for this model?
w_extraction.compile(optimizer='adadelta', loss={'decoder_output':'mse','reconstructed_W':'mse'}, metrics=['mae'])
w_extraction.fit([x_train,w_expand], [x_train,w_expand],
          epochs=100,
          batch_size=128, 
          validation_data=([x_validation,wv_expand], [x_validation,wv_expand]),
          callbacks=[TensorBoard(log_dir='E:/tmp/AutewithW200', histogram_freq=0, write_graph=False)])
model.summary()

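【Question comments】:

Tags: python tensorflow keras keras-layer

【Solution 1】:

If you need this inside the model, you can use K.round() from keras.backend. Note that rounding is not differentiable, so it will not work well during training.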
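To make the "not differentiable" point concrete, here is a minimal NumPy sketch (not Keras code) that estimates gradients numerically. The helper names `sigmoid` and `central_difference` and the sample value `x = 1.0` are assumptions chosen for illustration. It shows that the sigmoid has a usable slope, while the rounded sigmoid is flat almost everywhere, so backpropagation would receive no signal through it.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, as used by the 'sigmoid' activation."""
    return 1.0 / (1.0 + np.exp(-x))

def central_difference(f, x, h=1e-4):
    """Numerical estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.0  # arbitrary pre-activation value (illustrative assumption)

# Slope of the plain sigmoid: clearly nonzero.
grad_sigmoid = central_difference(sigmoid, x)

# Slope after rounding: the output is a step function, so it is flat here.
grad_rounded = central_difference(lambda v: np.round(sigmoid(v)), x)

print(grad_sigmoid)
print(grad_rounded)
```

Because the rounded output does not change for small changes of the input, the layers before it would get zero gradient, which is why rounding is usually kept out of the training graph.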

If you only need the results, you can simply define a threshold (usually 0.5) and apply it:

binary_results = results > threshold
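As a slightly fuller sketch of that thresholding step, the snippet below binarizes a small array of sigmoid outputs with NumPy. The function name `binarize` and the 2x2 sample patch are hypothetical; in the question's setup the input would be the second array returned by the model's predict call.

```python
import numpy as np

def binarize(probabilities, threshold=0.5):
    """Map sigmoid outputs in [0, 1] to hard 0/1 labels."""
    return (np.asarray(probabilities) > threshold).astype(np.uint8)

# Hypothetical 2x2 patch of sigmoid outputs, standing in for pred_w.
pred_w_patch = np.array([[0.91, 0.12],
                         [0.50, 0.73]])

binary_patch = binarize(pred_w_patch)
print(binary_patch)
```

A value exactly equal to the threshold is mapped to 0 here because the comparison is strict; use `>=` if the opposite convention is preferred.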

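Adding metrics to your model

You can monitor the rounded results by adding metrics that round the data. Standard metrics for this are "accuracy" or "categorical_accuracy". You can also define your own metric, for example: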

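from keras import backend as K

def diceMetric(yTrue, yPred):
    # flatten each sample to a vector of shape (batch, pixels)
    yTrue = K.batch_flatten(yTrue)
    yPred = K.batch_flatten(yPred)

    # round predictions to hard 0/1 labels
    yPred = K.greater(yPred, 0.5)
    yPred = K.cast(yPred, K.floatx())

    # sum over pixels so that the result is one Dice score per sample
    intersection = K.sum(yPred * yTrue, axis=-1)
    total = K.sum(yTrue, axis=-1) + K.sum(yPred, axis=-1)

    return (2*intersection + K.epsilon())/(total + K.epsilon())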

Metrics are passed to compile:

model.compile(optimizer=..., loss=..., metrics = [diceMetric, 'categorical_accuracy'])

Metrics do not affect training; they are only feedback that lets you see what is happening.
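To check what a thresholded Dice score reports before wiring it into training, the same idea can be sketched in plain NumPy. This is an illustration under assumptions, not the Keras metric itself: the name `dice_per_sample`, the `eps` value, and the toy arrays are made up for the example, and the score is computed per sample by summing over each image's pixels.

```python
import numpy as np

def dice_per_sample(y_true, y_pred, threshold=0.5, eps=1e-7):
    """Thresholded Dice coefficient, one value per sample in the batch."""
    # Flatten every sample to a vector of pixels.
    y_true = y_true.reshape(len(y_true), -1).astype(np.float64)
    # Turn float predictions into hard 0/1 labels.
    y_pred = (y_pred.reshape(len(y_pred), -1) > threshold).astype(np.float64)

    intersection = (y_true * y_pred).sum(axis=1)
    total = y_true.sum(axis=1) + y_pred.sum(axis=1)
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)

# Toy batch of two 2x2 "images" (illustrative values only).
y_true = np.array([[[1, 0], [1, 0]],
                   [[0, 0], [0, 0]]])
y_pred = np.array([[[0.9, 0.2], [0.3, 0.1]],
                   [[0.1, 0.2], [0.4, 0.3]]])

scores = dice_per_sample(y_true, y_pred)
print(scores)
```

In the first sample one of the two true pixels is recovered, giving roughly 2/3; in the second both masks are empty, so the epsilon terms make the score 1.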

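【Comments】:

Thanks. But one of my outputs is a binary image, and I would like the network to label each of its pixels as 0 or 1. Should I only apply a threshold after training? I would like to see the effect during training.
Thanks, but where should I use these metrics? Are they used during training, and do they affect it? Should I apply them to the output of the last layer?
Thanks. So now you have two metrics, diceMetric and categorical_accuracy, right? If I want more metrics, should I define a function for each and put them in the list as you did?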

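【Solution 2】:

Why do you need the network to output exactly 0 or 1? You can interpret the network's output as a probability: how likely it is that a pixel belongs to class 0 or class 1. During training, the model then tries to approximate the unknown probability distribution.

At prediction time you can use a fixed threshold such as 0.5, or something like Otsu's threshold, and you will get a binary output. Unfortunately, thresholding can create gaps or shrink the area of some predicted shapes.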
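For readers unfamiliar with Otsu's threshold, here is a minimal NumPy sketch of the idea: pick the cut that maximizes the between-class variance of the value histogram. The function name `otsu_threshold`, the bin count, and the sample values are assumptions for illustration; it assumes the predictions lie in [0, 1], as sigmoid outputs do. Image libraries ship tested implementations that are preferable in practice.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Threshold in [0, 1] that maximizes the between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    total = hist.sum()

    # Pixel counts at or below each bin, and above it.
    count_low = np.cumsum(hist)
    count_high = total - count_low
    valid = (count_low > 0) & (count_high > 0)  # both classes non-empty

    w_low = count_low / total
    w_high = count_high / total
    cum_mean = np.cumsum(hist * centers) / total
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate cut.
    between = np.zeros(bins)
    between[valid] = ((total_mean * w_low[valid] - cum_mean[valid]) ** 2
                      / (w_low[valid] * w_high[valid]))

    # Cut at the upper edge of the best bin: values above it become class 1.
    return edges[1:][np.argmax(between)]

# Illustrative bimodal predictions: three dark and three bright pixels.
predictions = np.array([0.05, 0.10, 0.15, 0.80, 0.85, 0.90])
threshold = otsu_threshold(predictions)
binary = (predictions > threshold).astype(np.uint8)
print(threshold, binary)
```

With clearly separated values like these, any cut inside the gap is equally good; the sketch returns the first such cut, which separates the two groups.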

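Note: you usually want downsampling and upsampling in an autoencoder, because otherwise the model may learn that the identity function is optimal.

【Comments】:

For the output layer, is it better to use a sigmoid activation with kernel size (1,1) to get the output I described, or can I use any activation and kernel size?
It depends on what you want to achieve. You could try a softmax activation, which lets you train with more than two classes. In the end, you just need an output whose shape matches your training labels.
I want to produce a binary image at the output, and I want to minimize the difference between it and the input image.
This autoencoder has two inputs: one is a grayscale image and the other is a binary image. The last layer of my architecture compares the sigmoid output against this binary image, so I thought it should contain 0 or 1 values rather than values between 0 and 1. So I don't know what function to use to compute a hard value. Should I apply a threshold during training? I'm really confused. Can you help me with this?
What do you mean by inputs? Do you feed two images through your network, or do you feed one through the network and use the other as the label? Please read up on binary cross-entropy. And no, you should not threshold during training; thresholding is only for prediction after training is done.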
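As a pointer for the binary cross-entropy suggestion, the sketch below computes it by hand in NumPy to show that the loss compares float sigmoid outputs directly against 0/1 labels, with no thresholding involved. The function name, the clipping constant `eps`, and the sample values are assumptions for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and probabilities."""
    # Clip so that log(0) never occurs.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_pixel = -(y_true * np.log(y_pred)
                  + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_pixel))

labels = np.array([1.0, 0.0, 1.0, 0.0])         # binary target pixels
confident = np.array([0.95, 0.05, 0.90, 0.10])  # outputs close to the labels
unsure = np.array([0.60, 0.40, 0.55, 0.45])     # outputs near 0.5

loss_confident = binary_crossentropy(labels, confident)
loss_unsure = binary_crossentropy(labels, unsure)
print(loss_confident, loss_unsure)
```

The loss shrinks as the float outputs move toward the correct 0 or 1, which is what pushes the sigmoid output toward binary values during training. In the question's compile call this would presumably mean trying 'binary_crossentropy' instead of 'mse' for the 'reconstructed_W' output.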