How to initialise the weights of an MLP using an autoencoder (2nd part: deep autoencoder, 3rd part: stacked autoencoder)

【Question title】: How to initialise weights of a MLP using an autoencoder #2nd part - Deep autoencoder #3rd part - Stacked autoencoder
【Posted】: 2023-12-31 10:55:01
【Question description】:

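I have built an autoencoder (one encoder 8:5, one decoder 5:8) that takes the Pima-Indian-Diabetes dataset (https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv) and reduces its dimensionality (from 8 to 5). I would now like to use these reduced features to classify the data with an MLP. Here I have some trouble with my basic understanding of the architecture. How do I take the weights of the autoencoder and feed them into the MLP? I have checked these threads: https://github.com/keras-team/keras/issues/91 and https://www.codementor.io/nitinsurya/how-to-re-initialize-keras-model-weights-et41zre2g. The question is which weight matrix I should consider: the encoder part or the decoder part? And when I add the layers for the MLP, how do I initialise them with these saved weights? I cannot work out the exact syntax. Also, should my MLP start with 5 neurons, since my reduced dimension is 5? What would be plausible layer sizes for the MLP for this binary classification problem? Could anyone elaborate?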

The deep autoencoder code is as follows:

# from keras.models import Sequential
from keras.layers import Input, Dense
from keras.models import Model
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy

# Data pre-processing...

# load pima indians dataset
dataset = numpy.loadtxt("C:/Users/dibsa/Python Codes/pima.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]

# Split data into training and testing datasets
x_train, x_test, y_train, y_test = train_test_split(
                                X, Y, test_size=0.2, random_state=42)

# scale the data within [0-1] range
scalar = MinMaxScaler()
x_train = scalar.fit_transform(x_train)
x_test = scalar.transform(x_test)  # reuse the scaling fitted on the training data

# Autoencoder code begins here...
encoding_dim1 = 5    # size of encoded representations
encoding_dim2 = 3    # size of encoded representations in the bottleneck layer

# this is our input placeholder
input_data = Input(shape=(8,))
# "encoded" is the first encoded representation of the input
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
# "enc" is the second encoded representation of the input
enc = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
# "dec" is the lossy reconstruction of the input
dec = Dense(encoding_dim1, activation='sigmoid', name='decoder1')(enc)
# "decoded" is the final lossy reconstruction of the input
decoded = Dense(8, activation='sigmoid', name='decoder2')(dec)
# this model maps an input to its reconstruction
autoencoder = Model(inputs=input_data, outputs=decoded)

autoencoder.compile(optimizer='sgd', loss='mse')

# training
autoencoder.fit(x_train, x_train,
            epochs=300,
            batch_size=10,
            shuffle=True,
            validation_data=(x_test, x_test))  # need more tuning

# test the autoencoder by encoding and decoding the test dataset
reconstructions = autoencoder.predict(x_test)
print('Original test data')
print(x_test)
print('Reconstructed test data')
print(reconstructions)

#The stacked autoencoder code is as follows:

# from keras.models import Sequential
from keras.layers import Input, Dense
from keras.models import Model
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy

# Data pre-processing...

# load pima indians dataset
dataset = numpy.loadtxt("C:/Users/dibsa/Python Codes/pima.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]

# Split data into training and testing datasets
x_train, x_test, y_train, y_test = train_test_split(
                                X, Y, test_size=0.2, random_state=42)

# scale the data within [0-1] range
scalar = MinMaxScaler()
x_train = scalar.fit_transform(x_train)
x_test = scalar.transform(x_test)  # reuse the scaling fitted on the training data

# Autoencoder code goes here...
encoding_dim1 = 5    # size of encoded representations
encoding_dim2 = 3    # size of encoded representations in the bottleneck layer

# this is our input placeholder
input_data1 = Input(shape=(8,))
# the first encoded representation of the input
encoded1 = Dense(encoding_dim1, activation='relu',
             name='encoder1')(input_data1)
# the first lossy reconstruction of the input
decoded1 = Dense(8, activation='sigmoid', name='decoder1')(encoded1)
# this model maps an input to its first layer of reconstructions
autoencoder1 = Model(inputs=input_data1, outputs=decoded1)
# this is the first encoder model
enc1 = Model(inputs=input_data1, outputs=encoded1)

autoencoder1.compile(optimizer='sgd', loss='mse')

# training
autoencoder1.fit(x_train, x_train, epochs=300,
             batch_size=10, shuffle=True,
             validation_data=(x_test, x_test))
FirstAEoutput = autoencoder1.predict(x_train)

input_data2 = Input(shape=(encoding_dim1,))
# the second encoded representations of the input
encoded2 = Dense(encoding_dim2, activation='relu',
             name='encoder2')(input_data2)
# the final lossy reconstruction of the input
decoded2 = Dense(encoding_dim1, activation='sigmoid',
             name='decoder2')(encoded2)

# this model maps an input to its second layer of reconstructions
autoencoder2 = Model(inputs=input_data2, outputs=decoded2)

# this is the second encoder
enc2 = Model(inputs=input_data2, outputs=encoded2)

autoencoder2.compile(optimizer='sgd', loss='mse')

# training
autoencoder2.fit(FirstAEoutput, FirstAEoutput, epochs=300,
             batch_size=10, shuffle=True)

# this is the overall autoencoder mapping an input to its final reconstructions
autoencoder = Model(inputs=input_data1, outputs=encoded2)
# test the autoencoder by encoding and decoding the test dataset

reconstructions = autoencoder.predict(x_test)
print('Original test data')
print(x_test)
print('Reconstructed test data')
print(reconstructions)

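【Discussion】:

Tags: python tensorflow keras autoencoder

【Solution 1】:

That's a lot of questions. What have you tried so far? Any code snippets?

If your decoder is trying to reconstruct the input, attaching your classifier to its output doesn't really make sense to me. I mean, why not attach it to the input in the first place? So, if you're set on using an autoencoder, I'd say it's pretty clear you should attach the classifier to the output of the encoder pipeline.

I'm not sure what you mean by "use the weights of the autoencoder and feed them into the MLP". You don't feed a layer with another layer's weights, but with its output signal. This is pretty easy to do in Keras. Say you defined your autoencoder and trained it like this: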

from keras import Input, Model
from keras.layers import Dense

x = Input(shape=(8,))
y = Dense(5, activation='sigmoid', name='encoder')(x)
y = Dense(8, name='decoder')(y)

ae = Model(inputs=x, outputs=y)
# "..." stands for the remaining arguments (optimizer, epochs, and so on)
ae.compile(loss='mse', ...)
ae.fit(x_train, x_train, ...)

# save the trained autoencoder to disk
ae.save('./autoencoder.h5')

Then you can attach a classification layer to the encoder and create a classifier model with the following code:

# load the model from the disk if you
# are in a different execution.
from keras.models import load_model
ae = load_model('./autoencoder.h5')

y = ae.get_layer('encoder').output
y = Dense(1, activation='sigmoid', name='predictions')(y)

classifier = Model(inputs=ae.inputs, outputs=y)
classifier.compile(loss='binary_crossentropy', ...)
classifier.fit(x_train, y_train, ...)

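That's it, really. The classifier model now has the first embedding layer of the ae model, encoder, as its first layer, followed by the sigmoid decision layer predictions.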
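One consequence of building the classifier this way is that the encoder layer is the same object in both models, so training the classifier also updates the autoencoder's encoder unless the layer is frozen. The sketch below illustrates this under stated assumptions: the data is random stand-in data rather than the Pima dataset, the training settings are arbitrary, and freezing the encoder is an optional choice that the answer above does not prescribe.

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Assumption: random stand-in for the scaled 8-feature data and binary labels.
rng = np.random.default_rng(0)
x_train = rng.random((32, 8)).astype("float32")
y_train = rng.integers(0, 2, size=(32, 1)).astype("float32")

# Same 8 -> 5 -> 8 autoencoder shape as in the answer.
x = Input(shape=(8,))
h = Dense(5, activation="sigmoid", name="encoder")(x)
out = Dense(8, name="decoder")(h)
ae = Model(inputs=x, outputs=out)
ae.compile(loss="mse", optimizer="sgd")
ae.fit(x_train, x_train, epochs=1, batch_size=8, verbose=0)

# Classifier stacked on the encoder output; the encoder layer is shared.
y = Dense(1, activation="sigmoid", name="predictions")(
    ae.get_layer("encoder").output)
classifier = Model(inputs=ae.inputs, outputs=y)
shared = classifier.get_layer("encoder") is ae.get_layer("encoder")

# Optional: freeze the encoder so only the new 'predictions' layer is trained.
ae.get_layer("encoder").trainable = False
before = [w.copy() for w in ae.get_layer("encoder").get_weights()]

classifier.compile(loss="binary_crossentropy", optimizer="adam")
classifier.fit(x_train, y_train, epochs=1, batch_size=8, verbose=0)

after = ae.get_layer("encoder").get_weights()
unchanged = all(np.array_equal(b, a) for b, a in zip(before, after))
```

Leaving `trainable` at its default instead fine-tunes the encoder together with the decision layer; which of the two works better is something to test on the actual data.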

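If what you really want is to use the weights learned by the autoencoder to initialise the weights of the classifier (I'm not sure I'd recommend this approach):

You can take the weight matrices with layer#get_weights, prune them (since the encoder has 5 units and the classifier only 1), and finally set the classifier weights. Something along these lines: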

w, b = ae.get_layer('encoder').get_weights()

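# remove all units except one.
neuron_to_keep = 2
w = w[:, neuron_to_keep:neuron_to_keep + 1]
b = b[neuron_to_keep:neuron_to_keep + 1]

# set_weights takes a single list of arrays. Note that w now has shape (8, 1),
# so it only fits a 1-unit Dense layer that reads the 8 raw inputs; the
# 'predictions' layer above sits on the 5-unit encoder and expects (5, 1).
classifier.get_layer('predictions').set_weights([w, b])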
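A Dense kernel has shape (input_dim, units), so weights can only be copied between layers whose input and output sizes agree. A shape-compatible way to initialise an MLP from the autoencoder is to give the MLP a first hidden layer with the same 8 -> 5 shape as the encoder and copy the weights over in full. The following is a minimal sketch, assuming random stand-in data instead of the Pima dataset; the layer name `mlp_hidden` and the training settings are made up for illustration.

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Assumption: random stand-in for the scaled 8-feature training data.
rng = np.random.default_rng(0)
x_train = rng.random((32, 8)).astype("float32")

# Train a small 8 -> 5 -> 8 autoencoder.
inp = Input(shape=(8,))
enc = Dense(5, activation="relu", name="encoder")(inp)
dec = Dense(8, activation="sigmoid", name="decoder")(enc)
ae = Model(inputs=inp, outputs=dec)
ae.compile(loss="mse", optimizer="sgd")
ae.fit(x_train, x_train, epochs=1, batch_size=8, verbose=0)

# A separate MLP whose first hidden layer has the encoder's shape (8 -> 5).
mlp_in = Input(shape=(8,))
hidden = Dense(5, activation="relu", name="mlp_hidden")(mlp_in)
pred = Dense(1, activation="sigmoid", name="predictions")(hidden)
mlp = Model(inputs=mlp_in, outputs=pred)

# Copy kernel (8, 5) and bias (5,) from the encoder into the hidden layer.
w, b = ae.get_layer("encoder").get_weights()
mlp.get_layer("mlp_hidden").set_weights([w, b])

w_mlp, b_mlp = mlp.get_layer("mlp_hidden").get_weights()
```

After the copy, `mlp` can be compiled and trained on the labels as usual; the `predictions` layer keeps its default random initialisation.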

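【Discussion】:

Thanks for the quick reply, @Idavid. "You don't feed a layer with another layer's weights, but with its output signal." Yes, you are right; I want to discard the reconstruction layer and use the bottleneck layer as the input to the MLP. I will post a code snippet after incorporating the changes you mentioned. But do I really need to initialise the weights of the MLP's first layer with the encoder weights? In that case the number of layers does not match. Also, you wrote "y = Dense(1, activation='sigmoid', name='predictions')(y)"; shouldn't I add a hidden layer in the MLP?
No, I don't think you should initialise the weights with the encoder weights, but your original question contained the sentence "how do I initialise the weights with these saved weights", which led me to believe you wanted to. Hence my second code snippet, which shows how to get around the mismatch in the number of units.
"Shouldn't I add a hidden layer in the MLP?" That depends on your specific problem. Test it and see how it performs.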

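【Solution 2】:

Idavid, this is for your reference: MLP using Autoencoder reduced features. I need to know which figure is correct. Sorry, I had to upload the image as an answer because there is no option to upload images in a comment. I think you are saying that figure B is correct. Here is the code snippet for it. Please let me know if I am on the right track.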

# This is a mlp classification code with features reduced by an Autoencoder

# from keras.models import Sequential
from keras.layers import Input, Dense
from keras.models import Model
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy

# Data pre-processing...

# load pima indians dataset
dataset = numpy.loadtxt("C:/Users/dibsa/Python Codes/pima.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]

# Split data into training and testing datasets
x_train, x_test, y_train, y_test = train_test_split(
                                X, Y, test_size=0.2, random_state=42)

# scale the data within [0-1] range
scalar = MinMaxScaler()
x_train = scalar.fit_transform(x_train)
x_test = scalar.transform(x_test)  # reuse the scaling fitted on the training data

# Autoencoder code goes here...
encoding_dim = 5    # size of our encoded representations

# this is our input placeholder
input_data = Input(shape=(8,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu', name='encoder')(input_data)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(8, activation='sigmoid', name='decoder')(encoded)
# this model maps an input to its reconstruction
autoencoder = Model(inputs=input_data, outputs=decoded)

autoencoder.compile(optimizer='sgd', loss='mse')

# training
autoencoder.fit(x_train, x_train,
            epochs=300,
            batch_size=10,
            shuffle=True,
            validation_data=(x_test, x_test))  # need more tuning

# test the autoencoder by encoding and decoding the test dataset
reconstructions = autoencoder.predict(x_test)
print('Original test data')
print(x_test)
print('Reconstructed test data')
print(reconstructions)

# MLP code goes here...
# create model

x = autoencoder.get_layer('encoder').output
# h = Dense(3, activation='relu', name='hidden')(x)
y = Dense(1, activation='sigmoid', name='predictions')(x)
classifier = Model(inputs=autoencoder.inputs, outputs=y)

# Compile model
classifier.compile(loss='binary_crossentropy', optimizer='adam',
               metrics=['accuracy'])

# Fit the model
classifier.fit(x_train, y_train, epochs=250, batch_size=10)

print('Now making predictions')
predictions = classifier.predict(x_test)
# round predictions
rounded_predicted_classes = [round(x[0]) for x in predictions]
temp = sum(y_test == rounded_predicted_classes)
acc = temp/len(y_test)
print(acc)

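【Discussion】:

It would have been best to update your question with this information rather than post it as an answer. But yes, I believe the second image is the way to go, because the first one wastes computation: why attach the classifier to a decoder that is trying to reconstruct the input, rather than to the input itself? The code looks fine to me, but does it do what you wanted?
Thanks again for the quick reply, @Idavid. Yes, it does what I wanted; I'm fully convinced. I will mark your answer as accepted.
Inspired by the previous one, I tried to implement a deep autoencoder and a stacked autoencoder. The deep autoencoder works fine, but the stacked autoencoder has some problem that I can't figure out. Could you please check it and let me know where I'm going wrong? I have updated my question accordingly.
Here is the error: ValueError: Error when checking model input: expected input_54 to have shape (None, 5) but got array with shape (614, 8)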
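The shapes in the message match the stacked autoencoder code in the question: `FirstAEoutput = autoencoder1.predict(x_train)` returns the 8-column reconstruction of the 614 training rows, while `autoencoder2` expects the 5-column codes. Feeding it `enc1.predict(x_train)` instead should resolve this error. The final `Model(inputs=input_data1, outputs=encoded2)` would likely fail next, because `encoded2` is connected to `input_data2` and not to `input_data1`; chaining the two encoder models avoids that. A minimal sketch of both changes, assuming random stand-in data and a single training epoch purely for illustration:

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Assumption: random stand-in for the scaled train/test features.
rng = np.random.default_rng(0)
x_train = rng.random((40, 8)).astype("float32")
x_test = rng.random((10, 8)).astype("float32")

# First autoencoder: 8 -> 5 -> 8.
input_data1 = Input(shape=(8,))
encoded1 = Dense(5, activation="relu", name="encoder1")(input_data1)
decoded1 = Dense(8, activation="sigmoid", name="decoder1")(encoded1)
autoencoder1 = Model(inputs=input_data1, outputs=decoded1)
enc1 = Model(inputs=input_data1, outputs=encoded1)
autoencoder1.compile(optimizer="sgd", loss="mse")
autoencoder1.fit(x_train, x_train, epochs=1, batch_size=10, verbose=0)

# Use the 5-dimensional codes, not the 8-dimensional reconstruction.
first_codes = enc1.predict(x_train, verbose=0)

# Second autoencoder: 5 -> 3 -> 5, trained on the codes.
input_data2 = Input(shape=(5,))
encoded2 = Dense(3, activation="relu", name="encoder2")(input_data2)
decoded2 = Dense(5, activation="sigmoid", name="decoder2")(encoded2)
autoencoder2 = Model(inputs=input_data2, outputs=decoded2)
enc2 = Model(inputs=input_data2, outputs=encoded2)
autoencoder2.compile(optimizer="sgd", loss="mse")
autoencoder2.fit(first_codes, first_codes, epochs=1, batch_size=10, verbose=0)

# Chain both encoders so the graph runs from the 8 inputs to the 3-d code.
stacked_encoder = Model(inputs=input_data1, outputs=enc2(enc1(input_data1)))
codes = stacked_encoder.predict(x_test, verbose=0)
```

Note that `stacked_encoder` outputs the 3-dimensional bottleneck codes rather than reconstructions, so printing them under a "Reconstructed test data" label, as the question's code does, would be misleading.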