Autoencoder NOT for images: answers

【Question title】: Autoencoder NOT for images
【Posted】: 2021-02-28 18:57:52
【Question】:

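I have a dataset that I'm trying to autoencode with PyTorch (I was told a convolutional autoencoder is the way to go). Each "point" in this dataset is a 1024-bit vector, and I'm trying to encode each one into a vector of perhaps 10 values.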

I've been looking at examples, but everything I can find involves encoding images, so I'm struggling to work out how to apply this to my dataset.

For example, take this code that builds an autoencoder for the MNIST dataset:

import torch
import torch.nn as nn
import torch.nn.functional as F

# define the NN architecture
class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        ## encoder layers ##
        # conv layer (depth from 1 --> 16), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)  
        # conv layer (depth from 16 --> 4), 3x3 kernels
        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)
        
        ## decoder layers ##
        ## a kernel of 2 and a stride of 2 will increase the spatial dims by 2
        self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(16, 1, 2, stride=2)


    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = F.relu(self.conv2(x))
        x = self.pool(x)  # compressed representation
        
        ## decode ##
        # add transpose conv layers, with relu activation function
        x = F.relu(self.t_conv1(x))
        # output layer (with sigmoid for scaling from 0 to 1)
        x = torch.sigmoid(self.t_conv2(x))  # F.sigmoid is deprecated in favour of torch.sigmoid
                
        return x

# initialize the NN
model = ConvAutoencoder()
print(model)

# specify loss function
criterion = nn.MSELoss()

# specify optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# number of epochs to train the model
n_epochs = 30

for epoch in range(1, n_epochs+1):
    # monitor training loss
    train_loss = 0.0
    
    ###################
    # train the model #
    ###################
    # train_loader: a DataLoader over MNIST, defined elsewhere (not shown in the question)
    for data in train_loader:
        # _ stands in for labels, here
        # no need to flatten images
        images, _ = data
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        outputs = model(images)
        # calculate the loss
        loss = criterion(outputs, images)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()*images.size(0)
            
    # print avg training statistics
    # (each batch loss was weighted by its batch size, so divide by the number of samples)
    train_loss = train_loss/len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(
        epoch, 
        train_loss
        ))

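What would I need to change in something like this to make it fit my data? Is there another kind of autoencoder that would be better suited?

Any help or guidance is much appreciated!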
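One way to see what would change is a direct 2D-to-1D port of the MNIST model: `Conv2d`, `MaxPool2d` and `ConvTranspose2d` become their 1D counterparts, and each sample is fed as a single-channel sequence of shape `(batch, 1, 1024)`. The sketch below is a minimal illustration under stated assumptions, not a recommended architecture: the class name `ConvAutoencoder1d`, the linear bottleneck, `latent_dim=10`, and the random stand-in data are all hypothetical choices made here. Convolution only pays off if neighbouring bit positions are actually related, which is the point the answer below makes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvAutoencoder1d(nn.Module):
    """Hypothetical 1D analogue of the MNIST ConvAutoencoder, for vectors instead of images."""

    def __init__(self, length=1024, latent_dim=10):
        super().__init__()
        self.reduced = length // 4  # two MaxPool1d(2) layers halve the length twice
        # encoder: same layout as the 2D model, with 1D layers
        self.conv1 = nn.Conv1d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(16, 4, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(2, 2)
        # assumed linear bottleneck down to latent_dim values (not in the original model)
        self.fc_enc = nn.Linear(4 * self.reduced, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 4 * self.reduced)
        # decoder: kernel 2, stride 2 doubles the length at each step
        self.t_conv1 = nn.ConvTranspose1d(4, 16, kernel_size=2, stride=2)
        self.t_conv2 = nn.ConvTranspose1d(16, 1, kernel_size=2, stride=2)

    def encode(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # (B, 16, L/2)
        x = self.pool(F.relu(self.conv2(x)))  # (B, 4, L/4)
        return self.fc_enc(x.flatten(1))      # (B, latent_dim)

    def decode(self, z):
        x = F.relu(self.fc_dec(z)).view(-1, 4, self.reduced)  # (B, 4, L/4)
        x = F.relu(self.t_conv1(x))                            # (B, 16, L/2)
        return torch.sigmoid(self.t_conv2(x))                  # (B, 1, L), values in (0, 1)

    def forward(self, x):
        return self.decode(self.encode(x))


bits = torch.randint(0, 2, (8, 1024)).float()  # hypothetical batch of eight 1024-bit vectors
x = bits.unsqueeze(1)  # Conv1d expects (batch, channels, length): (8, 1, 1024)
model = ConvAutoencoder1d()
z = model.encode(x)    # (8, 10) compressed codes
recon = model(x)       # (8, 1, 1024) reconstructions
print(z.shape, recon.shape)
```

The linear bottleneck is there because, without it, two rounds of pooling leave 4 × 256 = 1024 values, the same size as the input, so nothing would actually be compressed. The training loop from the MNIST example can stay as it is, apart from feeding `x` with the extra channel dimension.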

【Question discussion】:

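Depending on whether its activation functions are linear or nonlinear, an NN autoencoder can perform linear as well as nonlinear transformations. At a minimum, you want to make sure the encoder and decoder activation functions work on the same scale, e.g. 0 to 1.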
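For 0/1 bit vectors, the scaling point can be made concrete: a final sigmoid puts reconstructions in the same (0, 1) range as the targets, and a binary cross-entropy loss then compares like with like. The sketch below uses random stand-in tensors (hypothetical, not real data) and only standard PyTorch calls; it shows that `nn.BCELoss` on sigmoid outputs and `nn.BCEWithLogitsLoss` on the raw logits give the same value.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randint(0, 2, (4, 1024)).float()  # targets: bits in {0, 1}
logits = torch.randn(4, 1024)               # hypothetical raw decoder outputs

# sigmoid squashes the outputs into (0, 1), the same range as the targets
recon = torch.sigmoid(logits)
print(recon.min().item(), recon.max().item())

# Two equivalent ways to score 0/1 reconstructions:
loss_a = nn.BCELoss()(recon, x)              # expects probabilities in [0, 1]
loss_b = nn.BCEWithLogitsLoss()(logits, x)   # applies the sigmoid internally, more numerically stable
print(loss_a.item(), loss_b.item())
```

If the decoder's last layer were a plain linear or ReLU layer instead, its outputs would not be confined to (0, 1), and `nn.BCELoss` would reject them; that mismatch is what keeping the scales aligned avoids.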

Tags: python python-3.x pytorch autoencoder

【Solution 1】:

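If there is some kind of dependency in your data (spatial, e.g. images; temporal, e.g. audio; spatio-temporal, e.g. video), it makes sense to use convolutions. If there is no such dependency (e.g. a house-price regression task), you can just use fully connected layers. You are of course not limited to these; the encoder-decoder architecture is very general and flexible. Also, if you eventually want a generative model, I'd suggest using a variational autoencoder.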
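For the no-dependency case, a fully connected autoencoder for 1024-bit vectors with a 10-value code might look like the sketch below. The layer widths (256, 64), the class name `DenseAutoencoder`, the use of `nn.BCEWithLogitsLoss` for 0/1 targets, and the random stand-in dataset are all assumptions for illustration; `nn.MSELoss` would also work, as in the MNIST example.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class DenseAutoencoder(nn.Module):
    """Hypothetical fully connected autoencoder: 1024 -> 10 -> 1024."""

    def __init__(self, in_dim=1024, latent_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, in_dim),  # raw logits; the sigmoid is applied inside the loss
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


torch.manual_seed(0)
# hypothetical stand-in data: 512 random 1024-bit vectors stored as float 0/1
data = torch.randint(0, 2, (512, 1024)).float()
loader = DataLoader(TensorDataset(data), batch_size=64, shuffle=True)

model = DenseAutoencoder()
criterion = nn.BCEWithLogitsLoss()  # assumed choice for 0/1 targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(1, 6):
    total = 0.0
    for (x,) in loader:  # TensorDataset yields 1-tuples, so unpack the single tensor
        optimizer.zero_grad()
        loss = criterion(model(x), x)  # reconstruct the input itself
        loss.backward()
        optimizer.step()
        total += loss.item() * x.size(0)
    print(f"Epoch {epoch}: loss {total / len(loader.dataset):.4f}")

with torch.no_grad():
    codes = model.encoder(data)  # (512, 10): the compressed representation
print(codes.shape)
```

Because the input is already a flat vector, no reshaping or channel dimension is needed here, which is the main practical difference from the convolutional version. Turning this into the variational autoencoder mentioned above would mean having the encoder output a mean and log-variance and adding a KL term to the loss.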

【Discussion】:

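Thanks, that makes sense. I'm using chemical representations as my data, so I'll need to look into them more specifically to find out whether such a dependency exists.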