Loss is not decreasing for convolutional autoencoder (answers)

【Question title】: Loss is not decreasing for convolutional autoencoder
【Posted】: 2018-03-17 04:40:01
【Question】:

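I'm trying to train a convolutional autoencoder to encode and decode piano-roll representations of monophonic MIDI clips. I reduced the note range to 3 octaves, split the songs into 100-timestep segments (where 1 timestep = 1/100 of a second), and train the network on batches of 3 segments.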
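The clipping described above can be sketched in NumPy. This is a minimal sketch, not the asker's actual preprocessing: it assumes each song is a 128 x T piano roll sampled at 100 steps per second (for example, the kind of array pretty_midi's `get_piano_roll(fs=100)` returns), and `make_clips` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical helper (not from the original post): crops the pitch axis to
# MIDI notes 48..83 (3 octaves) and cuts a (128, T) piano roll into
# non-overlapping 100-step clips, discarding leftover steps at the end.
def make_clips(piano_roll, low=48, high=84, clip_len=100):
    roll = piano_roll[low:high, :]          # keep 36 pitch rows
    n_clips = roll.shape[1] // clip_len     # number of whole clips only
    roll = roll[:, :n_clips * clip_len]     # cut off the excess from the end
    # (36, n*100) -> (36, n, 100) -> (n, 36, 100)
    clips = roll.reshape(roll.shape[0], n_clips, clip_len).transpose(1, 0, 2)
    return clips[:, np.newaxis, :, :]       # add a channel axis: (n, 1, 36, 100)

# Stand-in "song": 1050 time steps (10.5 s), so the last 50 steps are dropped.
song = np.arange(128 * 1050, dtype=np.float32).reshape(128, 1050)
clips = make_clips(song)
print(clips.shape)  # (10, 1, 36, 100)
```

The extra channel axis matches the `3 x 1 x 36 x 100` input shape noted in the code's comments.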

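I'm using Adagrad as the optimizer and MSE as the loss function. The loss is huge, and even after feeding in hundreds of training examples I don't see the average loss decrease at all.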
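One way an MSE loss on a sigmoid output can stay huge is illustrated below with a synthetic clip. This sketch assumes the piano rolls store MIDI velocities (0 to 127) rather than 0/1 values; the random data only stands in for a real clip. Because a sigmoid can never exceed 1, every active note contributes at least (velocity - 1)^2 to the error.

```python
import numpy as np

# Assumption: entries are MIDI velocities, not 0/1 flags.
rng = np.random.default_rng(0)
target = np.zeros((36, 100))
notes = rng.integers(0, 36, size=100)             # one note per step (monophonic)
target[notes, np.arange(100)] = rng.integers(60, 101, size=100)  # velocities 60..100

# A sigmoid only outputs values in (0, 1). The lowest MSE it can approach is
# by predicting ~1.0 wherever a note is on and ~0.0 elsewhere.
best_sigmoid_output = (target > 0).astype(float)
floor = np.mean((best_sigmoid_output - target) ** 2)
print(floor)  # a large lower bound that no amount of training can beat
```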

Here is my code:

"""
Simplest possible assumptions:
  - not changing the key of any of the files
  - not changing the tempo of any of the files

- take blocks of 36 by 100
- divide up all songs by this amount, cutting off any excess from the 
end, train
"""
from __future__ import print_function
import cPickle as pickle
import numpy as np
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from reverse_pianoroll import piano_roll_to_pretty_midi as pr2pm

N = 1000
# load an N x 1 x M x C dataset
#   N: number of clips
#   1: a single input channel (what conv1 expects)
#   M: piano roll size, the number of MIDI notes that could possibly be 'on'
#   C: clip length, in 100ths of a second
dataset = pickle.load(open('mh-midi-data.pickle', 'rb'))
######## take a subset of the data for training ######
# based on the mean and standard deviation of non zero entries in the data, I've
# found that the most populous, and thus best range of notes to take is from
# 48 to 84 (C2 - C5); this is 3 octaves, which is much less than the original
# 10 and a half. Additionally, we're going to take a subsample of 1000 because
# I'm training on my MacBook and the network is pretty simple
######################################################
dataset = dataset[:, :, 48:84, :]
dataset = dataset[:N]
######################################################

midi_dim, clip_len = dataset.shape[2:]

class Autoencoder(nn.Module):
    def __init__(self, **kwargs):
        super(Autoencoder, self).__init__(**kwargs)
        # input is 3 x 1 x 36 x 100
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=14, kernel_size=(midi_dim, 2))
        # now transformed to 3 x 14 x 1 x 99
        self.conv2 = nn.Conv2d(in_channels=14, out_channels=77, kernel_size=(1, 4))
        # now transformed to 3 x 77 x 1 x 96
        input_size = 3*77*1*96
        self.fc1 = nn.Linear(input_size, input_size/2)
        self.fc2 = nn.Linear(input_size/2, input_size/4)
        self.fc3 = nn.Linear(input_size/4, input_size/2)
        self.fc4 = nn.Linear(input_size/2, input_size)
        self.tconv2 = nn.ConvTranspose2d(in_channels=77, out_channels=14, kernel_size=(1, 4))
        self.tconv1 = nn.ConvTranspose2d(in_channels=14, out_channels=1, kernel_size=(midi_dim, 2))
        self.sigmoid = nn.Sigmoid()
        return

    def forward(self, x):
        # print("1: {}".format(x.size()))
        x = F.relu(self.conv1(x))
        # print("2: {}".format(x.size()))
        x = F.relu(self.conv2(x))
        # print("3: {}".format(x.size()))
        x = x.view(-1, np.prod(x.size()[:]))
        # print("4: {}".format(x.size()))
        x = F.relu(self.fc1(x))
        # print("5: {}".format(x.size()))
        h = F.relu(self.fc2(x))
        # print("6: {}".format(h.size()))
        d = F.relu(self.fc3(h))
        # print("7: {}".format(d.size()))
        d = F.relu(self.fc4(d))
        # print("8: {}".format(d.size()))
        d = d.view(3, 77, 1, 96)
        # print("9: {}".format(d.size()))
        d = F.relu(self.tconv2(d))
        # print("10: {}".format(d.size()))
        d = self.tconv1(d)
        d = self.sigmoid(d)
        # print("11: {}".format(d.size()))
        return d


net = Autoencoder()
loss_fn = nn.MSELoss()
# optimizer = optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)
optimizer = optim.Adagrad(net.parameters(), lr=1e-3)

batch_count = 0
avg_loss = 0.0
print_every = 3
print("Beginning Training")
for epoch in xrange(2):
    # for i, clip in enumerate(dataset):
    for i in xrange(len(dataset)/3):
        batch = dataset[(3*i):(3*i + 3), :, :]
        # get the input, wrap it in a Variable
        inpt = Variable(torch.from_numpy(batch).type(torch.FloatTensor))

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outpt = net(inpt)
        loss = loss_fn(outpt, inpt)
        loss.backward()
        optimizer.step()

        # print stats out
        avg_loss += loss.data[0]
        if batch_count % print_every == print_every - 1:
            print('epoch: %d, batch_count: %d, loss: %.3f' % (
                epoch + 1, batch_count + 1, avg_loss / print_every))
            avg_loss = 0.0
        batch_count += 1

print('Finished Training')

I'm really a beginner at this, so any advice would be greatly appreciated.
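The shape comments in the code above can be checked with plain arithmetic. This pure-Python sketch (no PyTorch needed; `conv_out` and `tconv_out` are illustrative helper names) applies the stride-1, no-padding size rules for `Conv2d` and `ConvTranspose2d`, and counts the parameters of `fc1`, which is sized for the whole flattened batch of 3.

```python
# Stride 1, no padding, no dilation: Conv2d shrinks an axis by (kernel - 1)
# and ConvTranspose2d grows it back by the same amount.
def conv_out(size, kernel):
    return size - kernel + 1

def tconv_out(size, kernel):
    return size + kernel - 1

batch, midi_dim, clip_len = 3, 36, 100
h = conv_out(midi_dim, midi_dim)        # conv1 height: 36 -> 1
w = conv_out(clip_len, 2)               # conv1 width: 100 -> 99
print((batch, 14, h, w))                # (3, 14, 1, 99)
w = conv_out(w, 4)                      # conv2 width: 99 -> 96
print((batch, 77, h, w))                # (3, 77, 1, 96)

# x.view(-1, np.prod(x.size())) flattens all 3 clips into ONE row,
# which is why input_size includes the batch factor 3.
input_size = batch * 77 * h * w
print(input_size)                       # 22176

# Weights + biases of fc1 = Linear(input_size, input_size // 2)
fc1_params = input_size * (input_size // 2) + input_size // 2
print(fc1_params)                       # 245898576

# The decoder reverses the shrinkage.
w = tconv_out(w, 4)                     # tconv2 width: 96 -> 99
w = tconv_out(w, 2)                     # tconv1 width: 99 -> 100
h = tconv_out(h, midi_dim)              # tconv1 height: 1 -> 36
print((batch, 1, h, w))                 # (3, 1, 36, 100)
```

So the comments are consistent, and `fc1` alone holds roughly 246 million parameters.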

【Discussion】:

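Welp, I really don't know why this question was downvoted. Downvoting anonymously without any explanation is a really great way to ask for a rewrite and to help people in the community learn how to ask questions. Thanks.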
One obvious mistake is using sigmoid + MSE as your objective. You should drop the sigmoid and use "sigmoid_cross_entropy_with_logits", which is the correct autoencoding loss for outputs in [0, 1].
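The comment names TensorFlow's `tf.nn.sigmoid_cross_entropy_with_logits`; the PyTorch counterpart is `torch.nn.BCEWithLogitsLoss`, which in the question's code would mean removing `self.sigmoid` from `forward` and passing the raw `tconv1` output to the loss. The NumPy sketch below only illustrates the math: the numerically stable formula computed from logits agrees with the naive sigmoid-then-cross-entropy version. The targets must lie in [0, 1] for this loss to make sense, which is another reason to scale the inputs into that range.

```python
import numpy as np

def sigmoid_bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from logits x.

    Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)), which equals
    -[z * log(sigmoid(x)) + (1 - z) * log(1 - sigmoid(x))] for z in [0, 1].
    """
    x = np.asarray(logits, dtype=float)
    z = np.asarray(targets, dtype=float)
    return np.mean(np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x))))

def naive_bce(logits, targets):
    # Apply the sigmoid first, then cross-entropy; can hit log(0) for large |x|.
    p = 1.0 / (1.0 + np.exp(-logits))
    return np.mean(-(targets * np.log(p) + (1 - targets) * np.log(1 - p)))

logits = np.array([-3.0, -0.5, 0.0, 2.0, 4.0])
targets = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
print(sigmoid_bce_with_logits(logits, targets))
print(naive_bce(logits, targets))  # same value, up to floating-point rounding
```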

Tags: deep-learning convolution autoencoder pytorch

【Answer 1】:

Double-check that you are normalizing inpt to the range 0 to 1. For example, if you are working with images, you can divide the inpt variable by 255.
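For a piano roll, a rough analogue of dividing pixels by 255 is dividing by 127, the largest MIDI velocity, or simply binarizing note on/off. Both helpers below are hypothetical and assume the clips hold velocities; in the question's training loop they would be applied to `batch` before `torch.from_numpy`.

```python
import numpy as np

# Assumption: entries are MIDI velocities in 0..127.
def normalize_velocities(batch):
    return batch.astype(np.float32) / 127.0   # scale into [0, 1]

# Alternative: keep only note on/off information.
def binarize(batch):
    return (batch > 0).astype(np.float32)

batch = np.array([[[[0, 64, 127], [100, 0, 0]]]], dtype=np.int64)  # shape (1, 1, 2, 3)
print(normalize_velocities(batch).max())  # 1.0
print(binarize(batch))
```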

【Discussion】: