【Question Title】: Training an autoencoder across different machines
【Posted】: 2019-10-15 05:23:39
【Question Description】:

I am trying to train an autoencoder to support wireless data transmission. The encoder half will live on the transmitter side of a transceiver, and the decoder half on the receiver side. In general, the transmitter and receiver may be miles apart and run on different computers.

The autoencoder must be trained over the real physical channel, so backpropagation has to run across two different computers (the transmitter machine and the receiver machine). My question is: how do I start the backpropagation pass on the receiver side and finish it on the transmitter side?

To simplify the problem a bit: if you can help me run backpropagation across two separate files, that is probably enough for me to extend it as needed. Imagine the encoder is defined in one file and the decoder in another. How would I perform backpropagation across those two files?

I am open to using either PyTorch or TensorFlow, whichever fits the problem better. If possible, PyTorch would be my first choice.

Below is PyTorch code for a standard autoencoder that lives in a single file and works on CIFAR data. You can see how backpropagation is performed in the single line loss.backward(). That line will not work once the autoencoder is split across machines.

import torch
import torchvision as tv
import torchvision.transforms as transforms
import torch.nn as nn

# Loading and Transforming data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4914, 0.4822, 0.4466), (0.247, 0.243, 0.261))])
trainTransform  = tv.transforms.Compose([tv.transforms.ToTensor(), tv.transforms.Normalize((0.4914, 0.4822, 0.4466), (0.247, 0.243, 0.261))])
trainset = tv.datasets.CIFAR10(root='./data',  train=True,download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=False, num_workers=4)
testset = tv.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

# Writing our model
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder,self).__init__()

        self.encoder = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),
            nn.ReLU(True),
            nn.Conv2d(6,16,kernel_size=5),
            nn.ReLU(True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16,6,kernel_size=5),
            nn.ReLU(True),
            nn.ConvTranspose2d(6,3,kernel_size=5),
            nn.ReLU(True),
            nn.Sigmoid())
    def forward(self,x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

#defining some params
num_epochs = 5 #you can go for more epochs, I am using a mac
batch_size = 128

model = Autoencoder().cpu()
distance = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(),weight_decay=1e-5)

for epoch in range(num_epochs):
    for data in dataloader:
        img, _ = data
        img = img.cpu()  # Variable is deprecated; plain tensors work directly
        # ===================forward=====================
        output = model(img)
        loss = distance(output, img)
        # ===================backward====================
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # ===================log========================
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))

【Question Comments】:

  • Possibly a silly question, but it will help me understand: why does training need to use the real physical channel?
  • I am implementing an autoencoder that can learn the channel without assuming a channel model. That means there is no functional form of the channel to apply gradient descent to during training. Training has to go through the real channel so the algorithm can estimate gradients using the channel itself (a toy sketch of this idea follows below). The benefit is that the transceiver can adapt to whatever channel environment you put it in, without any prior channel modeling.
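
For context, here is a minimal toy sketch of what "estimating gradients using the channel" can look like, assuming a black-box, non-differentiable channel and a Gaussian-smoothing (zeroth-order) estimator. The channel and loss_fn names below are illustrative placeholders, not part of the question:

import torch

def channel(z):
    # stand-in for the real physical channel: non-differentiable from
    # the transmitter's point of view, modeled here as additive noise
    return z + 0.1 * torch.randn_like(z)

def estimate_grad(loss_fn, z, sigma=0.01, n_probes=64):
    # zeroth-order (Gaussian-smoothing) estimate of d(loss)/dz:
    # probe the channel with random perturbations and correlate the
    # observed losses with the probes; the baseline reduces variance
    base = loss_fn(channel(z))
    grad = torch.zeros_like(z)
    for _ in range(n_probes):
        eps = sigma * torch.randn_like(z)
        grad += (loss_fn(channel(z + eps)) - base) / sigma**2 * eps
    return grad / n_probes

An estimate like this can stand in for the true gradient at the transmitter when only scalar loss feedback is available from the receiver.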

Tags: tensorflow pytorch wireless autoencoder


【Solution 1】:

I have simplified your example so it is quicker to test. This is what I came up with:

import torch
import torch.nn as nn


class AE(nn.Module):
    def __init__(self):
        super(AE,self).__init__()
        self.encoder = nn.Sequential(nn.Linear(20, 10),
                        nn.ReLU(True),
                        nn.Linear(10, 5),
                        nn.ReLU(True))
        self.decoder = nn.Sequential(nn.Linear(5, 10),
                        nn.ReLU(True),
                        nn.Linear(10, 20),
                        nn.ReLU(True),
                        nn.Sigmoid())

torch.manual_seed(0)
batch_size = 2
input_size = 20
epochs = 3

model = AE()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
x = torch.randn(batch_size, input_size)

for i in range(epochs):
    optimizer.zero_grad()
    code = model.encoder(x)

    # this is where the code would go through your physical channel;
    # save/load stands in for the transmission
    torch.save(code, 'code.pt')
    # the loaded tensor keeps requires_grad=True but is a graph leaf,
    # so loss.backward() below will populate code_from_file.grad
    code_from_file = torch.load('code.pt')

    # compute the decoding with the code recovered on the other side
    reconstruction = model.decoder(code_from_file)
    loss = criterion(reconstruction, x)
    print("LOSS: ", loss)

    # this runs the backward pass only up to code_from_file, because
    # saving/loading is non-differentiable and the loaded tensor has no
    # knowledge of the computational graph that produced `code`
    loss.backward()

    # here you would move the gradient of code_from_file through the channel
    torch.save(code_from_file.grad, 'code_grad.pt')
    code_grad = torch.load('code_grad.pt')
    # and recover it on the other side

    # feed the gradient to `code.backward` that will run backward pass up to the input
    code.backward(code_grad)
    # now you have the gradients for the encoder part and you can step
    optimizer.step()
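
In a real deployment, the torch.save/torch.load pair above would be replaced by actual transport between the two machines. Below is a minimal sketch of one way to do that with torch.distributed point-to-point send/recv over the gloo backend. The address, port, tensor shapes, and the choice to ship the target x to the receiver are all assumptions for illustration, not part of the answer:

import torch
import torch.nn as nn
import torch.distributed as dist

BATCH, INPUT, CODE = 2, 20, 5

def run(rank):
    # rank 0 = transmitter (owns the encoder), rank 1 = receiver (owns
    # the decoder); the address/port are placeholders for rank 0's endpoint
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=2)
    if rank == 0:
        encoder = nn.Sequential(nn.Linear(INPUT, 10), nn.ReLU(True),
                                nn.Linear(10, CODE), nn.ReLU(True))
        opt = torch.optim.Adam(encoder.parameters())
        x = torch.randn(BATCH, INPUT)
        opt.zero_grad()
        code = encoder(x)
        dist.send(code.detach(), dst=1)   # forward: the code crosses the channel
        dist.send(x, dst=1)               # receiver needs the target for the loss
        grad = torch.empty(BATCH, CODE)
        dist.recv(grad, src=1)            # backward: d(loss)/d(code) comes back
        code.backward(grad)               # finish backprop through the encoder
        opt.step()
    else:
        decoder = nn.Sequential(nn.Linear(CODE, 10), nn.ReLU(True),
                                nn.Linear(10, INPUT), nn.Sigmoid())
        opt = torch.optim.Adam(decoder.parameters())
        code = torch.empty(BATCH, CODE)
        dist.recv(code, src=0)
        x = torch.empty(BATCH, INPUT)
        dist.recv(x, src=0)
        code.requires_grad_()             # leaf tensor, so .grad is populated
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(code), x)
        loss.backward()                   # stops at `code`, as in the file version
        dist.send(code.grad, dst=0)       # ship the gradient back upstream
        opt.step()

Each side would call run with its own rank; in practice the send/recv pairs sit inside the training loop, with the physical channel replacing the transport for the forward-direction code.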

【Comments】:
