Delete model from GPU/CPU in Pytorch
Posted: 2021-08-22 13:20:48
Question:

I have a serious memory problem. I am building a large application with a GUI for testing and optimizing neural networks. The main program shows the GUI, and the training runs in a thread. In my application I need to train many models with different parameters, one after another, so I create a new model for each trial. When one model finishes training, I want to delete it and train the next one, but I cannot free the old model. I am trying something like this:

del model
torch.cuda.empty_cache()

but the GPU memory does not change.

Then I tried this:

model.cpu()
del model

When I move the model to the CPU, the GPU memory is freed, but the CPU memory grows. With every training trial the memory usage keeps increasing; everything is released only when I close the application and start it again.
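For reference, a minimal sketch (not from the original post) that distinguishes memory held by live tensors from memory merely cached by the CUDA allocator. `torch.cuda.empty_cache()` can only return *cached* blocks to the driver; anything still referenced by a live model stays allocated no matter how often it is called:

```python
import torch
import torch.nn as nn

# Toy model standing in for the real one; on a CPU-only build the
# CUDA counters simply report 0, so this also runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1000, 1000).to(device)

before = torch.cuda.memory_allocated()  # bytes held by live tensors
del model                               # drop the last reference
torch.cuda.empty_cache()                # now cached blocks can be returned
after = torch.cuda.memory_allocated()

print(before, after)  # `after` drops only once no references remain
```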

Is there a way to permanently delete a model from the GPU or the CPU?

Edit: code:

The thread that runs the training process:

class uczeniegridsearcch(QObject):
     endofoneloop = pyqtSignal()
     endofonesample = pyqtSignal()
     finished = pyqtSignal()
     def __init__(self, train_loader, test_loader, epoch, optimizer, lenoftd, lossfun, numberofsamples, optimparams, listoflabels, model_name, num_of_class, pret):
          super(uczeniegridsearcch, self).__init__()
          self.train_loaderup = train_loader
          self.test_loaderup = test_loader
          self.epochup = epoch
          self.optimizername = optimizer
          self.lenofdt = lenoftd
          self.lossfun = lossfun
          self.numberofsamples = numberofsamples
          self.acc = 0
          self.train_loss = 0
          self.sendloss = 0
          self.optimparams = optimparams
          self.listoflabels = listoflabels
          self.sel_Net = model_name
          self.num_of_class = num_of_class
          self.sel_Pret = pret
          self.modelforsend = []
          

     def setuptrainmodel(self):

          if self.sel_Net == "AlexNet":
               model = models.alexnet(pretrained=self.sel_Pret)
               model.classifier[6] = torch.nn.Linear(4096, self.num_of_class)
          elif self.sel_Net == "ResNet50":
               model = models.resnet50(pretrained=self.sel_Pret)
               model.fc = torch.nn.Linear(model.fc.in_features, self.num_of_class)
          elif self.sel_Net == "VGG13":
               model = models.vgg13(pretrained=self.sel_Pret)
               model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, self.num_of_class)
          elif self.sel_Net == "DenseNet201":
               model = models.densenet201(pretrained=self.sel_Pret)
               model.classifier = torch.nn.Linear(model.classifier.in_features, self.num_of_class)

          elif self.sel_Net == "MNASnet":
               model = models.mnasnet1_0(pretrained=self.sel_Pret)
               model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, self.num_of_class)

          elif self.sel_Net == "ShuffleNet v2":
               model = models.shufflenet_v2_x1_0(pretrained=self.sel_Pret)
               model.fc = torch.nn.Linear(model.fc.in_features, self.num_of_class)

          elif self.sel_Net == "SqueezeNet":
               model = models.squeezenet1_0(pretrained=self.sel_Pret)
               model.classifier[1] = torch.nn.Conv2d(512, self.num_of_class, kernel_size=(1, 1), stride=(1, 1))
               model.num_classes = self.num_of_class

          elif self.sel_Net == "GoogleNet":
               model = models.googlenet(pretrained=self.sel_Pret)
               model.fc = torch.nn.Linear(model.fc.in_features, self.num_of_class)

          return model
     def train(self):

          for x in range(self.numberofsamples):

               torch.cuda.empty_cache()

               modelup = self.setuptrainmodel()

               device = torch.device('cuda')

               optimizerup = TableWidget.setupotimfun(self, modelup, self.optimizername, self.optimparams[(x, 0)],
                                                      self.optimparams[(x, 1)], self.optimparams[(x, 2)],
                                                      self.optimparams[(x, 3)],
                                                      self.optimparams[(x, 4)], self.optimparams[(x, 5)])

               modelup = modelup.to(device)

               best_accuracy = 0.0
               train_error_count = 0
               
               for epoch in range(self.epochup):

                    for images, labels in iter(self.train_loaderup):
                         images = images.to(device)
                         labels = labels.to(device)
                         optimizerup.zero_grad()
                         outputs = modelup(images)
                         loss = TableWidget.setuplossfun(self, lossfun=self.lossfun, outputs=outputs, labels=labels)
                         self.train_loss += loss
                         loss.backward()
                         optimizerup.step()
                         train_error_count += float(torch.sum(torch.abs(labels - outputs.argmax(1))))
                    self.train_loss /= len(self.train_loaderup)

                    test_error_count = 0.0

                    for images, labels in iter(self.test_loaderup):
                         images = images.to(device)
                         labels = labels.to(device)
                         outputs = modelup(images)
                         test_error_count += float(torch.sum(torch.abs(labels - outputs.argmax(1))))

                    test_accuracy = 1.0 - float(test_error_count) / float(self.lenofdt)

                    print('%s, %d,%d: %f %f' % ("Próba nr:", x+1, epoch, test_accuracy, self.train_loss), "Parametry: ", self.optimparams[x,:])

                    self.acc = test_accuracy
                    self.sendloss = self.train_loss.item()
                    self.endofoneloop.emit()
               self.endofonesample.emit()

               modelup.cpu()

               del modelup, optimizerup, device, test_accuracy, test_error_count, train_error_count, loss, labels, images, outputs
               torch.cuda.empty_cache()

          self.finished.emit()

How I start the thread from the main block:

              self.qtest = uczeniegridsearcch(self.train_loader,self.test_loader, int(self.InputEpoch.text()),
                                              self.sel_Optim,len(self.test_dataset), self.sel_Loss,
                                              int(self.numberofsamples.text()), self.params, self.listoflabels,
                                              self.sel_Net,len(self.sel_ImgClasses),self.sel_Pret)

              self.qtest.endofoneloop.connect(self.inkofprogress)
              self.qtest.endofonesample.connect(self.inksamples)
              self.qtest.finished.connect(self.prints)
              testtret = threading.Thread(target=self.qtest.train)
              testtret.start()

Tags: python memory pytorch gpu cpu


    Solution 1:

    Assuming the model-creation code runs iteratively inside a loop, I suggest the following:

    1. Put the code for model creation, training, evaluation, and model deletion into a separate function, and call that function from the loop body.
    2. Call gc.collect() after the function call.

    The rationale for the first point is that model creation, deletion, and cache clearing then happen in a separate stack frame, so all local references die when the function returns and the GPU memory can actually be reclaimed.
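    A minimal, self-contained sketch of this advice (a toy `nn.Linear` stands in for the torchvision models from the question; all names here are illustrative):

```python
import gc
import torch
import torch.nn as nn

def train_one_trial(lr, device="cpu"):
    """Create, train, and discard a model inside one function call.

    When this function returns, its locals (model, optimizer, activations)
    go out of scope, so gc.collect() + torch.cuda.empty_cache() can
    actually reclaim their memory.
    """
    model = nn.Linear(10, 2).to(device)  # toy stand-in for e.g. resnet50
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    for _ in range(5):
        inputs = torch.randn(32, 10, device=device)
        labels = torch.randint(0, 2, (32,), device=device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()

    # Return plain Python numbers, not tensors: a kept tensor keeps its
    # computation graph (and memory) alive across trials.
    return loss.item()

losses = []
for lr in (0.1, 0.01):
    losses.append(train_one_trial(lr))
    gc.collect()                      # collect the now-unreachable model
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # return cached CUDA blocks
```

    Note that the training loop in the question accumulates `self.train_loss += loss` with a live tensor, which also keeps graphs alive between iterations; returning or accumulating `.item()` values avoids that.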
