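[Posted]: 2018-07-24 17:34:54
[Question]:
I am using the fit_generator function to train my model and want to verify that my data is constructed and used as expected. My class, derived from keras.utils.Sequence(), implements the methods __getitem__, __len__ and on_epoch_end as follows: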
import math

import numpy as np
from keras.utils import Sequence

# LOADING_METHOD is a placeholder for the function that loads one simulation.


class PairwiseSequence(Sequence):
    """Generator that returns a combination of simulations (over a
    parametrizable amount of timesteps) and the corresponding metric distance.

    pair_list: List of pairwise combinations of simulations
    results: dictionary with results for the metric distance between
        simulation pairs
    sim_files: List of filenames representing single timesteps
    batch_size: number of samples to process in a single inference run
    """

    def __init__(self, pair_list, results, mean, std, train=False, sim_files=None,
                 batch_size=1):
        self.pair_list = pair_list
        self.results = results
        self.batch_size = batch_size
        self.sim_files = sim_files
        self.mean = mean
        self.std = std
        self.train = train

    def __len__(self):
        # Number of batches per epoch; the last batch may be partial.
        return math.ceil(len(self.pair_list) / self.batch_size)

    def __getitem__(self, idx):
        # Load one sample only to learn the array shape of a simulation.
        dummy = LOADING_METHOD(self.pair_list[0][0], self.sim_files)
        x_1 = np.zeros(tuple([self.batch_size]) + dummy.shape)
        x_2 = np.zeros(tuple([self.batch_size]) + dummy.shape)
        y = np.zeros((self.batch_size, 1))
        if self.train:
            #print((idx * self.batch_size + i) % len(self.pair_list), ',')
            print("training idx:", idx)
        else:
            print("validation idx:", idx)
        for i in range(0, self.batch_size):
            # The modulo wraps around to the first pairs to fill the last batch.
            (sim1, sim2) = self.pair_list[(idx * self.batch_size + i) %
                                          len(self.pair_list)]
            x_1[i] = LOADING_METHOD(sim1, self.sim_files) - self.mean[0]
            x_1[i] /= self.std[0]
            x_2[i] = LOADING_METHOD(sim2, self.sim_files) - self.mean[1]
            x_2[i] /= self.std[1]
            y[i] = self.results[frozenset((sim1.ensemble, sim2.ensemble))]
        return [x_1, x_2], y

    def on_epoch_end(self):
        if self.train:
            print("training generator: epoch end")
        else:
            print("validation generator: epoch end")
        #random.shuffle(self.pair_list)
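This class is used as the generator for both the training and the validation data (two separate instances).
As you can see, I print the idx argument of __getitem__ and a notification at the end of each epoch to the console. I call fit_generator as follows (with multiprocessing disabled):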
history_callback = model.fit_generator(
    generator=train_gen,
    steps_per_epoch=len(train_gen),
    epochs=epochs,
    verbose=0,
    callbacks=callbacks,
    validation_data=valid_gen,
    validation_steps=len(valid_gen),
    workers=1,
    use_multiprocessing=False,
    shuffle=False
)
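I am also no longer shuffling the data. With this configuration I would expect idx to run from 0 to len(generator) - 1 before on_epoch_end is called. I have 372 samples for training and 93 for validation, so with a batch_size of 12, idx should run from 0 to 30 (training data) and from 0 to 7 (validation data) respectively. But __getitem__ is called more often than I expect, and on_epoch_end is called in between as well! Here is what the console output looks like: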
batch_size: 12
len(train_gen): 31
len(valid_gen): 8
2018-02-14 08:45:09.041929: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
training idx: 0
training idx: 1
training idx: 2
training idx: 3
training idx: 4
training idx: 5
training idx: 6
training idx: 7
training idx: 8
training idx: 9
training idx: 10
training idx: 11
training idx: 12
training idx: 13
training idx: 14
training idx: 15
training idx: 16
training idx: 17
training idx: 18
training idx: 19
training idx: 20
training generator: epoch end
training idx: 21
training idx: 22
training idx: 23
training idx: 24
training idx: 25
training idx: 26
training idx: 27
training idx: 28
training idx: 29
training idx: 30
training idx: 0
validation generator: epoch end
validation idx: 0
training idx: 1
validation idx: 1
training idx: 2
validation idx: 2
training idx: 3
validation idx: 3
training idx: 4
validation idx: 4
training idx: 5
validation idx: 5
validation generator: epoch end
training idx: 6
validation idx: 6
training idx: 7
validation idx: 7
training idx: 8
validation idx: 0
training idx: 9
validation idx: 1
training idx: 10
validation idx: 2
validation idx: 3
validation idx: 4
validation idx: 5
validation idx: 6
validation idx: 7
validation idx: 0
validation idx: 1
validation idx: 2
Epoch 00000: val_loss improved from inf to 10512.69922, saving model to /home/stefan/vcs/MA/code/results/test/TB_dummy_distance_10513.hdf5
training idx: 11
training idx: 12
training idx: 13
training idx: 14
training idx: 15
training idx: 16
training idx: 17
training idx: 18
training idx: 19
training idx: 20
training generator: epoch end
training idx: 21
training idx: 22
training idx: 23
training idx: 24
training idx: 25
training idx: 26
training idx: 27
training idx: 28
training idx: 29
training idx: 30
training idx: 0
validation generator: epoch end
validation idx: 0
training idx: 1
validation idx: 1
training idx: 2
validation idx: 2
training idx: 3
validation idx: 3
training idx: 4
validation idx: 4
training idx: 5
validation idx: 5
validation generator: epoch end
training idx: 6
validation idx: 6
training idx: 7
validation idx: 7
validation idx: 0
training idx: 8
validation idx: 1
training idx: 9
validation idx: 2
training idx: 10
validation idx: 3
validation idx: 4
validation idx: 5
validation idx: 6
validation idx: 7
validation idx: 0
validation idx: 1
validation idx: 2
Epoch 00001: val_loss improved from 10512.69922 to 5905.95929, saving model to /home/stefan/vcs/MA/code/results/test/TB_dummy_distance_5906.hdf5
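For reference, the batch counts printed at the top of the log can be checked with a short sketch. The helper name batch_indices is made up for illustration; it only reproduces the modulo indexing used in __getitem__ and loads no data. It also shows that the last validation batch wraps around and reuses the first three pairs, because 93 is not a multiple of 12.

```python
import math


def batch_indices(idx, batch_size, n_samples):
    """Positions in pair_list read by __getitem__(idx), with the same modulo wrap."""
    return [(idx * batch_size + i) % n_samples for i in range(batch_size)]


batch_size = 12
n_train, n_valid = 372, 93  # sample counts stated in the question

train_batches = math.ceil(n_train / batch_size)  # 31, so idx runs 0..30
valid_batches = math.ceil(n_valid / batch_size)  # 8, so idx runs 0..7

# Last validation batch: 9 remaining pairs, then a wrap to positions 0, 1, 2.
last_valid = batch_indices(valid_batches - 1, batch_size, n_valid)

print(train_batches, valid_batches)
print(last_valid)
```

These are positions inside pair_list, not the idx values shown in the log.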
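How does fit_generator use the __getitem__ and on_epoch_end methods? Does it also call these methods before the first epoch starts, to fetch some sample data for weight initialization? Is this behaviour caused by some kind of caching?
Any help is highly appreciated! Thanks in advance!
Update:
For testing purposes I changed the max_queue_size argument of fit_generator to 1. This is the resulting terminal output: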
batch_size: 12
len(train_gen): 31
len(valid_gen): 8
2018-02-14 10:10:40.001065: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
training idx: 0
training idx: 1
training idx: 2
training idx: 3
training idx: 4
training idx: 5
training idx: 6
training idx: 7
training idx: 8
training idx: 9
training idx: 10
training idx: 11
training idx: 12
training idx: 13
training idx: 14
training idx: 15
training idx: 16
training idx: 17
training idx: 18
training idx: 19
training idx: 20
training idx: 21
training idx: 22
training idx: 23
training idx: 24
training idx: 25
training idx: 26
training idx: 27
training idx: 28
training idx: 29
training idx: 30
training generator: epoch end
training idx: 0
training idx: 1
validation idx: 0
validation idx: 1
validation idx: 2
validation idx: 3
validation idx: 4
validation idx: 5
validation idx: 6
validation generator: epoch end
validation idx: 7
validation idx: 0
validation idx: 1
Epoch 00000: val_loss improved from inf to 18090.34473, saving model to /home/stefan/vcs/MA/code/results/test/TB_dummy_distance_18090.hdf5
training idx: 2
training idx: 3
training idx: 4
training idx: 5
training idx: 6
training idx: 7
training idx: 8
training idx: 9
training idx: 10
training idx: 11
training idx: 12
training idx: 13
training idx: 14
training idx: 15
training idx: 16
training idx: 17
training idx: 18
training idx: 19
training idx: 20
training idx: 21
training idx: 22
training idx: 23
training idx: 24
training idx: 25
training idx: 26
training idx: 27
training idx: 28
training idx: 29
training idx: 30
training generator: epoch end
training idx: 0
training idx: 1
validation idx: 0
validation idx: 1
validation idx: 2
validation idx: 3
validation idx: 4
validation idx: 5
validation idx: 6
validation generator: epoch end
validation idx: 7
validation idx: 0
validation idx: 1
Epoch 00001: val_loss did not improve
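Now at least all training samples are queried within the first epoch. But on_epoch_end is still called too early for the validation data and for the training data of the second epoch.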
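One plausible reading of the extra __getitem__ calls is prefetching: max_queue_size bounds a queue that a background worker fills ahead of the training loop, so batches are requested before they are consumed. The following is a plain-Python sketch of that idea, not Keras code; LoggingBatches and prefetch are hypothetical names, and the queue size of 10 is chosen to match the documented default of max_queue_size.

```python
import queue
import threading
import time


class LoggingBatches:
    """Hypothetical stand-in for a Sequence; records every index requested."""

    def __init__(self, n_batches):
        self.n_batches = n_batches
        self.requested = []

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        self.requested.append(idx)
        return idx  # a real Sequence would return a batch of arrays


def prefetch(batches, out_queue):
    """Worker: load batches in order; put() blocks while the queue is full."""
    for idx in range(len(batches)):
        out_queue.put(batches[idx])


batches = LoggingBatches(n_batches=31)
buffer = queue.Queue(maxsize=10)
worker = threading.Thread(target=prefetch, args=(batches, buffer), daemon=True)
worker.start()

# Wait until the buffer is full without consuming a single batch.
while not buffer.full():
    time.sleep(0.01)
requested_before_first_step = len(batches.requested)
print("consumed: 0, requested:", requested_before_first_step, flush=True)

# Drain the buffer the way a training loop would.
consumed = [buffer.get() for _ in range(len(batches))]
worker.join()
```

With a queue of size 1 the worker can only run about one batch ahead, which would be consistent with the shorter run-ahead seen in the update.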
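[Discussion]:
-
The question about the extra batches is answered here.
-
Why ask here when you can just look at the code???
-
I can't reproduce it. on_epoch_end always appears at the right place on my machine. What is your platform? Which version of Keras are you using?
-
Does the problem persist if you add flush=True to the print function?
-
@Yu-Yang: I am using keras from tensorflow.python.keras with Tensorflow 1.4.0. I just tried it on another machine with the same Tensorflow version, and on that machine on_epoch_end also appears at the right place. Using flush=True also fixed the output on my first machine! Thank you very much!!!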
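The flush=True suggestion from the comments can be sketched as a small logging helper. The name log and its stream parameter are assumptions made for this illustration; the demo writes to an in-memory stream so that the order of the lines can be inspected, whereas the generator in the question would write to the console.

```python
import io
import sys


def log(*args, stream=None):
    """print() that flushes immediately, so lines appear in call order."""
    target = stream if stream is not None else sys.stdout
    print(*args, file=target, flush=True)


# Demo on an in-memory stream instead of the console.
demo = io.StringIO()
for idx in range(3):
    log("training idx:", idx, stream=demo)
log("training generator: epoch end", stream=demo)

lines = demo.getvalue().splitlines()
```

Inside the generator, the print calls in __getitem__ and on_epoch_end would simply gain flush=True.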
Tags: python neural-network keras