I think the documentation can be confusing here, and the behavior may differ depending on your TensorFlow and Keras versions. For example, in this post a user describes exactly the behavior you are expecting. Normally, the flow_from_directory() method lets you read images directly from a directory and augment them while the model is being trained and, as already explained here, it iterates over every sample in each folder once per epoch. You can check that this is the case (on TF 2.7) with the following example, by looking at the number of steps per epoch shown in the progress bar:
import tensorflow as tf

BATCH_SIZE = 64

flowers = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

img_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
)

train_ds = img_gen.flow_from_directory(flowers, batch_size=BATCH_SIZE, shuffle=True, class_mode='sparse')

num_classes = 5

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu', input_shape=(256, 256, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

epochs = 10
history = model.fit(
    train_ds,
    epochs=epochs
)
Found 3670 images belonging to 5 classes.
Epoch 1/10
6/58 [==>...........................] - ETA: 3:02 - loss: 2.0608
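The 58 steps come directly from the dataset size and the batch size: Keras' DirectoryIterator reports its length as the number of batches needed to cover every sample once. A quick sanity check of that arithmetic (plain Python, using the numbers printed above):

```python
import math

num_samples = 3670   # images found by flow_from_directory
batch_size = 64      # BATCH_SIZE above

# One epoch = one pass over every sample, so the number of steps is
# the ceiling of samples / batch_size (the last batch may be smaller).
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # -> 58
```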
If you wrap flow_from_directory with tf.data.Dataset.from_generator like this:
# keep a reference to the underlying DirectoryIterator so its length stays accessible
from_directory = img_gen.flow_from_directory(flowers, batch_size=BATCH_SIZE, shuffle=True, class_mode='sparse')
train_ds = tf.data.Dataset.from_generator(
    lambda: from_directory,
    output_types=(tf.float32, tf.float32))
you will notice that the progress bar looks like this, because steps_per_epoch is not explicitly defined:
Found 3670 images belonging to 5 classes.
Epoch 1/10
     29/Unknown - 104s 4s/step - loss: 2.0364
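The step count shows as Unknown because a dataset built with from_generator cannot know its cardinality up front: a plain Python generator carries no length information. A minimal, TensorFlow-free illustration of the underlying limitation:

```python
def batches():
    # a stand-in for the wrapped generator: it yields items indefinitely,
    # and there is no way to know the count without consuming it
    i = 0
    while True:
        yield i
        i += 1

gen = batches()
try:
    n = len(gen)  # generators do not implement __len__
except TypeError:
    n = None
print(n)  # the length is unknowable, hence "Unknown" in the progress bar
```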
And if you add this parameter, you will see the steps in the progress bar:
history = model.fit(
    train_ds,
    steps_per_epoch=len(from_directory),
    epochs=epochs
)
Found 3670 images belonging to 5 classes.
Epoch 1/10
3/58 [>.............................] - ETA: 3:19 - loss: 4.1357
Finally, regarding your question:

    How do I use this generator correctly, so that it covers all of the data in my training set, including the original, non-augmented images and the augmented images, and cycles through it several times/steps?
You can simply multiply steps_per_epoch by some factor, so that it exceeds number of samples // batch_size:
history = model.fit(
    train_ds,
    steps_per_epoch=len(from_directory)*2,
    epochs=epochs
)
Found 3670 images belonging to 5 classes.
Epoch 1/10
1/116 [..............................] - ETA: 12:11 - loss: 1.5885
Now you have 116 steps per epoch instead of 58.
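Note that running more steps than len(from_directory) does not simply replay identical batches: ImageDataGenerator draws its random transforms on the fly, so a second pass over a sample can yield a different variant. A toy sketch of the idea, with a hypothetical flip() standing in for the random horizontal_flip above:

```python
import random

def flip(image, rng):
    # hypothetical stand-in for ImageDataGenerator's random horizontal_flip:
    # reverse the "image" with probability 0.5
    return image[::-1] if rng.random() < 0.5 else image

rng = random.Random(1)   # fixed seed so the sketch is reproducible
image = [1, 2, 3]

# two passes over the same sample may produce different augmented variants
first_pass = flip(image, rng)
second_pass = flip(image, rng)
print(first_pass, second_pass)
```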