Read images from sub-folders using keras flow_from_dataframe: answers

【Question title】: Read images from sub-folders using keras flow_from_dataframe
【Posted】: 2022-01-22 15:05:11
【Question】:

How can I use the flow_from_dataframe function in Keras, rather than flow_from_directory, to read images that are arranged in sub-folders? The dataset is laid out as a directory structure with sub-folders, and a CSV file holds the "class" labels and the image IDs. This is the code I used, followed by its output.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
import pandas as pd

def append_ext(fn):
    return fn + ".png"

traindf = pd.read_csv("trainLabels.csv", dtype=str)
print(traindf)

traindf["id"] = traindf["id"].apply(append_ext)
print(traindf)

datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)

train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="./testdf/",
    x_col="id",
    y_col="label",
    subset="training",
    batch_size=32,
    seed=42,
    shuffle=True,
    classes=["animal_1", "animal_2"],
    class_mode="categorical",
    target_size=(32, 32))

valid_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="./testdf/",
    x_col="id",
    y_col="label",
    subset="validation",
    batch_size=32,
    seed=42,
    shuffle=True,
    classes=["animal_1", "animal_2"],
    class_mode="categorical",
    target_size=(32, 32))

Found 0 validated image filenames belonging to 2 classes.
Found 0 validated image filenames belonging to 2 classes.

Thanks!

【Comments】:

Tags: python pandas tensorflow machine-learning keras

【Solution 1】:

If I understand correctly, your directory structure is:

traindf
------ animal_1
       --------frogs_cars_etc
               ----------------- 1.png
               ----------------- 2.png
               ----------------- etc
               ----------------- 10.png
------ animal_2
       --------frogs_cars_etc
               ----------------- 1.png
               ----------------- 2.png
               ----------------- etc
               ----------------- 10.png

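As far as I can see, the dataset has only 2 classes and 20 image files, so the csv file you reference does not seem to correspond to the actual data. You can build your own dataframe with the code below, but with so few samples I doubt it will train at all.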

import os
import pandas as pd

data_dir = './traindf'  # main directory (forward slashes work on Windows and POSIX)
filepaths = []  # full path to each image file
labels = []     # class label for each image file
classlist = os.listdir(data_dir)  # should yield ['animal_1', 'animal_2']; these are the classes
for klass in classlist:
    classpath = os.path.join(data_dir, klass, 'frogs_cars_etc')  # folder holding this class's images
    file_list = os.listdir(classpath)  # list of files
    for f in file_list:  # iterate through the list of files
        fpath = os.path.join(classpath, f)  # full path to the file
        filepaths.append(fpath)  # save the filepath
        labels.append(klass)     # save the label
Fseries = pd.Series(filepaths, name='filepaths')
Lseries = pd.Series(labels, name='labels')
df = pd.concat([Fseries, Lseries], axis=1)  # dataframe with columns: filepaths, labels
print(df.head())
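
The listing above hard-codes the intermediate folder name frogs_cars_etc. As a sketch of a more general variant, assuming every image sits somewhere beneath a top-level folder named after its class, os.walk can collect the files at any depth. The function name collect_image_paths and the extension set IMAGE_EXTS are hypothetical, chosen only for illustration.

```python
import os

# Assumed set of image extensions to keep; adjust to match your data.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}

def collect_image_paths(data_dir):
    """Return sorted (filepath, label) pairs; label is the top-level sub-folder name."""
    pairs = []
    for klass in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, klass)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class folders
        # Walk the class folder recursively so nested sub-folders are included.
        for root, _dirs, files in os.walk(class_dir):
            for name in files:
                if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                    pairs.append((os.path.join(root, name), klass))
    return sorted(pairs)
```

The pairs can then be turned into the same two-column dataframe with pd.DataFrame(pairs, columns=['filepaths', 'labels']).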

You can use this dataframe with flow_from_dataframe, but again, with only 20 images it will not be very useful.
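
A minimal sketch of that step, assuming the dataframe df built above with its filepaths and labels columns. Because the filepaths column already holds complete paths, directory is left as None so that nothing is prepended to them. The Keras documentation describes this case as expecting absolute paths, so os.path.abspath is applied first. The names FLOW_KWARGS and make_generators are hypothetical, and the remaining values are copied from the question.

```python
import os

# Shared keyword arguments; values other than directory/x_col/y_col
# are copied from the question.
FLOW_KWARGS = dict(
    directory=None,        # paths in x_col are used as-is, nothing is prepended
    x_col="filepaths",
    y_col="labels",
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode="categorical",
    target_size=(32, 32),
)

def make_generators(df, validation_split=0.25):
    """Build training/validation generators from a filepaths/labels dataframe."""
    # Imported here so the module can be loaded without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    df = df.copy()
    df["filepaths"] = df["filepaths"].map(os.path.abspath)  # make paths absolute
    datagen = ImageDataGenerator(rescale=1.0 / 255.0,
                                 validation_split=validation_split)
    train_gen = datagen.flow_from_dataframe(
        dataframe=df, subset="training", **FLOW_KWARGS)
    valid_gen = datagen.flow_from_dataframe(
        dataframe=df, subset="validation", **FLOW_KWARGS)
    return train_gen, valid_gen
```

Calling train_gen, valid_gen = make_generators(df) should then print how many validated image filenames were found for each subset.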

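【Discussion】:

Thanks, I tried it, but it still retrieves 0 images: 'Found 0 validated image filenames belonging to 2 classes. Found 0 validated image filenames belonging to 2 classes.' I'm not sure where the error is. Thanks.
There was an error in my code, sorry. See the updated answer.
The code works fine for building the df of file paths and labels, but flow_from_dataframe fails to validate the images in the sub-folders. It always outputs "Found 0 validated image filenames belonging to 2 classes." Thanks, @Gerry P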
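
One way to narrow down a "Found 0 validated image filenames" result is to check the paths by hand. To my understanding, flow_from_dataframe joins directory (when given) with each x_col value and drops rows whose result is not an existing file with a supported image extension. In the question, "./testdf/" joined with "1.png" skips the class sub-folders, so nothing would match. The sketch below reproduces that check with the standard library only. The function name unresolved_paths and the extension list are assumptions for illustration, not part of Keras.

```python
import os

# Assumed list of extensions accepted by the generator;
# verify against your Keras version.
ALLOWED_EXTS = (".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff")

def unresolved_paths(filenames, directory=None):
    """Return the entries that do not resolve to an existing image file."""
    bad = []
    for name in filenames:
        # Mirror the join that happens when a directory is supplied.
        path = os.path.join(directory, name) if directory else name
        ok = os.path.isfile(path) and path.lower().endswith(ALLOWED_EXTS)
        if not ok:
            bad.append(name)
    return bad
```

For example, unresolved_paths(traindf["id"], "./testdf/") lists every id that would be dropped. If the result is as long as the dataframe itself, no path matched, and the x_col values or the directory argument need to include the sub-folder part of the path.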