【Posted】: 2022-01-02 07:58:25
【Question】:
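I am currently working on a highly imbalanced multi-label dataset of fashion items. I tried using class_weights to address the imbalance, but the accuracy stays stuck at 0.7556 in every epoch. Is there any way to avoid this? Did I implement the class weights incorrectly? I have also tried data augmentation.
I have 224 unique classes in the training set. Frustratingly, some of them have only a single example.
I also tried to solve the problem with the help of this notebook, but I could not reach the same accuracy score. It looks like the notebook does not account for the possibility that the dataset is imbalanced.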
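import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from sklearn.utils.class_weight import compute_class_weight

def calculating_class_weights(classes, df):
    # One row per label: [weight for value 0, weight for value 1]
    number_dim = np.shape(classes)[0]
    weights = np.empty([number_dim, 2])
    for i in range(len(classes)):
        weights[i] = compute_class_weight(class_weight='balanced', classes=np.array([0., 1.]), y=df[classes[i]])
    return weights

def get_weighted_loss(weights):
    def weighted_loss(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        # Scales each label's cross-entropy by weights[:, 1] where y_true is 1 and by weights[:, 0] where it is 0
        return K.mean((weights[:,0]**(1-y_true))*(weights[:,1]**(y_true))* K.binary_crossentropy(y_true, y_pred), axis=-1)
    return weighted_loss

weights = calculating_class_weights(train_labels, train_df)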
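To make the effect of these weights concrete, here is a minimal pure-Python sketch of the 'balanced' heuristic (n_samples / (n_classes * count)) and of the per-label weighted cross-entropy that the loss above applies. The helper names `balanced_weights` and `weighted_bce` and the toy label column are assumptions made for illustration; they are not part of the question's code.

```python
import math

def balanced_weights(column):
    """Return (weight_for_0, weight_for_1) for one binary label column.

    Uses n_samples / (n_classes * count). Assumes both values occur
    at least once in the column.
    """
    n = len(column)
    positives = sum(column)
    negatives = n - positives
    return n / (2 * negatives), n / (2 * positives)

def weighted_bce(y_true, y_pred, w0, w1, eps=1e-7):
    """Weighted binary cross-entropy for one label of one sample."""
    p = min(max(y_pred, eps), 1 - eps)  # clip to avoid log(0)
    bce = -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))
    # Same selection as w0**(1 - y) * w1**y for y in {0, 1}
    return (w1 if y_true == 1 else w0) * bce

# Hypothetical label column: 1 positive among 1000 samples.
column = [1] + [0] * 999
w0, w1 = balanced_weights(column)

# An equally confident mistake costs far more on the rare positive.
loss_missed_positive = weighted_bce(1, 0.1, w0, w1)
loss_false_alarm = weighted_bce(0, 0.9, w0, w1)
print(w0, w1, loss_missed_positive, loss_false_alarm)
```

With a single positive example, the positive weight is several hundred times the negative one, so a label with one example can dominate the loss whenever that example appears in a batch. Capping or smoothing such weights may be worth trying, though whether it helps here is untested.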
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_dataGen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest',
)

valid_dataGen = ImageDataGenerator(rescale=1./255)

model = keras.models.Sequential([
    keras.layers.Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='relu', input_shape=(256,256,3)),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Flatten(),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    # One sigmoid unit per label (224 labels, multi-label output)
    keras.layers.Dense(224, activation='sigmoid')
])

model.compile(loss=get_weighted_loss(weights), optimizer='adam', metrics=['accuracy'])

# train_generator, valid_generator, tensorboard_cb and lrr are not shown in the question
model.fit(train_generator,
          epochs=10,
          validation_data=valid_generator,
          callbacks=[tensorboard_cb, lrr])
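
One thing worth checking is what `metrics=['accuracy']` actually measures here. With a custom loss and a 224-unit output, Keras appears to resolve the string from the tensor shapes, most likely to categorical accuracy (an argmax comparison), which says little about multi-label predictions. Even element-wise accuracy can look good while the model predicts nothing useful, so a frozen value may reflect the metric rather than the training. The pure-Python sketch below illustrates this on made-up data; `binary_accuracy`, `micro_precision_recall`, and the toy arrays are hypothetical names for illustration only.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual label slots predicted correctly."""
    total = correct = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            correct += int((p >= threshold) == bool(t))
            total += 1
    return correct / total

def micro_precision_recall(y_true, y_pred, threshold=0.5):
    """Precision and recall pooled over all samples and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            pred = p >= threshold
            if pred and t:
                tp += 1
            elif pred and not t:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data: 4 samples, 5 labels, only 2 positive slots.
y_true = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# A model that outputs a low score for every label.
y_pred = [[0.1] * 5 for _ in y_true]

acc = binary_accuracy(y_true, y_pred)
precision, recall = micro_precision_recall(y_true, y_pred)
print(acc, precision, recall)
```

In Keras, built-in metrics such as `tf.keras.metrics.Precision`, `tf.keras.metrics.Recall`, and `tf.keras.metrics.AUC` can be passed in the `metrics` list in place of the `'accuracy'` string to track this kind of information during training.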
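【Comments】:
- This tutorial demonstrates how to classify a highly imbalanced dataset in which the number of examples in one class greatly outnumbers the examples in another.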
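
Assuming the linked tutorial is TensorFlow's "Classification on imbalanced data", one technique it covers is initializing the output bias so that the untrained network already predicts the base rate of the positive class, b = log(pos / neg). The sketch below applies that formula per label in pure Python; the label names and counts are invented for illustration, and `initial_output_bias` is a hypothetical helper.

```python
import math

def initial_output_bias(positives, total):
    """Bias b such that sigmoid(b) equals positives / total.

    Assumes 0 < positives < total.
    """
    negatives = total - positives
    return math.log(positives / negatives)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical positive counts for three labels out of 1000 images.
total = 1000
label_counts = {"dress": 300, "scarf": 20, "cufflinks": 1}

biases = {name: initial_output_bias(count, total)
          for name, count in label_counts.items()}

for name, b in biases.items():
    # sigmoid(b) recovers the label's base rate
    print(name, round(b, 3), round(sigmoid(b), 4))
```

The tutorial passes such a value to the output layer through `bias_initializer=tf.keras.initializers.Constant(...)` for a single binary output. For 224 labels there would be one value per output unit, and whether this helps on this particular dataset is an open question.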
Tags: tensorflow keras deep-learning multilabel-classification imbalanced-data