AutoML for Data Augmentation
2019-04-01 09:26:19
This blog is copied from: https://blog.insightdatascience.com/automl-for-data-augmentation-e87cf692c366
DeepAugment is an AutoML tool focusing on data augmentation. It utilizes Bayesian optimization for discovering data augmentation strategies tailored to your image dataset. The main benefits and features of DeepAugment are:
- Reduces the error rate of CNN models (showed a 60% decrease in error for CIFAR-10 on WRN-28-10)
- Saves time by automating the process
- More than 50 times faster than the previous solution, AutoAugment
DeepAugment is available on PyPI. You can install it from your terminal by running:
$ pip install deepaugment
For a quick start, see the tutorial. To learn more about how I built this, read on!
Introduction
In 2018, Google published AutoAugment, which discovers augmentation policies for a given dataset using reinforcement learning. Because that approach demands enormous computational resources, DeepAugment takes a different route: it uses Bayesian optimization instead of reinforcement learning.
Ways to get better data
Generating synthetic data, for example with GANs, is promising but complicated, and the generated samples might diverge from realistic examples.
Data augmentation, on the other hand, is simple and has high impact. It is applicable to most datasets and is done with simple image transformations. The problem, however, is determining which augmentation technique is best for the dataset at hand. Discovering the proper method requires time-consuming experimentation. Even after many experiments, a machine learning (ML) engineer may still not discover the best option.
The right augmentations also depend on the dataset. Rotation, for example, is not a good option for the MNIST digits dataset, because a 180-degree rotation on a "6" would make it look like a "9", while still being labeled as a 6. On the other hand, applying rotation to satellite images can improve results significantly, since a car seen from the air is still a car, no matter how much it is rotated.
DeepAugment: lightning fast autoML
AutoAugment needed 15,000 iterations to learn augmentation policies, which requires huge computational resources. Most people could not benefit from it even when its source code was fully available.
DeepAugment addresses these problems with the following design goals:
- Minimize the computational complexity of the optimization of data augmentation while maintaining the quality of results.
- Make the tool modular and user-friendly.
In order to achieve the first goal, DeepAugment was designed with the following differences, as compared to AutoAugment:
- Utilizes Bayesian optimization instead of reinforcement learning (requires far fewer iterations; ~100x speed-up)
- Minimizes size of child model (decreases computational complexity of each training) (~20x speed-up)
- Less stochastic augmentation search space design (decreases number of iterations needed)
To achieve the second goal, DeepAugment's interface is kept simple and customizable (see the configuration options).
Designing augmentation policies
DeepAugment uses the imgaug package, which is known for its large collection of augmentation techniques (see below).
Augmentations are most effective when they are diverse and randomly applied. For instance, instead of rotating every image, it is better to rotate some portion of images, shear another portion, and apply a color inversion for another. Based on this observation, DeepAugment applies one of five sub-policies (consisting of two augmentations) randomly to the images. During the optimization process, each image has an equal chance (16%) of being augmented by one of five sub-policies and a 20% chance of not being augmented at all.
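This sampling scheme can be sketched in a few lines. The sub-policy contents below are made-up placeholders, not learned policies; only the probabilities (16% per sub-policy, 20% no augmentation) come from the description above:

```python
import random

# Five sub-policies, each a pair of (augmentation type, magnitude).
# The technique names and magnitudes here are illustrative placeholders.
sub_policies = [
    [("rotate", 0.2), ("shear", 0.3)],
    [("invert", 0.4), ("contrast", 0.5)],
    [("rotate", 0.6), ("sharpen", 0.1)],
    [("translate-x", 0.3), ("brighten", 0.2)],
    [("shear", 0.1), ("invert", 0.9)],
]

def choose_sub_policy(policies, rng=random):
    """Pick one of five sub-policies with 16% probability each,
    or None (no augmentation) with the remaining 20%."""
    weights = [0.16] * len(policies) + [0.20]
    choices = list(range(len(policies))) + [None]
    pick = rng.choices(choices, weights=weights, k=1)[0]
    return None if pick is None else policies[pick]
```

During optimization, every image in a batch would pass through this draw independently, which keeps the applied augmentations diverse.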
While I was inspired by AutoAugment for this policy design, there is one main difference: I do not use any parameters for the probability of applying sub-policies in order to make policies less stochastic and allow optimization in fewer iterations.
This policy design creates a 20-dimensional search space for the Bayesian optimizer, where 10 dimensions are categorical (type of augmentation technique) and the other 10 are real-valued (magnitudes). Since categorical values are involved, I configured the Bayesian optimizer to use a random forest estimator.
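To make the dimensionality concrete, here is a minimal sketch of that search space and of drawing one random point from it. The technique list is an illustrative subset; the real set comes from imgaug and is larger:

```python
import random

# Illustrative augmentation techniques (the actual set used by
# DeepAugment comes from the imgaug package and is larger).
TECHNIQUES = ["rotate", "shear", "translate-x", "translate-y",
              "invert", "contrast", "sharpen", "brighten"]

# 5 sub-policies x 2 augmentations each = 10 categorical dimensions
# (technique type) plus 10 real-valued dimensions (magnitude in [0, 1]),
# for 20 dimensions in total.
def sample_policy(rng=random):
    policy = []
    for _ in range(5):          # five sub-policies
        sub_policy = []
        for _ in range(2):      # two augmentations per sub-policy
            technique = rng.choice(TECHNIQUES)   # categorical dimension
            magnitude = rng.random()             # real-valued dimension
            sub_policy.append((technique, magnitude))
        policy.append(sub_policy)
    return policy

policy = sample_policy()  # one point in the 20-dimensional space
```

Because half of the dimensions are categorical, a tree-based surrogate such as a random forest is a natural fit, since Gaussian processes handle categorical inputs poorly.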
How DeepAugment finds the best policies
Policies are evaluated with a small child model, with the overall workflow as follows: the controller samples new augmentation policies, the augmenter transforms images by the new policy, and the child model is trained from scratch on the augmented images.
The child model's validation accuracy serves as the reward and is fed back to the controller (see "How Bayesian optimization works" below). The controller then samples new policies again, and the same steps repeat. This process cycles until the user-determined maximum number of iterations is reached.
As noted above, the controller's surrogate is a random forest estimator, and it uses expected improvement as its acquisition function.
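The controller/augmenter/child loop can be sketched as follows. Everything here is a toy stand-in, not DeepAugment's actual API: the controller below samples policies at random rather than via Bayesian optimization, and the augmenter and child-model trainer are placeholder functions, but the ask/evaluate/tell cycle has the same shape:

```python
import random

class RandomController:
    """Toy stand-in for the Bayesian-optimization controller: it just
    samples policies at random and records the scores it is told."""
    def __init__(self):
        self.history = []

    def ask(self):
        return [random.random() for _ in range(20)]  # point in the 20-d space

    def tell(self, policy, score):
        self.history.append((policy, score))

def augment(images, policy):
    return images  # placeholder: a real augmenter would transform the images

def train_child_from_scratch(images, labels):
    return random.random()  # placeholder: would return validation accuracy

def optimize(images, labels, iterations=20):
    controller = RandomController()
    best_policy, best_score = None, float("-inf")
    for _ in range(iterations):
        policy = controller.ask()                    # controller samples a policy
        augmented = augment(images, policy)          # augmenter applies it
        score = train_child_from_scratch(augmented, labels)  # child model reward
        controller.tell(policy, score)               # feed the reward back
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score
```

Swapping the random controller for a Bayesian optimizer changes only `ask` and `tell`; the rest of the loop stays the same.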
How Bayesian optimization works
The aim of Bayesian optimization is to find a set of parameters that maximize the value of the objective function. A working cycle of Bayesian optimization can be summarized as:
- Build a surrogate model of the objective function
- Find parameters that perform best on the surrogate
- Execute the objective function with these parameters
- Update the surrogate model with these parameters and the score of the objective function
- Repeat steps 2–4 until the maximum number of iterations is reached
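The five steps above can be demonstrated on a toy 1-D problem. The surrogate below is a deliberately crude nearest-neighbor predictor with a distance bonus standing in for the random-forest model and expected-improvement acquisition; it is an illustration of the cycle, not of DeepAugment's actual optimizer:

```python
import random

def objective(x):
    # Toy 1-D objective to maximize (optimum at x = 0.7); in DeepAugment
    # this step would be a full training run of the child model.
    return -(x - 0.7) ** 2

def surrogate_predict(observed, x):
    # Step 1's surrogate model, kept deliberately simple: predict the
    # score of the nearest observed point, plus a distance bonus that
    # rewards exploring far from previous observations.
    nearest_x, nearest_y = min(observed, key=lambda p: abs(p[0] - x))
    return nearest_y + abs(nearest_x - x)

def bayesian_optimization_sketch(iterations=30, rng=random):
    x0 = rng.random()
    observed = [(x0, objective(x0))]
    for _ in range(iterations):
        # Step 2: choose the candidate that looks best on the surrogate.
        candidates = [rng.random() for _ in range(50)]
        x = max(candidates, key=lambda c: surrogate_predict(observed, c))
        # Step 3: run the true (expensive) objective at that point.
        y = objective(x)
        # Step 4: update the surrogate's data with the new observation.
        observed.append((x, y))
        # Step 5: repeat until the iteration budget is exhausted.
    return max(observed, key=lambda p: p[1])
```

Note that the expensive objective runs only once per iteration, while the cheap surrogate is queried many times; that asymmetry is what makes the approach sample-efficient.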
For a deeper treatment of Bayesian optimization, see this review paper.
Trade-offs of Bayesian optimization
Bayesian optimization reaches good results in far fewer iterations than grid search or random search (see a comparison here). This is because Bayesian optimization learns from runs with previous parameters, contrary to grid search and random search.
AutoAugment, for example, iterates 15,000 times in order to learn good policies (which means training the child CNN model 15,000 times). Bayesian optimization, on the other hand, learns good policies in 100-300 iterations. A rule of thumb for Bayesian optimization is to use roughly ten times as many iterations as there are optimized parameters.
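Applying that rule of thumb to DeepAugment's 20-dimensional policy space gives a budget consistent with the 100-300 range above:

```python
# Rule of thumb: ~10 iterations per optimized parameter.
n_params = 20               # DeepAugment's policy space: 10 categorical + 10 real
iterations = 10 * n_params
print(iterations)           # 200, inside the reported 100-300 range
```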
Challenges and solutions
Challenge 1: Optimizing for augmentation requires a lot of computational resources, since the child model should be trained from scratch over and over. This dramatically slowed down the development process of my tool. Even though usage of Bayesian optimization made it faster, the optimization process was still not fast enough to make development feasible.
Solutions: I developed two solutions. First, I optimized the child CNN model (see below), which is the computational bottleneck of the process. Second, I designed augmentation policies in a more deterministic way, making the Bayesian optimizer require fewer iterations.
Challenge 2: I encountered an interesting problem during the development of DeepAugment. While optimizing augmentations by training the child model over and over, the augmentation policies started to overfit to the validation set: the best-found policies performed poorly when I changed the validation set. This is an interesting case because it is different from overfitting in the general sense, where model weights overfit to noise in the data.
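One generic way to reduce this kind of validation overfitting (an illustration, not necessarily DeepAugment's exact remedy) is to score each policy against several different train/validation splits and average the results, so no single split can be gamed. The `train_fn` argument here is a hypothetical callable that trains a child model on the given split and returns its validation accuracy:

```python
import random

def evaluate_policy(train_fn, data, labels, n_splits=3, rng=random):
    """Score a candidate policy on several resampled train/validation
    splits and average, so the score cannot overfit to one fixed split.
    train_fn(data, labels, train_idx, val_idx) is a hypothetical
    function that trains a child model and returns validation accuracy."""
    scores = []
    indices = list(range(len(data)))
    for _ in range(n_splits):
        rng.shuffle(indices)                 # draw a fresh random split
        split = int(0.8 * len(indices))
        train_idx, val_idx = indices[:split], indices[split:]
        scores.append(train_fn(data, labels, train_idx, val_idx))
    return sum(scores) / len(scores)
```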
How to integrate into your ML pipeline
DeepAugment is available on PyPI. You can install it from your terminal by running:
$ pip install deepaugment
And usage is easy:
from deepaugment.deepaugment import DeepAugment
deepaug = DeepAugment(my_images, my_labels)
best_policies = deepaug.optimize()
A more advanced usage, by configuring DeepAugment:
from keras.datasets import cifar10
from deepaugment.deepaugment import DeepAugment
# my configuration
my_config = {
"model": "basiccnn",
"method": "bayesian_optimization",
"train_set_size": 2000,
"opt_samples": 3,
"opt_last_n_epochs": 3,
"opt_initial_points": 10,
"child_epochs": 50,
"child_first_train_epochs": 0,
"child_batch_size": 64
}
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# x_train.shape -> (N, M, M, 3)
# y_train.shape -> (N)
deepaug = DeepAugment(x_train, y_train, config=my_config)
best_policies = deepaug.optimize(300)
For more details, see the tutorial.
Conclusion
To our knowledge, DeepAugment is the first method utilizing Bayesian optimization to find the best data augmentations. Optimization of data augmentation is a recent research area, and AutoAugment was one of the first methods tackling this problem.
The main contribution of DeepAugment to the open-source community is that it makes the process scalable, enabling users to optimize augmentation policies without needing huge computational resources*. It is very modular and >50 times faster than the previous solution, AutoAugment.
In our experiments, DeepAugment was shown to reduce error by 60% for a WideResNet-28-10 model using the CIFAR-10 small image dataset when compared to the same model and dataset without augmentation.
DeepAugment currently only optimizes augmentations for the image classification task. It could be expanded to optimize for object detection or segmentation tasks, and I welcome your contributions if you would like to do so. However, I would expect that the best augmentation policies depend strongly on the type of dataset and less so on the task (such as classification or object detection). This means DeepAugment should find similar strategies regardless of the task, but it would be very interesting if those strategies end up being very different!
While DeepAugment currently works for image datasets, it would be very interesting to extend it for text, audio or video datasets. The same concept is applicable to other types of datasets as well.
*DeepAugment takes 4.2 hours (500 iterations) on the CIFAR-10 dataset, which costs around $13 using an AWS p3.2xlarge instance.
Resources
github.com/barisozmen/deepaugment
bit.ly/deepaugmentslides
bit.ly/deepaugmentusage