AutoML is an Azure Machine Learning feature that empowers both professional and citizen data scientists to build machine learning models rapidly. Since its launch, AutoML has helped accelerate model building for essential machine learning tasks like Classification, Regression and Time-series Forecasting. The preview of AutoML for Images adds support for Vision tasks: data scientists can easily generate models trained on image data for scenarios like Image Classification (multi-class, multi-label), Object Detection and Instance Segmentation.
Customers across various industries are looking to leverage machine learning to build models that can process image data. Applications range from image classification of fashion photos to PPE detection in industrial environments. Customers want a solution to easily build models, control the model training to generate the optimal model for their training data, and easily manage these ML models end-to-end. While Azure Machine Learning offers a solution for managing the end-to-end ML lifecycle, customers currently have to rely on the tedious process of custom training their image models. Iteratively finding the right set of model algorithms and hyperparameters for these scenarios typically requires significant data scientist effort.
With AutoML support for Vision tasks, Azure ML customers can easily build models trained on image data, without writing any training code. Customers can seamlessly integrate with Azure ML's Data Labeling capability and use this labeled data for generating image models. They can control the model generated by specifying the model algorithm and can optionally tune the hyperparameters. They can sweep over multiple model algorithms / hyperparameter ranges and find the optimal model for their needs. The resulting model can then be downloaded or deployed as a web service in Azure ML and can be operationalized at scale, leveraging AzureML MLOps capabilities.
Authoring AutoML models for vision tasks will be initially supported via the Azure ML Python SDK. The resulting experimentation runs, models and outputs will be accessible from the Azure ML Studio UI.
Azure Machine Learning is a service that accelerates the end-to-end machine learning lifecycle, helping developers and data scientists build, train and deploy models fast, with robust MLOps capabilities for operationalizing these ML models at scale. AutoML for Images is a feature within Azure Machine Learning that allows users to easily and rapidly build vision models from image data, while maintaining full control and visibility over the model building process. AutoML for Images is the ideal solution for customer scenarios that require control over model training, deployment and the end-to-end ML lifecycle, and is intended for customers with machine learning knowledge in the computer vision space.
AutoML for Images includes the following feature capabilities -
- Ability to use AutoML to generate models for Image Classification, Object Detection and Instance Segmentation, via Python SDK
- Control over training environment - model training takes place in the user's training environment, which can be secured with a virtual network. Training data never leaves the customer-controlled workspace. Users can control the compute target used for training, choosing from VM SKUs with standard GPUs up to advanced multi-GPU SKUs for faster training.
- Control over model training algorithms and hyperparameters - in some scenarios, getting optimal model performance requires tuning the underlying algorithms and hyperparameters. With AutoML, users can select specific Deep Learning architectures and customize them. This control can range from easily getting the default model for the specified architecture to advanced controls that can sweep the hyperparameter space and come up with the optimal model. When sweeping over a hyperparameter space, controls for sampling mechanisms and early termination are made available to the user, allowing for the optimal use of resource budget.
- Control over deployment of the resulting model - AutoML models can be deployed as an endpoint in the user's AzureML workspace. Users can control the compute used for inferencing, use high performance serving with Triton Inference server and can secure the inferencing environment with a virtual network.
- Ability to download the resulting model and use it in other environments.
- Visibility into the model building process - users can see a leaderboard with the various configurations tried and compare model performance using evaluation metrics and charts for each.
- Integration with MLOps capabilities for operationalizing the resulting model at scale
- Integration with Azure ML Data Labeling capabilities and Datasets for creating or adding to your training data
- Integration with Azure ML Pipelines to create a workflow that stitches together various ML phases such as data preparation, training and deployment
This feature is targeted at data scientists with ML knowledge in the Computer Vision space, looking to build ML models using image data in Azure Machine Learning. It aims to boost data scientist productivity, while allowing full control of the model algorithm, hyperparameters, and training and deployment environments.
Like all Azure ML features, customers incur the costs associated with the Azure resources consumed (for example, compute and storage costs). There are no additional fees associated with Azure Machine Learning or AutoML for Images. See Azure Machine Learning pricing for details.
AutoML allows you to easily train models for Image Classification, Object Detection & Instance Segmentation on your image data. You can control the model algorithm to be used, specify hyperparameter values for your model as well as perform a sweep across the hyperparameter space to generate an optimal model. Parameters for configuring your AutoML run for image related tasks are specified using the 'AutoMLImageConfig' in the Python SDK.
AutoML for Images supports the following task types:
- image-classification
- image-multi-labeling
- image-object-detection
- image-instance-segmentation
This task type is a required parameter and is passed in using the task parameter in the AutoMLImageConfig. For example:
from azureml.train.automl import AutoMLImageConfig
automl_image_config = AutoMLImageConfig(task='image-object-detection')
In order to generate Vision models, you will need to bring in labeled image data as input for model training in the form of an AzureML Labeled Dataset. You can either use a Labeled Dataset that you have exported from a Data Labeling project, or create a new Labeled Dataset with your labeled training data.
Labeled datasets are Tabular datasets with some enhanced capabilities such as mounting and downloading. The structure of the labeled dataset depends on the task at hand. For Image task types, it consists of the following fields:
- image_url: contains the image file path as a StreamInfo object
- image_details: image metadata, consisting of height, width and format. This field is optional and may or may not be present; it might become mandatory in the future.
- label: a JSON representation of the image label, based on the task type
Creation of labeled datasets is supported from data in JSONL format.
Here is a sample JSONL file for Image Classification:
{
"image_url": "AmlDatastore://image_data/Image_01.png",
"image_details":
{
"format": "png",
"width": "2230px",
"height": "4356px"
},
"label": "cat"
}
{
"image_url": "AmlDatastore://image_data/Image_02.jpeg",
"image_details":
{
"format": "jpeg",
"width": "3456px",
"height": "3467px"
},
"label": "dog"
}
And here is a sample JSONL file for Object Detection:
{
"image_url": "AmlDatastore://image_data/Image_01.png",
"image_details":
{
"format": "png",
"width": "2230px",
"height": "4356px"
},
"label":
{
"label": "cat",
"topX": "1",
"topY": "0",
"bottomX": "0",
"bottomY": "1",
"isCrowd": "true",
}
}
{
"image_url": "AmlDatastore://image_data/Image_02.png",
"image_details":
{
"format": "jpeg",
"width": "1230px",
"height": "2356px"
},
"label":
{
"label": "dog",
"topX": "0",
"topY": "1",
"bottomX": "0",
"bottomY": "1",
"isCrowd": "false",
}
}
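If you are generating these records yourself (for example from an in-house annotation tool), a minimal sketch like the one below - using only Python's standard json module, with a hypothetical annotations list standing in for your own parsed labels and training_annotations.jsonl as an illustrative file name - shows how records shaped like the object detection sample above could be written out, one JSON object per line:
import json
# Hypothetical, already-parsed annotations; replace with your own parsing logic
annotations = [
    {"file": "image_data/Image_01.png", "format": "png", "width": "2230px", "height": "4356px",
     "label": "cat", "topX": "0.1", "topY": "0.1", "bottomX": "0.8", "bottomY": "0.9", "isCrowd": "false"},
]
# Write one record per line, mirroring the structure of the object detection sample above
with open("training_annotations.jsonl", "w") as f:
    for ann in annotations:
        record = {
            "image_url": "AmlDatastore://" + ann["file"],
            "image_details": {"format": ann["format"], "width": ann["width"], "height": ann["height"]},
            "label": {"label": ann["label"], "topX": ann["topX"], "topY": ann["topY"],
                      "bottomX": ann["bottomX"], "bottomY": ann["bottomY"], "isCrowd": ann["isCrowd"]},
        }
        f.write(json.dumps(record) + "\n")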
If your training data is in a different format (e.g. Pascal VOC), you can leverage the helper scripts included with the sample notebooks in this repo to convert the data to JSONL. Once your data is in JSONL format, you can create a labeled dataset using this snippet:
from azureml.contrib.dataset.labeled_dataset import _LabeledDatasetFactory, LabeledDatasetTask
from azureml.core import Dataset
# 'ws' refers to your AzureML Workspace and 'ds' to a Datastore containing the JSONL file
training_dataset = _LabeledDatasetFactory.from_json_lines(
    task=LabeledDatasetTask.OBJECT_DETECTION, path=ds.path('odFridgeObjects/odFridgeObjects.jsonl'))
training_dataset = training_dataset.register(workspace=ws, name=training_dataset_name)
You can optionally specify another labeled dataset as a validation dataset to be used for your model. If no validation dataset is specified, 20% of your training data will be used for validation by default, unless you pass the split_ratio argument with a different value.
Training data is a required parameter and is passed in using the training_data parameter. Validation data is optional and is passed in using the validation_data parameter of the AutoMLImageConfig. For example:
from azureml.train.automl import AutoMLImageConfig
automl_image_config = AutoMLImageConfig(training_data=training_dataset)
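A config that also supplies an explicit validation set might look like the sketch below, where validation_dataset is assumed to be a second Labeled Dataset registered the same way as the training data:
from azureml.train.automl import AutoMLImageConfig
# validation_dataset: an assumed, separately registered Labeled Dataset
automl_image_config = AutoMLImageConfig(training_data=training_dataset,
                                        validation_data=validation_dataset)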
You will need to provide a Compute Target that will be used for your AutoML model training. AutoML models for computer vision tasks require GPU SKUs and support NC and ND families. Using a compute target with a multi-GPU VM SKU will leverage the multiple GPUs to speed up training. Additionally, setting up a compute target with multiple nodes will allow for faster model training by leveraging parallelism, when tuning hyperparameters for your model.
The compute target is a required parameter and is passed in using the compute_target parameter of the AutoMLImageConfig. For example:
from azureml.train.automl import AutoMLImageConfig
automl_image_config = AutoMLImageConfig(compute_target=compute_target)
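If your workspace does not yet have a suitable GPU cluster, a sketch along these lines can create one using the standard AmlCompute API; the cluster name and VM size below are illustrative choices, not requirements:
from azureml.core.compute import AmlCompute, ComputeTarget
# Illustrative name and GPU SKU (pick an NC- or ND-family size that fits your needs)
cluster_name = "gpu-cluster"
compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_NC6", max_nodes=4)
compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)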
When using AutoML to build computer vision models, users can control the model algorithm and sweep hyperparameters. These model algorithms and hyperparameters are passed in as the parameter space for the sweep.
The model algorithm is required and is passed in via the model_name parameter. You can either specify a single model_name or choose from multiple (see the sketch after the list below):
- Image Classification (multi-class and multi-label): 'resnet18', 'resnet34', 'resnet50', 'mobilenetv2', 'seresnext'
- Object Detection (OD): 'yolov5', 'fasterrcnn_resnet50_fpn', 'fasterrcnn_resnet34_fpn', 'fasterrcnn_resnet18_fpn', 'retinanet_resnet50_fpn'
- Instance segmentation (IS): 'maskrcnn_resnet50_fpn'
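For example, a sketch that fixes the algorithm to yolov5 (running a single configuration, so no sweep takes place) could pass model_name through the parameter space with a grid sample of one value; the task, data and compute values are the ones configured earlier:
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import GridParameterSampling, choice
# Fix the algorithm to yolov5 and run one configuration with its default hyperparameters
automl_image_config = AutoMLImageConfig(task='image-object-detection',
                                        compute_target=compute_target,
                                        training_data=training_dataset,
                                        hyperparameter_sampling=GridParameterSampling(
                                            {'model_name': choice('yolov5')}),
                                        iterations=1)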
In addition to controlling the model algorithm used, you can also tune the hyperparameters used for model training. While many of the exposed hyperparameters are model-agnostic, some are task-specific and a few are model-specific.
The following tables list the hyperparameters and their default values for each:
Model-agnostic hyperparameters
Parameter Name | Description | Default |
---|---|---|
number_of_epochs | Number of training epochs. Optional, positive integer | all (except yolov5): 15; yolov5: 30 |
training_batch_size | Training batch size. Note: the defaults are the largest batch size that can be used on 12 GiB of GPU memory. Optional, positive integer | multi-class / multi-label: 78; OD (except yolov5) / IS: 2; yolov5: 16 |
validation_batch_size | Validation batch size. Note: the defaults are the largest batch size that can be used on 12 GiB of GPU memory. Optional, positive integer | multi-class / multi-label: 78; OD (except yolov5) / IS: 2; yolov5: 16 |
early_stopping | Enable early stopping logic during training. Optional, 0 or 1 | 1 |
early_stopping_patience | Minimum number of epochs/validation evaluations with no primary metric improvement before the run is stopped. Optional, positive integer | 5 |
early_stopping_delay | Minimum number of epochs/validation evaluations to wait before primary metric improvement is tracked for early stopping. Optional, positive integer | 5 |
learning_rate | Initial learning rate. Optional, float in [0, 1] | multi-class: 0.01; multi-label: 0.035; OD (except yolov5) / IS: 0.05; yolov5: 0.01 |
lr_scheduler | Type of learning rate scheduler. Optional, one of {warmup_cosine, step} | warmup_cosine |
step_lr_gamma | Value of gamma for the learning rate scheduler if it is of type step. Optional, float in [0, 1] | 0.5 |
step_lr_step_size | Value of step_size for the learning rate scheduler if it is of type step. Optional, positive integer | 5 |
warmup_cosine_lr_cycles | Value of cosine cycle for the learning rate scheduler if it is of type warmup_cosine. Optional, float in [0, 1] | 0.45 |
warmup_cosine_lr_warmup_epochs | Value of warmup epochs for the learning rate scheduler if it is of type warmup_cosine. Optional, positive integer | 2 |
optimizer | Type of optimizer. Optional, one of {sgd, adam, adamw} | sgd |
momentum | Value of momentum for the optimizer if it is of type sgd. Optional, float in [0, 1] | 0.9 |
weight_decay | Value of weight_decay for the optimizer if it is of type sgd, adam or adamw. Optional, float in [0, 1] | 1e-4 |
nesterov | Enable nesterov for the optimizer if it is of type sgd. Optional, 0 or 1 | 1 |
beta1 | Value of beta1 for the optimizer if it is of type adam or adamw. Optional, float in [0, 1] | 0.9 |
beta2 | Value of beta2 for the optimizer if it is of type adam or adamw. Optional, float in [0, 1] | 0.999 |
amsgrad | Enable amsgrad for the optimizer if it is of type adam or adamw. Optional, 0 or 1 | 0 |
evaluation_frequency | Frequency to evaluate the validation dataset to get metric scores. Optional, positive integer | 1 |
split_ratio | Validation split ratio when splitting train data into random train and validation subsets if validation data is not defined. Optional, float in [0, 1] | 0.2 |
checkpoint_frequency | Frequency to store model checkpoints. By default, a checkpoint is saved at the epoch with the best primary metric on validation. Optional, positive integer | No default value (checkpoint at epoch with best primary metric) |
layers_to_freeze | How many layers to freeze for your model. Available freezable layers for each model are listed here. For instance, passing 2 as the value for seresnext means freezing layer0 and layer1. Optional, positive integer | No default value |
Task-specific hyperparameters
For Image Classification (Multi-class and Multi-label):
Parameter Name | Description | Default |
---|---|---|
weighted_loss | 0 for no weighted loss, 1 for weighted loss with sqrt(class_weights), and 2 for weighted loss with class_weights. Optional, 0, 1 or 2 | 0 |
resize_size | Image size to which to resize before cropping for the validation dataset. Note: unlike the other models, seresnext doesn't take an arbitrary size. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 256 |
crop_size | Image crop size that is input to your neural network. Note: unlike the other models, seresnext doesn't take an arbitrary size. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 224 |
For Object Detection (except yolov5) and Instance Segmentation:
Parameter Name | Description | Default |
---|---|---|
validation_metric_type | Metric computation method to use for validation metrics. Optional, one of {none, coco, voc, coco_voc} | voc |
min_size | Minimum size of the image to be rescaled before feeding it to the backbone. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 600 |
max_size | Maximum size of the image to be rescaled before feeding it to the backbone. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 1333 |
box_score_thresh | During inference, only return proposals with a classification score greater than box_score_thresh. Optional, float in [0, 1] | 0.3 |
box_nms_thresh | Non-maximum suppression (NMS) threshold for the prediction head, used during inference. Optional, float in [0, 1] | 0.5 |
box_detections_per_img | Maximum number of detections per image, for all classes. Optional, positive integer | 100 |
Model-specific hyperparameters
For yolov5:
Parameter Name | Description | Default |
---|---|---|
validation_metric_type | Metric computation method to use for validation metrics. Optional, one of {none, coco, voc, coco_voc} | voc |
img_size | Image size for train and validation. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 640 |
model_size | Model size. Note: the training run may hit CUDA OOM if the model size is too big. Optional, one of {small, medium, large, xlarge} | medium |
multi_scale | Enable multi-scale images by varying the image size by +/- 50%. Note: the training run may hit CUDA OOM if there is insufficient GPU memory. Optional, 0 or 1 | 0 |
box_score_thresh | During inference, only return proposals with a score greater than box_score_thresh. The score is the product of the objectness score and classification probability. Optional, float in [0, 1] | 0.001 |
box_iou_thresh | IoU threshold used during inference in NMS post-processing. Optional, float in [0, 1] | 0.5 |
When training vision models, model performance depends heavily on the hyperparameter values selected. Often, you may want to tune the hyperparameters to get optimal performance.
AutoML for Images allows you to sweep hyperparameters to find the optimal settings for your model. It leverages the hyperparameter tuning capabilities in Azure Machine Learning - you can learn more here.
You can define the model algorithms and hyperparameters to sweep in the parameter space. See Configure model algorithms and hyperparameters for the list of supported model algorithms and hyperparameters for each task type. Details on supported distributions for discrete and continuous hyperparameters can be found here.
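As an illustrative sketch (the specific values are arbitrary examples, not recommendations), a parameter space for object detection that sweeps over two algorithms and a few of the hyperparameters listed above could be defined with the choice and uniform distributions from the Azure ML hyperparameter tuning package:
from azureml.train.hyperdrive import choice, uniform
# Example parameter space only; ranges and options are illustrative
parameter_space = {
    'model_name': choice('yolov5', 'fasterrcnn_resnet50_fpn'),
    'learning_rate': uniform(0.0001, 0.01),
    'optimizer': choice('sgd', 'adam', 'adamw'),
    'number_of_epochs': choice(15, 30),
}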
When sweeping hyperparameters, you need to specify the sampling method to use for sweeping over the defined parameter space. AutoML for Images supports the following sampling methods using the hyperparameter_sampling parameter:
- Random Sampling
- Grid Sampling (not supported yet for conditional spaces)
- Bayesian Sampling (not supported yet for conditional spaces)
You can learn more about each of these sampling methods here.
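For example, random sampling over the parameter space sketched above would be configured as follows and passed in via the hyperparameter_sampling parameter of the AutoMLImageConfig:
from azureml.train.hyperdrive import RandomParameterSampling
# Sample configurations at random from the parameter space defined earlier
hyperparameter_sampling = RandomParameterSampling(parameter_space)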
When using AutoML to sweep hyperparameters for your vision models, you can automatically end poorly performing runs with an early termination policy. Early termination improves computational efficiency, saving compute resources that would otherwise be spent on less promising configurations. AutoML for Images supports the following early termination policies using the policy parameter -
- Bandit Policy
- Median Stopping Policy
- Truncation Selection Policy
If no termination policy is specified, all configurations are run to completion.
You can learn more about configuring the early termination policy for your hyperparameter sweep here.
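For instance, a Bandit policy that evaluates every other interval, skips the first six evaluations, and stops runs more than 20% worse than the best so far could be sketched as below (the concrete values are illustrative) and passed in via the policy parameter:
from azureml.train.hyperdrive import BanditPolicy
# Illustrative settings: check every 2 intervals, delay 6 evaluations, allow 20% slack
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.2, delay_evaluation=6)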
You can control the resources spent on your hyperparameter sweep by specifying the iterations and max_concurrent_iterations parameters for the sweep.
- iterations (required when sweeping): Maximum number of configurations to sweep. Must be an integer between 1 and 1000.
- max_concurrent_iterations: (optional) Maximum number of runs that can run concurrently. If not specified, all runs launch in parallel. If specified, must be an integer between 1 and 100. (NOTE: The number of concurrent runs is gated on the resources available in the specified compute target. Ensure that the compute target has the available resources for the desired concurrency.)
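For example, a budget of 20 configurations with at most 4 running concurrently (assuming the compute target has at least 4 nodes available) could be captured as a small settings dictionary and later unpacked into the AutoMLImageConfig call:
# Illustrative budget; pass these as keyword arguments, e.g. AutoMLImageConfig(..., **sweep_budget)
sweep_budget = {'iterations': 20, 'max_concurrent_iterations': 4}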
You can specify the metric to be used for model optimization and hyperparameter tuning using the optional primary_metric parameter. Default values depend on the task type -
- 'accuracy' for image-classification
- 'iou' for image-multi-labeling
- 'mean_average_precision' for image-object-detection
- 'mean_average_precision' for image-instance-segmentation
You can optionally specify the maximum time budget for your AutoML Vision experiment using experiment_timeout_hours - the amount of time in hours before the experiment terminates. If not specified, the default experiment timeout is 6 days.
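For example, an object detection experiment that optimizes mean average precision (already the default for that task type) and is capped at 12 hours could include the following fragment in its config; both values are shown purely as an illustration:
# Illustrative optimization metric and time budget for an object detection experiment
automl_image_config = AutoMLImageConfig(
    task='image-object-detection',
    # ... data, compute and sweep arguments as shown earlier ...
    primary_metric='mean_average_precision',
    experiment_timeout_hours=12)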
You can pass fixed settings or parameters that don't change during the parameter space sweep as arguments. Arguments are passed in name-value pairs and the name must be prefixed by a double dash. For example:
from azureml.train.automl import AutoMLImageConfig
arguments = ["--early_stopping", 1, "--evaluation_frequency", 2]
automl_image_config = AutoMLImageConfig(arguments=arguments)
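Putting the pieces together, an end-to-end sketch for an object detection sweep - combining the datasets, compute, parameter space, sampling method, termination policy, resource budget and arguments discussed above, with all concrete values being illustrative rather than recommendations - and submitting it as an experiment might look like:
from azureml.core import Experiment
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import BanditPolicy, RandomParameterSampling, choice, uniform
# Illustrative end-to-end configuration; assumes compute_target, training_dataset and
# validation_dataset have been set up as shown in the earlier sections
automl_image_config = AutoMLImageConfig(
    task='image-object-detection',
    compute_target=compute_target,
    training_data=training_dataset,
    validation_data=validation_dataset,
    primary_metric='mean_average_precision',
    hyperparameter_sampling=RandomParameterSampling({
        'model_name': choice('yolov5', 'fasterrcnn_resnet50_fpn'),
        'learning_rate': uniform(0.0001, 0.01),
    }),
    policy=BanditPolicy(evaluation_interval=2, slack_factor=0.2, delay_evaluation=6),
    iterations=20,
    max_concurrent_iterations=4,
    arguments=["--early_stopping", 1, "--evaluation_frequency", 2])
# Submit the run to an experiment in the workspace; progress can be monitored in the Azure ML Studio UI
automl_image_run = Experiment(ws, 'automl-image-object-detection').submit(automl_image_config)
automl_image_run.wait_for_completion(show_output=False)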
Please refer to the following sample notebooks to see how you can use AutoML for Images with sample data in your scenario -
Object Detection - AutoML for Images Object Detection Sample Notebook
Multi-Class Image Classification - AutoML for Images Multi-Class Classification Sample Notebook
Multi-Label Image Classification - AutoML for Images Multi-Label Classification Sample Notebook
Instance Segmentation - AutoML for Images Instance Segmentation Sample Notebook