AutoML is an Azure Machine Learning feature that empowers both professional and citizen data scientists to build machine learning models rapidly. Since its launch, AutoML has helped accelerate model building for essential machine learning tasks like Classification, Regression and Time-series Forecasting. The preview of AutoML for Images adds support for Vision tasks: data scientists can easily generate models trained on image data for scenarios like Image Classification (multi-class, multi-label), Object Detection and Instance Segmentation.
Customers across various industries are looking to leverage machine learning to build models that can process image data. Applications range from image classification of fashion photos to PPE detection in industrial environments. Customers want a solution to easily build models, control the model training to generate the optimal model for their training data, and easily manage these ML models end-to-end. While Azure Machine Learning offers a solution for managing the end-to-end ML lifecycle, customers currently have to rely on the tedious process of custom training their image models. Iteratively finding the right set of model algorithms and hyperparameters for these scenarios typically requires significant data scientist effort.
With AutoML support for Vision tasks, Azure ML customers can easily build models trained on image data, without writing any training code. Customers can seamlessly integrate with Azure ML's Data Labeling capability and use this labeled data for generating image models. They can control the model generated by specifying the model algorithm and can optionally tune the hyperparameters. They can sweep over multiple model algorithms / hyperparameter ranges and find the optimal model for their needs. The resulting model can then be downloaded or deployed as a web service in Azure ML and can be operationalized at scale, leveraging AzureML MLOps capabilities.
Authoring AutoML models for vision tasks will be initially supported via the Azure ML Python SDK. The resulting experimentation runs, models and outputs will be accessible from the Azure ML Studio UI.
Azure Machine Learning is a service that accelerates the end-to-end machine learning lifecycle, helping developers and data scientists build, train and deploy models fast, with robust MLOps capabilities for operationalizing these ML models at scale. AutoML for Images is a feature within Azure Machine Learning that allows users to easily and rapidly build vision models from image data, while maintaining full control and visibility over the model building process. AutoML for Images is the ideal solution for customer scenarios that require control over model training, deployment and the end-to-end ML lifecycle, and is intended for customers with machine learning knowledge in the computer vision space.
AutoML for Images includes the following feature capabilities -
- Ability to use AutoML to generate models for Image Classification, Object Detection and Instance Segmentation, via Python SDK
- Control over training environment - model training takes place in the user's training environment, which can be secured with a virtual network. Training data never leaves the customer-controlled workspace. Users can control the compute target used for training, choosing from VM SKUs with standard GPUs up to advanced multi-GPU SKUs for faster training.
- Control over model training algorithms and hyperparameters - in some scenarios, getting optimal model performance requires tuning the underlying algorithms and hyperparameters. With AutoML, users can select specific Deep Learning architectures and customize them. This control can range from easily getting the default model for the specified architecture to advanced controls that can sweep the hyperparameter space and come up with the optimal model. When sweeping over a hyperparameter space, controls for sampling mechanisms and early termination are made available to the user, allowing for the optimal use of resource budget.
- Control over deployment of the resulting model - AutoML models can be deployed as an endpoint in the user's AzureML workspace. Users can control the compute used for inferencing, use high performance serving with Triton Inference server and can secure the inferencing environment with a virtual network.
- Ability to download the resulting model and use it in other environments.
- Visibility into the model building process - users can see a leaderboard with the various configurations tried and compare model performance using evaluation metrics and charts for each.
- Integration with MLOps capabilities for operationalizing the resulting model at scale
- Integration with Azure ML Data Labeling capabilities and Datasets for creating or adding to your training data
- Integration with Azure ML Pipelines to create a workflow that stitches together various ML phases such as data preparation, training and deployment
This feature is targeted at data scientists with ML knowledge in the Computer Vision space, looking to build ML models using image data in Azure Machine Learning. It aims to boost data scientist productivity, while allowing full control of the model algorithm, hyperparameters, and training and deployment environments.
Like all Azure ML features, customers incur the costs associated with the Azure resources consumed (for example, compute and storage costs). There are no additional fees associated with Azure Machine Learning or AutoML for Images. See Azure Machine Learning pricing for details.
AutoML allows you to easily train models for Image Classification, Object Detection & Instance Segmentation on your image data. You can control the model algorithm to be used, specify hyperparameter values for your model as well as perform a sweep across the hyperparameter space to generate an optimal model. Parameters for configuring your AutoML run for image related tasks are specified using the 'AutoMLImageConfig' in the Python SDK.
AutoML for Images supports the following task types:
- image-classification
- image-multi-labeling
- image-object-detection
- image-instance-segmentation
This task type is a required parameter and is passed in using the task parameter in the AutoMLImageConfig. For example:
from azureml.train.automl import AutoMLImageConfig
automl_image_config = AutoMLImageConfig(task='image-object-detection')
In order to generate Vision models, you will need to bring in labeled image data as input for model training in the form of an AzureML Labeled Dataset. You can either use a Labeled Dataset that you have exported from a Data Labeling project, or create a new Labeled Dataset with your labeled training data.
Labeled datasets are Tabular datasets with some enhanced capabilities such as mounting and downloading. The structure of the labeled dataset depends on the task at hand. For Image task types, it consists of the following fields:
- image_url: contains the image file path as a StreamInfo object
- image_details: image metadata, consisting of height, width and format. This field is optional and may or may not be present; it might become mandatory in the future.
- label: a JSON representation of the image label, based on the task type
Creation of labeled datasets is supported from data in JSONL format.
Here is a sample JSONL file for Image Classification:
{
"image_url": "AmlDatastore://image_data/Image_01.png",
"image_details":
{
"format": "png",
"width": "2230px",
"height": "4356px"
},
"label": "cat"
}
{
"image_url": "AmlDatastore://image_data/Image_02.jpeg",
"image_details":
{
"format": "jpeg",
"width": "3456px",
"height": "3467px"
},
"label": "dog"
}
And here is a sample JSONL file for Object Detection:
{
"image_url": "AmlDatastore://image_data/Image_01.png",
"image_details":
{
"format": "png",
"width": "2230px",
"height": "4356px"
},
"label":
{
"label": "cat",
"topX": "1",
"topY": "0",
"bottomX": "0",
"bottomY": "1",
"isCrowd": "true",
}
}
{
"image_url": "AmlDatastore://image_data/Image_02.png",
"image_details":
{
"format": "jpeg",
"width": "1230px",
"height": "2356px"
},
"label":
{
"label": "dog",
"topX": "0",
"topY": "1",
"bottomX": "0",
"bottomY": "1",
"isCrowd": "false",
}
}
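If you are generating these records yourself (for example from an in-house annotation tool), a minimal sketch like the one below - using only Python's standard json module, with a hypothetical annotations list standing in for your own parsed labels and training_annotations.jsonl as an illustrative file name - shows how records shaped like the object detection sample above could be written out, one JSON object per line:
import json
# Hypothetical, already-parsed annotations; replace with your own parsing logic
annotations = [
    {"file": "image_data/Image_01.png", "format": "png", "width": "2230px", "height": "4356px",
     "label": "cat", "topX": "0.1", "topY": "0.1", "bottomX": "0.8", "bottomY": "0.9", "isCrowd": "false"},
]
# Write one record per line, mirroring the structure of the object detection sample above
with open("training_annotations.jsonl", "w") as f:
    for ann in annotations:
        record = {
            "image_url": "AmlDatastore://" + ann["file"],
            "image_details": {"format": ann["format"], "width": ann["width"], "height": ann["height"]},
            "label": {"label": ann["label"], "topX": ann["topX"], "topY": ann["topY"],
                      "bottomX": ann["bottomX"], "bottomY": ann["bottomY"], "isCrowd": ann["isCrowd"]},
        }
        f.write(json.dumps(record) + "\n")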
If your training data is in a different format (e.g. Pascal VOC), you can leverage the helper scripts included with the sample notebooks in this repo to convert the data to JSONL. Once your data is in JSONL format, you can create a labeled dataset using this snippet:
from azureml.contrib.dataset.labeled_dataset import _LabeledDatasetFactory, LabeledDatasetTask
from azureml.core import Dataset
# 'ws' refers to your AzureML Workspace and 'ds' to a Datastore containing the JSONL file
training_dataset = _LabeledDatasetFactory.from_json_lines(
    task=LabeledDatasetTask.OBJECT_DETECTION, path=ds.path('odFridgeObjects/odFridgeObjects.jsonl'))
training_dataset = training_dataset.register(workspace=ws, name=training_dataset_name)
You can optionally specify another labeled dataset as a validation dataset to be used for your model. If no validation dataset is specified, 20% of your training data will be used for validation by default, unless you pass the split_ratio argument with a different value.
Training data is a required parameter and is passed in using the training_data parameter. Validation data is optional and is passed in using the validation_data parameter of the AutoMLImageConfig. For example:
from azureml.train.automl import AutoMLImageConfig
automl_image_config = AutoMLImageConfig(training_data=training_dataset)
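A config that also supplies an explicit validation set might look like the sketch below, where validation_dataset is assumed to be a second Labeled Dataset registered the same way as the training data:
from azureml.train.automl import AutoMLImageConfig
# validation_dataset: an assumed, separately registered Labeled Dataset
automl_image_config = AutoMLImageConfig(training_data=training_dataset,
                                        validation_data=validation_dataset)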
You will need to provide a Compute Target that will be used for your AutoML model training. AutoML models for computer vision tasks require GPU SKUs and support NC and ND families. Using a compute target with a multi-GPU VM SKU will leverage the multiple GPUs to speed up training. Additionally, setting up a compute target with multiple nodes will allow for faster model training by leveraging parallelism, when tuning hyperparameters for your model.
The compute target is a required parameter and is passed in using the compute_target parameter of the AutoMLImageConfig. For example:
from azureml.train.automl import AutoMLImageConfig
automl_image_config = AutoMLImageConfig(compute_target=compute_target)
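If your workspace does not yet have a suitable GPU cluster, a sketch along these lines can create one using the standard AmlCompute API; the cluster name and VM size below are illustrative choices, not requirements:
from azureml.core.compute import AmlCompute, ComputeTarget
# Illustrative name and GPU SKU (pick an NC- or ND-family size that fits your needs)
cluster_name = "gpu-cluster"
compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_NC6", max_nodes=4)
compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)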
When using AutoML to build computer vision models, users can control the model algorithm and sweep hyperparameters. These model algorithms and hyperparameters are passed in as the parameter space for the sweep.
The model algorithm is required and is passed in via the model_name parameter. You can either specify a single model_name or choose from multiple (see the sketch after the list below):
- Image Classification (multi-class and multi-label): 'resnet18', 'resnet34', 'resnet50', 'mobilenetv2', 'seresnext'
- Object Detection (OD): 'yolov5', 'fasterrcnn_resnet50_fpn', 'fasterrcnn_resnet34_fpn', 'fasterrcnn_resnet18_fpn', 'retinanet_resnet50_fpn'
- Instance segmentation (IS): 'maskrcnn_resnet50_fpn'
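For example, a sketch that fixes the algorithm to yolov5 (running a single configuration, so no sweep takes place) could pass model_name through the parameter space with a grid sample of one value; the task, data and compute values are the ones configured earlier:
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import GridParameterSampling, choice
# Fix the algorithm to yolov5 and run one configuration with its default hyperparameters
automl_image_config = AutoMLImageConfig(task='image-object-detection',
                                        compute_target=compute_target,
                                        training_data=training_dataset,
                                        hyperparameter_sampling=GridParameterSampling(
                                            {'model_name': choice('yolov5')}),
                                        iterations=1)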
In addition to controlling the model algorithm used, you can also tune the hyperparameters used for model training. While many of the exposed hyperparameters are model-agnostic, some are task-specific and a few are model-specific.
The following tables list the hyperparameters and their default values for each:
Model-agnostic hyperparameters
Parameter Name | Description | Default |
---|---|---|
number_of_epochs | Number of training epochs. Optional, positive integer | all (except yolov5): 15; yolov5: 30 |
training_batch_size | Training batch size. Note: the defaults are the largest batch size that can be used on 12 GiB of GPU memory. Optional, positive integer | multi-class / multi-label: 78; OD (except yolov5) / IS: 2; yolov5: 16 |
validation_batch_size | Validation batch size. Note: the defaults are the largest batch size that can be used on 12 GiB of GPU memory. Optional, positive integer | multi-class / multi-label: 78; OD (except yolov5) / IS: 2; yolov5: 16 |
early_stopping | Enable early stopping logic during training. Optional, 0 or 1 | 1 |
early_stopping_patience | Minimum number of epochs/validation evaluations with no primary metric improvement before the run is stopped. Optional, positive integer | 5 |
early_stopping_delay | Minimum number of epochs/validation evaluations to wait before primary metric improvement is tracked for early stopping. Optional, positive integer | 5 |
learning_rate | Initial learning rate. Optional, float in [0, 1] | multi-class: 0.01; multi-label: 0.035; OD (except yolov5) / IS: 0.05; yolov5: 0.01 |
lr_scheduler | Type of learning rate scheduler. Optional, one of {warmup_cosine, step} | warmup_cosine |
step_lr_gamma | Value of gamma for the learning rate scheduler if it is of type step. Optional, float in [0, 1] | 0.5 |
step_lr_step_size | Value of step_size for the learning rate scheduler if it is of type step. Optional, positive integer | 5 |
warmup_cosine_lr_cycles | Value of cosine cycle for the learning rate scheduler if it is of type warmup_cosine. Optional, float in [0, 1] | 0.45 |
warmup_cosine_lr_warmup_epochs | Value of warmup epochs for the learning rate scheduler if it is of type warmup_cosine. Optional, positive integer | 2 |
optimizer | Type of optimizer. Optional, one of {sgd, adam, adamw} | sgd |
momentum | Value of momentum for the optimizer if it is of type sgd. Optional, float in [0, 1] | 0.9 |
weight_decay | Value of weight_decay for the optimizer if it is of type sgd, adam or adamw. Optional, float in [0, 1] | 1e-4 |
nesterov | Enable nesterov for the optimizer if it is of type sgd. Optional, 0 or 1 | 1 |
beta1 | Value of beta1 for the optimizer if it is of type adam or adamw. Optional, float in [0, 1] | 0.9 |
beta2 | Value of beta2 for the optimizer if it is of type adam or adamw. Optional, float in [0, 1] | 0.999 |
amsgrad | Enable amsgrad for the optimizer if it is of type adam or adamw. Optional, 0 or 1 | 0 |
evaluation_frequency | Frequency to evaluate the validation dataset to get metric scores. Optional, positive integer | 1 |
split_ratio | Validation split ratio when splitting train data into random train and validation subsets if validation data is not defined. Optional, float in [0, 1] | 0.2 |
checkpoint_frequency | Frequency to store model checkpoints. By default, a checkpoint is saved at the epoch with the best primary metric on validation. Optional, positive integer | No default value (checkpoint at epoch with best primary metric) |
layers_to_freeze | How many layers to freeze for your model. Available freezable layers for each model are listed here. For instance, passing 2 as the value for seresnext means freezing layer0 and layer1. Optional, positive integer | No default value |
Task-specific hyperparameters
For Image Classification (Multi-class and Multi-label):
Parameter Name | Description | Default |
---|---|---|
weighted_loss | 0 for no weighted loss, 1 for weighted loss with sqrt(class_weights), and 2 for weighted loss with class_weights. Optional, 0, 1 or 2 | 0 |
resize_size | Image size to which to resize before cropping for the validation dataset. Note: unlike the other models, seresnext doesn't take an arbitrary size. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 256 |
crop_size | Image crop size that is input to your neural network. Note: unlike the other models, seresnext doesn't take an arbitrary size. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 224 |
For Object Detection (except yolov5) and Instance Segmentation:
Parameter Name | Description | Default |
---|---|---|
validation_metric_type | Metric computation method to use for validation metrics. Optional, one of {none, coco, voc, coco_voc} | voc |
min_size | Minimum size of the image to be rescaled before feeding it to the backbone. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 600 |
max_size | Maximum size of the image to be rescaled before feeding it to the backbone. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 1333 |
box_score_thresh | During inference, only return proposals with a classification score greater than box_score_thresh. Optional, float in [0, 1] | 0.3 |
box_nms_thresh | Non-maximum suppression (NMS) threshold for the prediction head, used during inference. Optional, float in [0, 1] | 0.5 |
box_detections_per_img | Maximum number of detections per image, for all classes. Optional, positive integer | 100 |
Model-specific hyperparameters
For yolov5:
Parameter Name | Description | Default |
---|---|---|
validation_metric_type | Metric computation method to use for validation metrics. Optional, one of {none, coco, voc, coco_voc} | voc |
img_size | Image size for train and validation. Note: the training run may hit CUDA OOM if the size is too big. Optional, positive integer | 640 |
model_size | Model size. Note: the training run may hit CUDA OOM if the model size is too big. Optional, one of {small, medium, large, xlarge} | medium |
multi_scale | Enable multi-scale images by varying the image size by +/- 50%. Note: the training run may hit CUDA OOM if there is insufficient GPU memory. Optional, 0 or 1 | 0 |
box_score_thresh | During inference, only return proposals with a score greater than box_score_thresh. The score is the product of the objectness score and classification probability. Optional, float in [0, 1] | 0.001 |
box_iou_thresh | IoU threshold used during inference in NMS post-processing. Optional, float in [0, 1] | 0.5 |
When training vision models, model performance depends heavily on the hyperparameter values selected. Often, you may want to tune the hyperparameters to get optimal performance.
AutoML for Images allows you to sweep hyperparameters to find the optimal settings for your model. It leverages the hyperparameter tuning capabilities in Azure Machine Learning - you can learn more here.
You can define the model algorithms and hyperparameters to sweep in the parameter space. See Configure model algorithms and hyperparameters for the list of supported model algorithms and hyperparameters for each task type. Details on supported distributions for discrete and continuous hyperparameters can be found here.
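As an illustrative sketch (the specific values are arbitrary examples, not recommendations), a parameter space for object detection that sweeps over two algorithms and a few of the hyperparameters listed above could be defined with the choice and uniform distributions from the Azure ML hyperparameter tuning package:
from azureml.train.hyperdrive import choice, uniform
# Example parameter space only; ranges and options are illustrative
parameter_space = {
    'model_name': choice('yolov5', 'fasterrcnn_resnet50_fpn'),
    'learning_rate': uniform(0.0001, 0.01),
    'optimizer': choice('sgd', 'adam', 'adamw'),
    'number_of_epochs': choice(15, 30),
}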
When sweeping hyperparameters, you need to specify the sampling method to use for sweeping over the defined parameter space. AutoML for Images supports the following sampling methods using the hyperparameter_sampling parameter:
- Random Sampling
- Grid Sampling (not supported yet for conditional spaces)
- Bayesian Sampling (not supported yet for conditional spaces)
You can learn more about each of these sampling methods here.
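For example, random sampling over the parameter space sketched above would be configured as follows and passed in via the hyperparameter_sampling parameter of the AutoMLImageConfig:
from azureml.train.hyperdrive import RandomParameterSampling
# Sample configurations at random from the parameter space defined earlier
hyperparameter_sampling = RandomParameterSampling(parameter_space)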
When using AutoML to sweep hyperparameters for your vision models, you can automatically end poorly performing runs with an early termination policy. Early termination improves computational efficiency, saving compute resources that would otherwise be spent on less promising configurations. AutoML for Images supports the following early termination policies using the policy parameter -
- Bandit Policy
- Median Stopping Policy
- Truncation Selection Policy
If no termination policy is specified, all configurations are run to completion.
You can learn more about configuring the early termination policy for your hyperparameter sweep here.
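For instance, a Bandit policy that evaluates every other interval, skips the first six evaluations, and stops runs more than 20% worse than the best so far could be sketched as below (the concrete values are illustrative) and passed in via the policy parameter:
from azureml.train.hyperdrive import BanditPolicy
# Illustrative settings: check every 2 intervals, delay 6 evaluations, allow 20% slack
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.2, delay_evaluation=6)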
You can control the resources spent on your hyperparameter sweep by specifying the iterations and max_concurrent_iterations parameters for the sweep.
- iterations (required when sweeping): Maximum number of configurations to sweep. Must be an integer between 1 and 1000.
- max_concurrent_iterations: (optional) Maximum number of runs that can run concurrently. If not specified, all runs launch in parallel. If specified, must be an integer between 1 and 100. (NOTE: The number of concurrent runs is gated on the resources available in the specified compute target. Ensure that the compute target has the available resources for the desired concurrency.)
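For example, a budget of 20 configurations with at most 4 running concurrently (assuming the compute target has at least 4 nodes available) could be captured as a small settings dictionary and later unpacked into the AutoMLImageConfig call:
# Illustrative budget; pass these as keyword arguments, e.g. AutoMLImageConfig(..., **sweep_budget)
sweep_budget = {'iterations': 20, 'max_concurrent_iterations': 4}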
You can specify the metric to be used for model optimization and hyperparameter tuning using the optional primary_metric parameter. Default values depend on the task type -
- 'accuracy' for image-classification
- 'iou' for image-multi-labeling
- 'mean_average_precision' for image-object-detection
- 'mean_average_precision' for image-instance-segmentation
You can optionally specify the maximum time budget for your AutoML Vision experiment using experiment_timeout_hours - the amount of time in hours before the experiment terminates. If not specified, the default experiment timeout is 6 days.
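For example, an object detection experiment that optimizes mean average precision (already the default for that task type) and is capped at 12 hours could include the following fragment in its config; both values are shown purely as an illustration:
# Illustrative optimization metric and time budget for an object detection experiment
automl_image_config = AutoMLImageConfig(
    task='image-object-detection',
    # ... data, compute and sweep arguments as shown earlier ...
    primary_metric='mean_average_precision',
    experiment_timeout_hours=12)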
You can pass fixed settings or parameters that don't change during the parameter space sweep as arguments. Arguments are passed in name-value pairs and the name must be prefixed by a double dash. For example:
from azureml.train.automl import AutoMLImageConfig
arguments = ["--early_stopping", 1, "--evaluation_frequency", 2]
automl_image_config = AutoMLImageConfig(arguments=arguments)
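Putting the pieces together, an end-to-end sketch for an object detection sweep - combining the datasets, compute, parameter space, sampling method, termination policy, resource budget and arguments discussed above, with all concrete values being illustrative rather than recommendations - and submitting it as an experiment might look like:
from azureml.core import Experiment
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import BanditPolicy, RandomParameterSampling, choice, uniform
# Illustrative end-to-end configuration; assumes compute_target, training_dataset and
# validation_dataset have been set up as shown in the earlier sections
automl_image_config = AutoMLImageConfig(
    task='image-object-detection',
    compute_target=compute_target,
    training_data=training_dataset,
    validation_data=validation_dataset,
    primary_metric='mean_average_precision',
    hyperparameter_sampling=RandomParameterSampling({
        'model_name': choice('yolov5', 'fasterrcnn_resnet50_fpn'),
        'learning_rate': uniform(0.0001, 0.01),
    }),
    policy=BanditPolicy(evaluation_interval=2, slack_factor=0.2, delay_evaluation=6),
    iterations=20,
    max_concurrent_iterations=4,
    arguments=["--early_stopping", 1, "--evaluation_frequency", 2])
# Submit the run to an experiment in the workspace; progress can be monitored in the Azure ML Studio UI
automl_image_run = Experiment(ws, 'automl-image-object-detection').submit(automl_image_config)
automl_image_run.wait_for_completion(show_output=False)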
Please refer to the following sample notebooks to see how you can use AutoML for Images with sample data in your scenario -
Object Detection - AutoML for Images Object Detection Sample Notebook
Multi-Class Image Classification - AutoML for Images Multi-Class Classification Sample Notebook
Multi-Label Image Classification - AutoML for Images Multi-Label Classification Sample Notebook
Instance Segmentation - AutoML for Images Instance Segmentation Sample Notebook