[MICCAI'2022] CaPTion workshop: Active Data Enrichment by Learning What to Annotate in Digital Pathology

GeorgeBatch/active-data-enrichment

Active Data Enrichment by Learning What to Annotate in Digital Pathology


Repository Contents

  • metrics.py: implementation of the Ranking Curve AUC metric, ranking_auc(), for a given dataset, along with the expected-case and worst-case curves
  • plotting.py: plotting of the ranking curves, plot_ranking_curves(), for a given dataset, along with the expected-case and worst-case curves
  • Example.ipynb: Example usage of the ranking_auc() and plot_ranking_curves() functions together on toy data for both unsupervised and supervised retrieval

Contributions Summary

  • Proposed a comprehensive annotation protocol for lung cancer pathology
  • Proposed a new ranking metric for comparing image retrieval methods (code in metrics.py)
  • Fine-tuned ResNet for predicting pattern presence on regions from pathology images

Overarching Project

[Figure: dart-flow]

Together with my colleagues, we are collecting a dataset (as part of the DART Lung Health Project) that will help us find new clinically relevant features on chest CT images, which could reduce the number of invasive procedures and help prevent cancer by detecting it earlier.

First, we aim to train a model that mimics the diagnostic process pathologists follow when reviewing histology slides (the focus of the current work). Then, we aim to use the predictions from histology images in combination with the corresponding chest CTs to find new relevant features on these CT scans.

Annotation Protocol

[Figure: annotation-protocol]

To achieve the goal of the overarching project, we developed a comprehensive lung cancer pathology annotation protocol that closely mimics the pathologist's workflow. First, the pathologist is asked to select enough regions relevant to making the diagnosis. Second, the pathologist annotates these regions, marking the presence and absence of the morphological patterns outlined in the WHO 2021 guidelines for lung cancers.

Evaluation Metric: Ranking Curve AUC

As our data collection is ongoing, we do not know how many annotated datasets can be generated. Hence, we propose a metric that measures retrieval performance for a variable number of examples: Ranking Curve AUC. Our metric is similar to ROC AUC, which is used for measuring classification performance.

Here we provide the formal definition. See supplementary material (or the pdf with pre-review supplementary material if you do not have access to the former option).

[Figure: ranking-curve-definition]

The metric calculation for a specific dataset can be performed using the ranking_auc() function provided in metrics.py. This function also computes the curve for the expected and the worst case scenarios.
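
As a rough illustration of the idea (not the code in metrics.py), the sketch below assumes that the ranking curve plots the fraction of positive examples retrieved against the fraction of ranked examples inspected, and takes the area under it with the trapezoidal rule. The names `ranking_curve_sketch` and `trapezoid_area` are hypothetical; if the formal definition in the supplementary material differs, that definition takes precedence.

```python
import numpy as np


def ranking_curve_sketch(labels_in_ranked_order):
    """Return (x, y): fraction of examples inspected vs. fraction of positives retrieved.

    ASSUMPTION: this is an illustrative definition, not the repository's ranking_auc().
    """
    labels = np.asarray(labels_in_ranked_order, dtype=float)
    n = labels.size
    n_pos = labels.sum()
    if n == 0 or n_pos == 0:
        raise ValueError("Need at least one example and at least one positive.")
    # The curve starts at (0, 0) and gains one point per inspected example.
    x = np.arange(n + 1) / n
    y = np.concatenate(([0.0], np.cumsum(labels) / n_pos))
    return x, y


def trapezoid_area(x, y):
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


# Toy data: 1 = region contains the pattern, 0 = it does not.
best_auc = trapezoid_area(*ranking_curve_sketch([1, 1, 0, 0]))   # positives ranked first
worst_auc = trapezoid_area(*ranking_curve_sketch([0, 0, 1, 1]))  # positives ranked last
print(best_auc, worst_auc)
```

Under this assumed definition, a random ordering follows the diagonal in expectation, so a useful ranking should produce an area above 0.5 and well above the worst-case area.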

Unsupervised Retrieval

For one of the patterns (keratinization), we had only 1 example in the first batch of annotated images. Hence, we used unsupervised retrieval to rank the pre-selected regions from the second batch. With the feature extractor pre-trained on patches from TCGA lung extracted at 10x, we can retrieve 2/3 (10/15) of the regions with keratinization from only 1/3 (40/120) of the ranked regions.

[Figure: unsupervised-retrieval]
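
Ranking candidate regions against a single annotated example can be sketched as follows. This is a minimal sketch that assumes regions are ranked by cosine similarity between feature vectors; the README does not state which similarity measure was used, and `rank_by_cosine_similarity` is a hypothetical helper. The tiny 2-D vectors stand in for features produced by a pre-trained extractor.

```python
import numpy as np


def rank_by_cosine_similarity(query, candidates):
    """Order candidate feature vectors from most to least similar to the query.

    ASSUMPTION: cosine similarity is an illustrative choice, not necessarily
    the measure used in the paper.
    """
    query = np.asarray(query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Normalise so that the dot product equals the cosine similarity.
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    similarities = c @ q
    # Stable sort on negated similarities gives a descending order.
    order = np.argsort(-similarities, kind="stable")
    return order, similarities[order]


# One annotated query region and three unlabelled candidate regions (toy features).
query = np.array([1.0, 0.0])
candidates = np.array([[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]])
order, sims = rank_by_cosine_similarity(query, candidates)
print(order)  # candidate indices, most similar first
```

The resulting order is what a pathologist would review from the top, so positives concentrated near the start of the list translate directly into less annotation effort.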

Supervised Retrieval

For the acinar pattern, we simulated the process of adding ranked or random images from the "pool" set to the training set. The conclusion is that adding 10 ranked images improves model performance on the unseen test set more than adding 10, 20, or 30 random images.

[Figure: supervised-retrieval]

The 4 regions shown contain the acinar pattern. They were picked from the top-10 ranked pool set examples returned by our method for the acinar pattern. Solid arrows point at areas that the pathologist confirmed and delineated as containing the acinar pattern, thus validating the results.
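
The selection step of the pool-set simulation described in this section can be sketched as follows. This is a minimal sketch, assuming each pool image has a scalar relevance score (for example, a model's predicted probability that the pattern is present); `select_from_pool` and its parameters are hypothetical and are not part of this repository.

```python
import numpy as np


def select_from_pool(pool_scores, k, strategy="ranked", seed=0):
    """Pick k pool indices to add to the training set.

    ASSUMPTION: "ranked" takes the k highest-scoring images and "random"
    samples k images uniformly without replacement.
    """
    pool_scores = np.asarray(pool_scores, dtype=float)
    if not 0 <= k <= pool_scores.size:
        raise ValueError("k must be between 0 and the pool size.")
    if strategy == "ranked":
        # Highest scores first; stable sort keeps the original order for ties.
        return np.argsort(-pool_scores, kind="stable")[:k]
    if strategy == "random":
        rng = np.random.default_rng(seed)
        return rng.choice(pool_scores.size, size=k, replace=False)
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Toy pool of four images with hypothetical relevance scores.
scores = [0.1, 0.8, 0.3, 0.9]
ranked_pick = select_from_pool(scores, k=2, strategy="ranked")
random_pick = select_from_pool(scores, k=2, strategy="random", seed=42)
print(ranked_pick, random_pick)
```

In a full simulation, the selected images would be moved from the pool to the training set, the model retrained, and test-set performance compared between the two strategies.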

Acknowledgements

George Batchkala is supported by Fergus Gleeson and the EPSRC Center for Doctoral Training in Health Data Science (EP/S02428X/1). Tapabrata Chakraborti is supported by Linacre College, Oxford. The work was done as part of DART Lung Health Program (UKRI grant 40255).

The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Cite

@InProceedings{10.1007/978-3-031-17979-2_12,
title="Active Data Enrichment by Learning What to Annotate in Digital Pathology",
author="Batchkala, George
    and Chakraborti, Tapabrata
    and McCole, Mark
    and Gleeson, Fergus
    and Rittscher, Jens",
editor="Ali, Sharib
    and van der Sommen, Fons
    and Papie{\.{z}}, Bart{\l}omiej W{\l}adys{\l}aw
    and van Eijnatten, Maureen
    and Jin, Yueming
    and Kolenbrander, Iris",
booktitle="Cancer Prevention Through Early Detection",
year="2022",
publisher="Springer Nature Switzerland",
address="Cham",
pages="118--127",
isbn="978-3-031-17979-2"
}
