Dependency-MIL

Accurate Subtyping of Lung Cancers by Modelling Class Dependencies

[Paper] [Pre-print] [Code] [BibTeX]

Presented at the 21st IEEE International Symposium on Biomedical Imaging (ISBI-2024).

Authors: George Batchkala, Bin Li, Mengran Fan, Mark McCole, Cecilia Brambilla, Fergus Gleeson, Jens Rittscher.

Creation of the Multi-label Dataset

Source files used to make the labels

DHMC_MetaData_Release_1.0.csv - downloaded from https://bmirds.github.io/LungCancer/; gives predominant LUAD pattern
tcga_classes_extended_info.csv - see https://github.com/GeorgeBatch/TCGA-lung-histology-download/
tcga_dsmil_test_ids.csv - see https://github.com/GeorgeBatch/TCGA-lung-histology-download/
tcia_cptac_md5sum_hashes.txt - see https://github.com/GeorgeBatch/TCIA-CPTAC-lung-histology-download
tcia_cptac_luad_lusc_cohort.csv - see https://github.com/GeorgeBatch/TCIA-CPTAC-lung-histology-download
tcia_cptac_string_2_ouh_labels.csv - took unique values from tcia_cptac_luad_lusc_cohort.csv and manually mapped to labels inspired by OUH (Oxford University Hospitals) reports

Dummy label files

Columns include the label (LUAD vs LUSC) and paths to features:

features_csv_file_path
h5_file_path
pt_file_path

mapping = {
    "LUAD": 0,
    "LUSC": 1,
}

DHMC has only LUAD slides, so all entries in the label field are 0.

TCGA has both LUAD and LUSC slides, so entries in the label field include both 0 and 1.
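As a rough illustration of the layout described above, the sketch below builds a small dummy label file with the listed columns and the LUAD/LUSC mapping. The slide IDs, the `features` directory layout, and the helper name `make_rows` are assumptions made for this example; they are not taken from the repository.

```python
import csv
import io

# Label encoding from this README.
mapping = {"LUAD": 0, "LUSC": 1}

# Hypothetical slide IDs, diagnoses, and feature directory (illustration only).
slides = [("slide_001", "LUAD"), ("slide_002", "LUSC")]
features_dir = "features"


def make_rows(slides, features_dir):
    """Build one label-file row per slide (hypothetical helper)."""
    rows = []
    for slide_id, diagnosis in slides:
        rows.append({
            "label": mapping[diagnosis],
            "features_csv_file_path": f"{features_dir}/csv/{slide_id}.csv",
            "h5_file_path": f"{features_dir}/h5/{slide_id}.h5",
            "pt_file_path": f"{features_dir}/pt/{slide_id}.pt",
        })
    return rows


rows = make_rows(slides, features_dir)

# Write to an in-memory buffer; use open("labels.csv", "w", newline="")
# instead to save the file to disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

A DHMC-style file would contain only rows with label 0, while a TCGA-style file would contain rows with both 0 and 1.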

Run the creation code

Run the labels creation code notebook. The code will create the files in labels/experiment-label-files/.

Note that the combined dataset for training/validation differs from the one used in the paper because the in-house DART dataset is not publicly available. The test set, however, is the same as in the paper and is fully available for both the 8-label task and the 5-label task.

Tiling, Feature Extraction, and Training - Improvements In Progress (last updated: June 4th, 2024)

For publication, I used the tiling and feature extraction pipeline from the https://github.com/binli123/dsmil-wsi repository. For faster computation, the csv features should be converted into hdf5 and pt files, as in https://github.com/mahmoodlab/CLAM. I am currently working on standardising the tiling and feature extraction pipeline for the Dependency-MIL model using tiatoolbox.

For training, I used the code from https://github.com/binli123/dsmil-wsi, modified to accommodate partial labels using the custom_binary_cross_entropy_with_logits function from source.losses.
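To make the idea of training with partial labels concrete, here is a minimal, framework-free sketch of a binary cross-entropy on logits that skips missing labels. It is not the repository's custom_binary_cross_entropy_with_logits, whose signature and behaviour are not shown here. The function name `masked_bce_with_logits` and the use of -1 to mark a missing label are assumptions made for this example.

```python
import math

# Assumption for this sketch: a missing (unknown) label is encoded as -1.
MISSING = -1


def masked_bce_with_logits(logits, targets, missing=MISSING):
    """Mean binary cross-entropy over the labels that are present.

    Hypothetical helper: labels equal to `missing` are ignored.
    """
    total, count = 0.0, 0
    for x, y in zip(logits, targets):
        if y == missing:
            continue  # unknown label: contributes nothing to the loss
        # Numerically stable form of
        # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        count += 1
    # Return 0.0 when every label is missing, to avoid dividing by zero.
    return total / count if count else 0.0


# One slide with three classes, the second of which has no annotation.
loss = masked_bce_with_logits([2.0, -1.0, 0.5], [1, MISSING, 0])
print(loss)
```

Averaging only over the observed labels keeps slides with few annotations from being penalised for classes nobody labelled; a tensor-based version would typically apply the same mask before the reduction.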

I will release the code once I finish improving it. If you need the code urgently, please contact me.

PyTorch Dataset and Data Loaders

Code for creating

PyTorch dataset: dataset_detailed.py.
PyTorch data loaders using a PyTorch Lightning DataModule: datamodule_detailed.py.

Dependency Modelling architecture

The Dependency-MIL model can be created using the get_model() function from source.models.combined_model.

Acknowledgements

George Batchkala is supported by Fergus Gleeson and the EPSRC Center for Doctoral Training in Health Data Science (EP/S02428X/1). The work was done as part of the DART Lung Health Program (UKRI grant 40255).

The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Citation

If you find Dependency-MIL useful for your research and applications, please cite it using this BibTeX entry:

@INPROCEEDINGS{batchkala2024dependency-mil,
  author={Batchkala, George and Li, Bin and Fan, Mengran and McCole, Mark and Brambilla, Cecilia and Gleeson, Fergus and Rittscher, Jens},
  booktitle={2024 IEEE International Symposium on Biomedical Imaging (ISBI)}, 
  title={Accurate Subtyping of Lung Cancers by Modelling Class Dependencies}, 
  year={2024},
  volume={},
  number={},
  pages={1-5},
  keywords={Accuracy;Convolution;Annotations;Histopathology;Lung cancer;Lung;Predictive models;lung cancer;computational pathology;multi-label classification;multiple-instance learning},
  doi={10.1109/ISBI56570.2024.10635232}
}
