For the GATE paper:
- Image classification -> Done
- Visual relational reasoning -> Done
- Semantic segmentation
- Few Shot Learning -> Working on it
- Zero shot learning
- Medical image classification
- Medical semantic segmentation
- Video classification
- Text classification (models that support text modalities, CLIP, other multi-modal foundation models)
- Audio classification -> as a modality shift (remove root, replace with new modality root embedding)

Premise is that we think, based on recent and past work, that we need a more