
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion

QUAG and QUAG-attention

QUAG and QUAG-attention are variants of self-attention that impair specific modality interactions in multimodal models. The relative drop in performance with respect to the unperturbed model indicates how much the model relies on the impaired modality interactions.

We provide a Google Colab demo to test QUAG and QUAG-attention and apply them to custom models in QUAG_QUAGAttention_Demo.ipynb.
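For intuition, below is a minimal, hypothetical sketch of the quadrant-averaging idea behind QUAG on a fused [video | text] token sequence. The function name, argument names, and quadrant labels are illustrative assumptions, not the repository's API; the Colab demo notebook is the authoritative implementation.

```python
import torch

def quag_average(attn, v_len, impair=("video-to-text", "text-to-video")):
    """Hypothetical sketch of QUAG-style quadrant averaging.

    attn:  softmaxed attention weights of shape (batch, heads, L, L),
           where the fused sequence is [video tokens | text tokens].
    v_len: number of video tokens (the remaining tokens are text).
    Each selected quadrant is replaced by its row-wise mean, so every
    query attends uniformly to that block instead of to individual tokens.
    """
    out = attn.clone()
    # Slices for the four quadrants of the (L x L) attention matrix.
    quadrants = {
        "video-to-video": (slice(None, v_len), slice(None, v_len)),
        "video-to-text":  (slice(None, v_len), slice(v_len, None)),
        "text-to-video":  (slice(v_len, None), slice(None, v_len)),
        "text-to-text":   (slice(v_len, None), slice(v_len, None)),
    }
    for name in impair:
        rows, cols = quadrants[name]
        block = out[..., rows, cols]
        # The row-wise mean keeps each query's total attention mass on the
        # block but erases which tokens within the block it attended to.
        out[..., rows, cols] = block.mean(dim=-1, keepdim=True).expand_as(block)
    return out
```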

CLAVI

We provide the scripts to curate the CLAVI dataset from Charades. CLAVI contains complement questions and videos, and models are evaluated thoroughly using consistent accuracies. The detailed curation steps can be found in /data/DATA.md.
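As a rough illustration, the sketch below computes a consistent-accuracy-style metric, assuming a pair counts as correct only when both an instance and its complement are answered correctly. The function and argument names are hypothetical; the exact pairing and metric definitions are specified in the paper and /data/DATA.md.

```python
def consistent_accuracy(predictions, answers, pairs):
    """Hypothetical sketch: fraction of (instance, complement) pairs
    where BOTH members are answered correctly.

    predictions, answers: dicts mapping question id -> answer string.
    pairs: list of (qid, complement_qid) tuples pairing each instance
           with its complement question/video.
    """
    correct = lambda qid: predictions[qid] == answers[qid]
    consistent = sum(correct(q) and correct(q_c) for q, q_c in pairs)
    return consistent / len(pairs)
```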

Citation

@InProceedings{rawal2024dissect,
  title     = {{Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion}},
  author    = {Rawal, Ishaan Singh and Matyasko, Alexander and Jaiswal, Shantanu and Fernando, Basura and Tan, Cheston},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  year      = {2024},
  publisher = {PMLR}
}
