Improving Visual Relation Detection using Depth Maps

by Sahand Sharifzadeh¹, Sina Moayed Baharlou²*, Max Berrendorf¹*, Rajat Koner¹, Volker Tresp¹,³

¹ Ludwig Maximilian University, Munich, Germany
² Sapienza University of Rome, Italy
³ Siemens AG, Munich, Germany
* Equal contribution

Abstract

State-of-the-art visual relation detection methods mostly rely on object information extracted from RGB images such as predicted class probabilities, 2D bounding boxes and feature maps. Depth maps can additionally provide valuable information on object relations, e.g. helping to detect not only spatial relations, such as standing behind, but also non-spatial relations, such as holding. In this work, we study the effect of using different object information with a focus on depth maps. To enable this study, we release a new synthetic dataset of depth maps, VG-Depth, as an extension to Visual Genome (VG). We also note that given the highly imbalanced distribution of relations in VG, typical evaluation metrics for visual relation detection cannot reveal improvements of under-represented relations. To address this problem, we propose using an additional metric, calling it Macro Recall@K, and demonstrate its remarkable performance on VG. Finally, our experiments confirm that by effective utilization of depth maps within a simple, yet competitive framework, the performance of visual relation detection can be significantly improved.

Model

Highlights

We perform an extensive study on the effect of using different sources of object information in visual relation detection. Our empirical evaluations on the VG dataset show that our model can outperform competing methods by a margin of up to 8 percentage points, including methods that use external language sources or contextualization.
We release a new synthetic dataset, VG-Depth, to compensate for the lack of depth maps in Visual Genome.
We propose Macro Recall@K as a competitive metric for evaluating visual relation detection performance on highly imbalanced datasets such as Visual Genome.
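
To illustrate why a macro-averaged metric helps on an imbalanced relation distribution, the minimal sketch below contrasts the usual (micro) Recall@K with a macro-averaged variant. It assumes Macro Recall@K is the unweighted mean of per-predicate recalls and, for simplicity, pools all ground-truth relations instead of averaging per image. The function name and the toy data are hypothetical and are not taken from this repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recall_at_k(hits: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (micro, macro) recall.

    `hits` holds one (predicate, was_recovered) pair per ground-truth
    relation, where was_recovered says whether that relation appeared
    among the top-K predictions for its image.
    """
    total: Dict[str, int] = defaultdict(int)
    found: Dict[str, int] = defaultdict(int)
    for predicate, was_recovered in hits:
        total[predicate] += 1
        found[predicate] += int(was_recovered)
    if not total:
        raise ValueError("need at least one ground-truth relation")

    # Micro: every relation counts equally, so frequent predicates dominate.
    micro = sum(found.values()) / sum(total.values())
    # Macro (assumed definition): every predicate class counts equally.
    macro = sum(found[p] / total[p] for p in total) / len(total)
    return micro, macro


# Toy, made-up data: a frequent predicate that is mostly recovered
# and a rare predicate that is never recovered.
toy = [("on", True)] * 90 + [("on", False)] * 8 + [("holding", False)] * 2
micro, macro = recall_at_k(toy)
print(f"micro R@K = {micro:.2f}, macro R@K = {macro:.2f}")
```

In this toy case the micro score stays high even though the rare predicate is never recovered, while the macro score drops sharply, which is the kind of effect the metric is meant to expose.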

Results

Paper

For further information, please read our paper (arXiv:1905.00966).

Bibtex

@misc{sah2019improving,
    title={Improving Visual Relation Detection using Depth Maps},
    author={Sahand Sharifzadeh and Sina Moayed Baharlou and Max Berrendorf and Rajat Koner and Volker Tresp},
    year={2019},
    eprint={1905.00966},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Depth Dataset

We release a new dataset called VG-Depth as an extension to Visual Genome. This dataset contains synthetically generated depth maps from Visual Genome images and can be downloaded from the following link: VG-Depth

Code

Requirements

The main requirements of the provided code are as follows:

Python >= 3.6

PyTorch >= 1.1

TorchVision >= 0.2.0

TensorboardX

CUDA Toolkit 10.0

Pandas

Overrides

Gdown
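
As a quick sanity check before running the setup script, the version constraints above can be verified from Python. The sketch below is not part of this repository: the helper names are hypothetical, and it assumes version strings start with dotted integers (build suffixes such as +cu100 are ignored).

```python
import re
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading dotted integers, e.g. '1.1.0+cu100' -> (1, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, comparing numerically."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.1' and '1.1.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print("Python >= 3.6:", sys.version_info >= (3, 6))
    try:
        import torch
        import torchvision
    except ImportError as err:
        print("missing package:", err.name)
    else:
        print("PyTorch >= 1.1:", meets_minimum(torch.__version__, "1.1"))
        print("TorchVision >= 0.2.0:", meets_minimum(torchvision.__version__, "0.2.0"))
        print("CUDA available:", torch.cuda.is_available())
        print("CUDA version PyTorch was built with:", torch.version.cuda)
```

Comparing parsed integer tuples avoids the pitfall of plain string comparison, where "1.10" would sort before "1.2".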

Setup

Please make sure you have CUDA Toolkit 10.0 installed. You can then set up the environment by running the following script:

./setup_env.sh

This script will perform the following operations:

Install the required libraries.
Download the Visual-Genome dataset.
Download the Depth version of VG.
Download the necessary checkpoints.
Compile the CUDA libraries.
Prepare the environment.

If you have already installed some of the libraries or downloaded the datasets, you can run the following scripts individually:

./data/fetch_dataset.sh             # Download Visual-Genome
./data/fetch_depth_1024.sh          # Download VG-Depth
./checkpoints/fetch_checkpoints.sh  # Download the Checkpoints

How to Run

Update the config.py file to set the dataset paths, and adjust your PYTHONPATH (e.g. export PYTHONPATH=/home/sina/Depth-VRD).
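
If imports fail at startup, a wrong PYTHONPATH is a likely cause. The sketch below shows one way to check that the repository root is listed there. The helper name is hypothetical and not part of this repository; pass the path to your own checkout.

```python
import os
from pathlib import Path
from typing import Optional


def repo_on_pythonpath(repo_root: str, pythonpath: Optional[str] = None) -> bool:
    """Return True if `repo_root` is one of the entries in PYTHONPATH.

    `pythonpath` defaults to the current environment variable; it can be
    passed explicitly to test a value before exporting it.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = Path(repo_root).resolve()
    # PYTHONPATH entries are separated by os.pathsep (':' on Linux).
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(Path(entry).resolve() == target for entry in entries)


if __name__ == "__main__":
    # Example path from the text above; replace it with your own checkout.
    print(repo_on_pythonpath("/home/sina/Depth-VRD"))
```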

To train or evaluate the networks, run the following scripts and select between different models and hyper-parameters by passing the pre-defined arguments:

To train the depth models separately (without the fusion layer):

./scripts/shz_models/train_depth.sh 1.1

To train each modality separately:

./scripts/shz_models/train_individual.sh 1.1

To train the fusion models:

./scripts/shz_models/train_fusion.sh 1.1

To evaluate the models:

./scripts/shz_models/eval_scripts.sh 1

Each training script runs with eight different random seeds, for 25 epochs per seed.

Checkpoints

You can also separately download the full model (LCVD) here. Please note that this checkpoint produces results that differ slightly from the ones reported in our paper. The reason is that, for a fair comparison, we reported mean values from evaluations over several models.
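
Because the reported numbers are means over several trained models, a single checkpoint is expected to deviate slightly from them. The sketch below shows that aggregation step, assuming each run yields one scalar per metric. The function name, metric names, and values are placeholders for illustration and are not results from the paper.

```python
from statistics import mean, stdev
from typing import Dict, List


def aggregate_runs(runs: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Compute mean and sample standard deviation of each metric over runs."""
    if not runs:
        raise ValueError("need at least one run")
    summary = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        summary[name] = {
            "mean": mean(values),
            # stdev needs two or more values; report 0.0 for a single run.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


# Placeholder values for three hypothetical seeds (not real results).
runs = [
    {"R@100": 0.50, "Macro R@100": 0.20},
    {"R@100": 0.52, "Macro R@100": 0.22},
    {"R@100": 0.54, "Macro R@100": 0.24},
]
for name, stats in aggregate_runs(runs).items():
    print(f"{name}: {stats['mean']:.3f} +/- {stats['std']:.3f}")
```

Reporting the spread alongside the mean makes it easier to judge whether a single checkpoint's deviation is within the expected seed-to-seed variation.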

Contributors

This repository is created and maintained by Sahand Sharifzadeh, Sina Moayed Baharlou, and Max Berrendorf.

Acknowledgements

The skeleton of our code is built on top of the nicely organized Neural-Motifs framework (including the RGB data loading pipeline and part of the evaluation code). We have upgraded these parts to be compatible with PyTorch 1. To enable a fair comparison of models, our object detection backbone also uses the same Faster-RCNN weights as that work.

