This repository contains the official implementation of POET (POse Estimation Transformer) and is built on DETR.
We replace the complex, hand-crafted pose estimation pipeline with a Transformer and outperform Associative Embedding with a ResNet-50 backbone, obtaining 54 mAP on COCO. POET is fast and well suited for real-time applications.
For details, see our article End-to-End Trainable Multi-Instance Pose Estimation with Transformers by Lucas Stoffl, Maxime Vidal, and Alexander Mathis.
Download the POET model trained on COCO: POET-R50
An example of how to use the model is given in the demo notebook, which loads the pre-trained model, generates predictions, and visualizes them: POET's demo notebook
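For quick reference, the gist of the notebook looks roughly like the sketch below. This is a minimal sketch, not the notebook itself: the `main.get_args_parser` and `models.build_model` imports, the `'model'` checkpoint key, and the file names `poet_r50.pth` / `example.jpg` are assumptions based on DETR's conventions; the demo notebook is authoritative.

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Assumed to follow DETR's layout: main.py exposes get_args_parser(), and
# models.build_model(args) returns (model, criterion, postprocessors).
from main import get_args_parser
from models import build_model

args = get_args_parser().parse_args([])  # use the repo's default hyperparameters
model, _, _ = build_model(args)

# DETR-style checkpoints store the weights under the 'model' key (assumption).
checkpoint = torch.load("poet_r50.pth", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Standard ImageNet preprocessing, as used by DETR-based models.
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    outputs = model(img)  # keypoint predictions, one set per object query
```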
To train a POET model on a single node with 2 GPUs for 250 epochs, run:
```bash
python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py \
    --batch_size=6 --num_workers=16 --epochs=250 --coco_path=data/COCO \
    --set_cost_kpts=4 --set_cost_ctrs=0.5 --set_cost_deltas=0.5 --set_cost_kpts_class=0.2 \
    --kpts_loss_coef=4 --ctrs_loss_coef=0.5 --deltas_loss_coef=0.5 --kpts_class_loss_coef=0.2 \
    --num_queries=50 --output_dir=experiments/ --lr=5e-5 --lr_backbone=5e-6
```
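On recent PyTorch releases, `torch.distributed.launch` is deprecated in favor of `torchrun`. Since the command above already uses `--use_env` (rank passed via environment variables, which is what `torchrun` does by default), the following should be equivalent; treat it as an untested sketch with the same flags:

```bash
torchrun --nproc_per_node=2 main.py \
    --batch_size=6 --num_workers=16 --epochs=250 --coco_path=data/COCO \
    --set_cost_kpts=4 --set_cost_ctrs=0.5 --set_cost_deltas=0.5 --set_cost_kpts_class=0.2 \
    --kpts_loss_coef=4 --ctrs_loss_coef=0.5 --deltas_loss_coef=0.5 --kpts_class_loss_coef=0.2 \
    --num_queries=50 --output_dir=experiments/ --lr=5e-5 --lr_backbone=5e-6
```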
If you find this code useful, please cite:
End-to-End Trainable Multi-Instance Pose Estimation with Transformers by Lucas Stoffl, Maxime Vidal and Alexander Mathis.
```bibtex
@article{stoffl2021end,
  title={End-to-end trainable multi-instance pose estimation with transformers},
  author={Stoffl, Lucas and Vidal, Maxime and Mathis, Alexander},
  journal={arXiv preprint arXiv:2103.12115},
  year={2021}
}
```
POET is released under the Apache 2.0 license. Please see the LICENSE file for more information.