Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin
(* equal contribution)
Accepted to ICCV 2023 (Oral)
- mmdetection: https://github.com/V3Det/mmdetection-V3Det/tree/main/configs/v3det
- Detectron2: https://github.com/V3Det/Detectron2-V3Det
The data includes a Train Set, a Val Set, and a Test Set, comprising 13,204 categories.
Split | Images | BBoxes |
---|---|---|
Train Set | 183,354 | 1,357,377 |
Val Set | 29,821 | 220,429 |
Test Set | 29,863 | 219,012 |
Train Set OVD (Base Class) | 132,437 | 836,203 |
The 13,204 categories are split into 6,709 base classes and 6,495 novel classes for open-vocabulary detection (OVD) tasks. For each category, we provide an exemplar image and detailed descriptions from several sources (human experts, ChatGPT, GPT4, and GPT4V).
Base Class | Novel Class | All Class |
---|---|---|
6709 | 6495 | 13204 |
The Train Set OVD (Base Class) is a subset of the Train Set that keeps only the annotations of base classes and is prepared for OVD (Open-Vocabulary Detection) tasks. Images left without any annotations after the novel-class annotations are filtered out are removed.
Split | Images | BBoxes |
---|---|---|
Train Set | 183,354 | 1,357,377 |
Train Set OVD (Base Class) | 132,437 | 836,203 |
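The filtering described above can be sketched as follows. This is a minimal illustration on synthetic data, not the script used to build the released file. It assumes that a truthy `novel` value marks a novel category and that the category list is kept unchanged; the helper name `filter_to_base_classes` is hypothetical. For actual experiments, the released `v3det_2023_v1_train_ovd_base.json` should be preferred.

```python
def filter_to_base_classes(coco):
    """Keep only base-class annotations and drop images left without any."""
    # Assumption: a truthy `novel` field marks a novel category.
    base_ids = {c["id"] for c in coco["categories"] if not c.get("novel")}
    annotations = [a for a in coco["annotations"] if a["category_id"] in base_ids]
    # Images whose annotations were all novel are removed.
    kept_image_ids = {a["image_id"] for a in annotations}
    images = [im for im in coco["images"] if im["id"] in kept_image_ids]
    # Assumption: the category list itself is left untouched.
    return {**coco, "images": images, "annotations": annotations}


# Tiny synthetic example: category 10 is base, category 11 is novel.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 10, "novel": 0}, {"id": 11, "novel": 1}],
    "annotations": [
        {"image_id": 1, "category_id": 10},
        {"image_id": 1, "category_id": 11},
        {"image_id": 2, "category_id": 11},
    ],
}
ovd = filter_to_base_classes(toy)
print(len(ovd["images"]), len(ovd["annotations"]))  # prints: 1 1
```

Image 2 is dropped because its only annotation belongs to a novel class.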
The data organization is:
V3Det/
images/
<category_node>/
|────<image_name>.png
...
...
test/
|────<image_name>.png
...
exemplar_images/
|────<category_id>.jpg
...
annotations/
|────v3det_2023_v1_category_tree.json # Category tree
|────category_name_13204_v3det_2023_v1.txt # Category name
|────v3det_2023_v1_train.json # Train set
|────v3det_2023_v1_train_ovd_base.json # Open vocabulary detection train set
|────v3det_2023_v1_val.json # Validation set
|────v3det_2023_v1_test_image_info.json # Image information of test set
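After downloading, the layout above can be sanity-checked with a few lines of Python. This is only a sketch: the helper name `missing_annotation_files` is hypothetical, and it assumes `annotations/` sits directly under the dataset root.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_ANNOTATION_FILES = [
    "v3det_2023_v1_category_tree.json",
    "category_name_13204_v3det_2023_v1.txt",
    "v3det_2023_v1_train.json",
    "v3det_2023_v1_train_ovd_base.json",
    "v3det_2023_v1_val.json",
    "v3det_2023_v1_test_image_info.json",
]


def missing_annotation_files(root):
    """Return the expected annotation files that are absent under root/annotations."""
    # Assumption: annotations/ is a direct child of the dataset root.
    ann_dir = Path(root) / "annotations"
    return [name for name in EXPECTED_ANNOTATION_FILES if not (ann_dir / name).is_file()]
```

Calling `missing_annotation_files("./V3Det")` returns an empty list when every annotation file is in place.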
The annotation files are JSON dictionaries with the top-level keys "images", "categories", and "annotations".
- images : store a list containing image information, where each element is a dictionary representing an image.
file_name # The relative image path, e.g. images/n07745046/21_371_29405651261_633d076053_c.jpg.
height # The height of the image
width # The width of the image
id # Unique identifier of the image.
- categories : store a list containing category information, where each element is a dictionary representing a category.
name # English name of the category.
name_zh # Chinese name of the category.
cat_info # List of descriptions of the category.
cat_info_gpt # List of descriptions of the category generated by ChatGPT.
cat_info_gpt4 # Descriptions of the category generated by GPT4.
cat_info_gpt4v # Descriptions of the category generated by GPT4-V.
novel # For open-vocabulary detection, indicates whether the category is a novel class.
id # Unique identifier of the category.
exemplar_image # Exemplar image of the category.
- annotations : store a list containing annotation information, where each element is a dictionary representing a bounding box annotation.
image_id # The unique identifier of the image where the bounding box is located.
category_id # The unique identifier of the category corresponding to the bounding box.
bbox # The coordinates of the bounding box, in the format [x, y, w, h], representing the top-left corner coordinates and the width and height of the box.
iscrowd # Whether the bounding box is a crowd box.
area # The area of the bounding box
- The category tree stores information about dataset category mappings and relationships in dictionary format.
categoryid2treeid # Unique identifier of node in the category tree corresponding to the category identifier in dataset
id2name # English name corresponding to each node in the category tree
id2name_zh # Chinese name corresponding to each node in the category tree
id2desc # English description corresponding to each node in the category tree
id2desc_zh # Chinese description corresponding to each node in the category tree
id2synonym_list # List of synonyms corresponding to each node in the category tree
id2center_synonym # Center synonym corresponding to each node in the category tree
father2child # All direct child categories corresponding to each node in the category tree
child2father # All direct parent categories corresponding to each node in the category tree
ancestor2descendant # All descendant nodes corresponding to each node in the category tree
descendant2ancestor # All ancestor nodes corresponding to each node in the category tree
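As an illustration of how these fields fit together, the sketch below groups boxes per image, converts `[x, y, w, h]` to corner coordinates, and looks up a category's English name through the category tree. The function names are hypothetical. The tree lookup assumes that `categoryid2treeid` and `id2name` are JSON objects keyed by stringified ids, which is how JSON stores object keys; the toy data is synthetic.

```python
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert [x, y, w, h] (top-left corner plus size) to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def boxes_by_image(coco):
    """Group annotations by image_id as (category_id, [x1, y1, x2, y2]) pairs."""
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[ann["image_id"]].append((ann["category_id"], xywh_to_xyxy(ann["bbox"])))
    return dict(grouped)


def category_tree_name(category_id, tree):
    """Map a dataset category id to its tree node, then to the node's English name."""
    # Assumption: both mappings use string keys, as JSON objects do.
    node = tree["categoryid2treeid"][str(category_id)]
    return tree["id2name"][str(node)]


# Synthetic stand-ins for an annotation file and the category tree file.
toy = {"annotations": [{"image_id": 1, "category_id": 7, "bbox": [10, 20, 30, 40]}]}
toy_tree = {"categoryid2treeid": {"7": 100}, "id2name": {"100": "apple"}}

print(boxes_by_image(toy))              # {1: [(7, [10, 20, 40, 60])]}
print(category_tree_name(7, toy_tree))  # apple
```

With the real files, `coco` and `tree` would come from `json.load` on the train or val annotation file and on `v3det_2023_v1_category_tree.json`, respectively.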
- Run the command to crawl the train and val images. By default, the images will be stored in the './V3Det/' directory.
python v3det_image_download.py
- If you want to change the storage location, you can specify the desired folder by adding the option '--output_folder' when executing the script.
python v3det_image_download.py --output_folder our_folder
- Run the command to crawl the test images.
python v3det_test_image_download.py [--output_folder our_folder]
- Run the command to crawl the exemplar images.
python v3det_exemplar_image_download.py [--output_folder our_folder]
- Run the command and then select dataset path
path/to/V3Det
to visualize the category tree.
python v3det_visualize_tree.py
Please refer to the TreeUI Operation Guide for more information.
- We provide evaluation code here. To evaluate a model, first install the dependencies:
pip install pycocotools tqdm
pip install openmim
mim install mmengine
Format your detection results as COCO JSON, then run the evaluation script:
python eval_v3det.py dt_json_path
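For reference, a COCO detection result file is a JSON list in which each entry carries `image_id`, `category_id`, `bbox` as `[x, y, width, height]`, and `score`. The sketch below converts detections with corner-format boxes into that shape. The input field names (`box_xyxy` and so on) and the helper name are hypothetical, and the example does not cover any extra requirements `eval_v3det.py` may have.

```python
import json


def to_coco_results(detections):
    """Convert detections with [x1, y1, x2, y2] boxes to COCO result entries."""
    results = []
    for det in detections:
        x1, y1, x2, y2 = det["box_xyxy"]  # hypothetical input field name
        results.append({
            "image_id": det["image_id"],
            "category_id": det["category_id"],
            "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses [x, y, width, height]
            "score": float(det["score"]),
        })
    return results


# Synthetic detection for illustration only.
detections = [{"image_id": 1, "category_id": 7, "box_xyxy": [10, 20, 40, 60], "score": 0.9}]
results = to_coco_results(detections)
print(json.dumps(results))
```

Writing `results` to a file with `json.dump` gives a path that can be passed as `dt_json_path`.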
- V3Det Images: Around 90% of the images in V3Det were selected from the Bamboo Dataset, which is sourced from Flickr. The remaining 10% were crawled directly from Flickr. We do not own the copyright of the images. Use of the images must abide by the Flickr Terms of Use. We only provide lists of image URLs and do not redistribute the images.
- V3Det Annotations: The V3Det annotations, the category relationship tree, and related tools are licensed under a Creative Commons Attribution 4.0 License (commercial use is allowed).
- Codebase: mmdetection-V3Det License and Detectron2-V3Det License
@inproceedings{wang2023v3det,
title = {V3Det: Vast Vocabulary Visual Detection Dataset},
author = {Wang, Jiaqi and Zhang, Pan and Chu, Tao and Cao, Yuhang and Zhou, Yujie and Wu, Tong and Wang, Bin and He, Conghui and Lin, Dahua},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023}
}