We use the popular MMDetection toolbox for experiments on the MS-COCO dataset, with pre-trained ResNet50, MobileNetV2 (1.0×), and ConvNeXt-Tiny models as the detector backbones. We select the mainstream Faster R-CNN and Mask R-CNN detectors, with Feature Pyramid Networks as the necks, to build the basic object detection systems.
Please follow Swin-Transformer-Object-Detection to prepare the environment and the dataset. Then add our code to the original project and edit the config files to set the paths to your pre-trained models and the directories for saving logs and models.
To train a detector with a pre-trained model as the backbone:
bash tools/dist_train.sh {path to config file} {number of gpus}
To evaluate a fine-tuned model:
bash tools/dist_test.sh {path to config file} {path to fine-tuned model} {number of gpus} --eval bbox segm --show
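As a rough illustration of how the placeholders in the two commands above are filled in, the sketch below builds both command lines from shell variables and prints them as a dry run. The config path, checkpoint path, and GPU count are hypothetical placeholders, not files shipped with this project; substitute your own values.

```shell
# Hypothetical placeholders: replace with your own config, checkpoint, and GPU count.
CONFIG="path/to/config.py"
CHECKPOINT="path/to/fine_tuned_model.pth"
GPUS=8

# Same argument order as the training and evaluation commands above.
TRAIN_CMD="bash tools/dist_train.sh ${CONFIG} ${GPUS}"
TEST_CMD="bash tools/dist_test.sh ${CONFIG} ${CHECKPOINT} ${GPUS} --eval bbox segm --show"

# Dry run: only print the commands so they can be checked before launching.
echo "${TRAIN_CMD}"
echo "${TEST_CMD}"
```

Once the printed commands look right, run them from the project root, where `tools/dist_train.sh` and `tools/dist_test.sh` are located.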
Backbones | Detectors | box AP | mask AP | Config | Google Drive | Baidu Drive |
---|---|---|---|---|---|---|
ResNet50 | Mask R-CNN | 39.6 | 36.4 | config | model | model |
+ KW (1×) | Mask R-CNN | 41.8 | 38.4 | config | model | model |
+ KW (4×) | Mask R-CNN | 42.4 | 38.9 | config | model | model |
MobileNetV2 (1.0×) | Mask R-CNN | 33.8 | 31.7 | config | model | model |
+ KW (1×) | Mask R-CNN | 36.4 | 33.7 | config | model | model |
+ KW (4×) | Mask R-CNN | 38.0 | 34.9 | config | model | model |
ConvNeXt-Tiny | Mask R-CNN | 43.4 | 39.7 | config | model | model |
+ KW (4×) | Mask R-CNN | 44.7 | 40.6 | config | model | model |
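
The gain of each KW variant over its baseline can be read off the table by subtraction. The following is a small sketch of that arithmetic, assuming `awk` is available; the AP values are copied from the table, and the row labels are only for display.

```shell
# AP values are copied from the table above; gain = KW model AP minus baseline AP.
GAINS=$(awk 'BEGIN {
  printf "ResNet50 + KW (1x): %+.1f box AP, %+.1f mask AP\n", 41.8 - 39.6, 38.4 - 36.4
  printf "ResNet50 + KW (4x): %+.1f box AP, %+.1f mask AP\n", 42.4 - 39.6, 38.9 - 36.4
  printf "MobileNetV2 + KW (1x): %+.1f box AP, %+.1f mask AP\n", 36.4 - 33.8, 33.7 - 31.7
  printf "MobileNetV2 + KW (4x): %+.1f box AP, %+.1f mask AP\n", 38.0 - 33.8, 34.9 - 31.7
  printf "ConvNeXt-Tiny + KW (4x): %+.1f box AP, %+.1f mask AP\n", 44.7 - 43.4, 40.6 - 39.7
}')
echo "${GAINS}"
```

Among the listed rows, the largest gain is for MobileNetV2 with KW (4×): +4.2 box AP and +3.2 mask AP.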