Per-Pixel Classification is Not All You Need for Semantic Segmentation

Reference

Cheng, Bowen, Alex Schwing, and Alexander Kirillov. "Per-pixel classification is not all you need for semantic segmentation." Advances in Neural Information Processing Systems 34 (2021): 17864-17875.

Performance

ADE20k

| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|---|---|---|---|---|---|---|---|
| Maskformer-tiny | SwinTransformer | 512x512 | 160000 | 47.93 | - | - | model, log, vdl |
| Maskformer-small | SwinTransformer | 512x512 | 160000 | 50.4 | - | - | model, log, vdl |
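
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed. It is not taken from this codebase: the function name `mean_iou`, the `ignore_index=255` default, and the choice to skip classes that are absent from both prediction and label are assumptions made for illustration. The table reports the value as a percentage.

```python
def mean_iou(pred, label, num_classes, ignore_index=255):
    """Mean IoU over flattened per-pixel class ids (illustrative sketch).

    pred and label are equal-length sequences of integer class ids.
    Pixels whose label equals ignore_index (an assumed convention) are skipped.
    """
    intersect = [0] * num_classes
    pred_area = [0] * num_classes
    label_area = [0] * num_classes
    for p, t in zip(pred, label):
        if t == ignore_index:
            continue
        pred_area[p] += 1
        label_area[t] += 1
        if p == t:
            intersect[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + label_area[c] - intersect[c]
        if union > 0:  # skip classes absent from both pred and label
            ious.append(intersect[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1]
label = [0, 1, 1, 1]
# Class 0: IoU = 1/2, class 1: IoU = 2/3, mean = 0.5833...
print(round(100 * mean_iou(pred, label, num_classes=2), 2))  # 58.33
```
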
  • Maskformer supports four network settings: tiny, small, base, and large. Training results for base and large are not provided here, but they should be consistent with those reported in the paper.

  • Maskformer-Base and Maskformer-Large will be evaluated with multi-scale and flip augmentation, as in the original codebase.

  • Please use CUDA 11.2 rather than CUDA 10.2 to avoid computation bugs.
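
The flip evaluation mentioned in the notes above can be sketched roughly as follows: predict on the image and on its horizontal mirror, mirror the second set of scores back, average, and take the per-pixel argmax. This is a plain-Python illustration under stated assumptions, not the evaluation code of this repository or of the original codebase. `predict_fn` is a hypothetical callable that maps an H x W image (a list of rows) to an H x W grid of per-class score lists, and the multi-scale part is omitted.

```python
def hflip(grid):
    """Mirror a 2D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]


def predict_with_flip(predict_fn, image):
    """Average class scores from the image and its horizontal mirror.

    predict_fn is a hypothetical model wrapper (an assumption for this sketch).
    Returns an H x W grid of predicted class ids.
    """
    scores = predict_fn(image)
    # Predict on the mirrored image, then mirror the scores back so that
    # they line up with the pixels of the original image.
    flipped_scores = hflip(predict_fn(hflip(image)))
    averaged = [
        [[(a + b) / 2 for a, b in zip(px, px_f)] for px, px_f in zip(row, row_f)]
        for row, row_f in zip(scores, flipped_scores)
    ]
    # Per-pixel argmax over the averaged class scores.
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in averaged]


def toy_predict(image):
    """Toy two-class 'model' with a position-dependent bias on column 0."""
    bias = [0.3, 0.0]
    return [[[1 - (v + bias[j]), v + bias[j]] for j, v in enumerate(row)] for row in image]


# Without flipping, toy_predict alone would label the two pixels [1, 0].
print(predict_with_flip(toy_predict, [[0.4, 0.4]]))  # [[1, 1]]
```

Averaging over the mirrored input cancels some position-dependent bias in the predictor, which is why flip evaluation usually gives a small mIoU gain at the cost of a second forward pass.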