The repository collects multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these domains.
language
multi-modal
image-transformer
vision-transformer
video-language
efficiency-transformer
video-transformer
mlp-mixer
transformer-readling-list
multi-modal-cvpr2021
Updated Aug 27, 2022