facebookresearch / mmf (Star 5.4k). A modular framework for vision & language multimodal research from Facebook AI Research (FAIR). Topics: deep-learning, dialog, pytorch, vqa, pretrained-models, captioning, multimodal, multi-tasking, textvqa, hateful-memes. Updated May 25, 2024. Python.
yashkant / sam-textvqa (Star 62). Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020. Topics: language, vision, eccv, textvqa. Updated Sep 15, 2021. Python.
phiyodr / vqaloader (Star 6). PyTorch DataLoader for many VQA datasets. Topics: pytorch, vqa, dataloader, gqa, textvqa, vqav2. Updated Jan 10, 2023. Python.
soonchangAI / LFPR (Star 0). [PRL 2024] This is the code repo for our label-free pruning and retraining technique for autoregressive Text-VQA Transformers (TAP, TAP†). Topics: transformer, textvqa, pruning-algorithms. Updated May 22, 2024. Python.