antoyang / TubeDETR (Star 162)
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Topics: video-understanding, multimodal-learning, vision-and-language, visual-grounding, spatio-temporal-video-grounding, stvg, vidstg, hc-stvg
Updated Sep 24, 2023. Language: Python