Video Foundation Models & Data for Multimodal Understanding
Updated Jun 4, 2024 - Python
Papers, code and datasets about deep learning and multi-modal learning for video analysis
Generic PyTorch dataset implementation to load and augment VIDEOS for deep learning training loops.
A dataset of 500,000 multimodal short videos with baseline models (TensorFlow 2.0).
A summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
Awesome papers & datasets specifically focused on long-term videos.
SoccerAct10 is a dataset of 10 different soccer actions, built from YouTube videos.
Tools for loading video datasets and applying video transforms in PyTorch. Video files can be loaded directly, without preprocessing.
Surveillance Perspective Human Action Recognition Dataset: 7759 videos from 14 action classes, aggregated from multiple sources, all cropped spatio-temporally and filmed from a surveillance-camera-like position.
🌱 Starter kit for working with the EPIC-KITCHENS-55 dataset for action recognition or anticipation
Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)
Official repository for the paper "Bitstream-corrupted Video Recovery: A Novel Benchmark Dataset and Method", accepted to the NeurIPS 2023 Datasets and Benchmarks Track
Keras Implementation of Video Swin Transformers for 3D Video Modeling
[AAAI 2023] AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work
Official This-Is-My Dataset published in CVPR 2023
[NeurIPS'22] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Improving Transfer Learning with a Dual Image and Video Transformer for Multi-label Movie Trailer Genre Classification
Code for extracting images and masks from a video segmentation dataset using the OpenCV library in Python.
Dataset repository of "MetaVD: A Meta Video Dataset for enhancing human action recognition datasets."
LIVE-YT-HFR Video Quality Assessment Database