Visually informed embedding of word (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms.
Implementation of Independent Multimodal Background Subtraction, based on the paper by Bloisi and Iocchi.
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
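The MVAE in that paper fuses unimodal Gaussian posteriors with a product of experts: precisions add, and the fused mean is the precision-weighted average. A minimal PyTorch sketch of that fusion step (tensor shapes and the unit-Gaussian prior expert follow the paper; exact variable names are illustrative):

```python
import torch

def product_of_experts(mus, logvars, eps=1e-8):
    """Fuse Gaussian experts: summed precisions, precision-weighted mean.

    mus, logvars: tensors of shape (n_experts, batch, latent_dim).
    """
    precisions = 1.0 / (torch.exp(logvars) + eps)
    mu = (mus * precisions).sum(dim=0) / precisions.sum(dim=0)
    var = 1.0 / precisions.sum(dim=0)
    return mu, torch.log(var + eps)

# Example: combine a unit-Gaussian prior expert with two modality encoders.
batch, dim = 4, 16
prior_mu, prior_lv = torch.zeros(1, batch, dim), torch.zeros(1, batch, dim)
img_mu, img_lv = torch.randn(1, batch, dim), torch.randn(1, batch, dim)
txt_mu, txt_lv = torch.randn(1, batch, dim), torch.randn(1, batch, dim)
mu, logvar = product_of_experts(
    torch.cat([prior_mu, img_mu, txt_mu]),
    torch.cat([prior_lv, img_lv, txt_lv]),
)
```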
End-to-end multimodal emotion and gender recognition with a dynamically weighted joint loss.
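One common way to weight a joint loss dynamically is homoscedastic-uncertainty weighting (Kendall et al.), where a learned log-variance per task scales its loss. Whether this repository uses that exact scheme is an assumption; the sketch below just shows the general pattern:

```python
import torch
import torch.nn as nn

class DynamicJointLoss(nn.Module):
    """Weight per-task losses with learned log-variances (Kendall-style).

    A generic sketch; the repo's actual weighting scheme may differ.
    """
    def __init__(self, n_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, losses):
        total = 0.0
        for loss, log_var in zip(losses, self.log_vars):
            # exp(-log_var) down-weights noisy tasks; log_var regularizes.
            total = total + torch.exp(-log_var) * loss + log_var
        return total

joint = DynamicJointLoss(n_tasks=2)
emotion_loss, gender_loss = torch.tensor(1.2), torch.tensor(0.4)
print(joint([emotion_loss, gender_loss]))
```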
HongminWu.github.io
Attention modeling for image captioning, as described in 'Show, Attend and Tell'.
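The soft attention in 'Show, Attend and Tell' scores each spatial feature against the decoder's hidden state, softmaxes the scores, and takes a convex combination of the features. A minimal PyTorch sketch (dimensions are illustrative):

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Soft (deterministic) attention over spatial image features."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, n_regions, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                       # (batch, n_regions)
        alpha = torch.softmax(e, dim=1)      # attention weights sum to 1
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)  # (batch, feat_dim)
        return context, alpha

attn = SoftAttention(feat_dim=512, hidden_dim=256, attn_dim=128)
context, alpha = attn(torch.randn(2, 196, 512), torch.randn(2, 256))
```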
Torch code for Visual Question Generation
My solution, achieving 0.67 accuracy.
Unsupervised localization of text in images.
A detailed description of how to extract and align text, audio, and video features at the word level.
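A common recipe for word-level alignment is to average the audio or video frames whose timestamps fall inside each word's forced-alignment interval. A minimal NumPy sketch under that assumption (the timestamp format and feature dimensions are placeholders, not this repository's exact pipeline):

```python
import numpy as np

def align_to_words(frame_feats, frame_times, word_intervals):
    """Average frame features inside each word's (start, end) interval.

    frame_feats: (n_frames, feat_dim); frame_times: (n_frames,) in seconds;
    word_intervals: list of (start, end) pairs from a forced aligner.
    """
    aligned = []
    for start, end in word_intervals:
        mask = (frame_times >= start) & (frame_times < end)
        if mask.any():
            aligned.append(frame_feats[mask].mean(axis=0))
        else:
            aligned.append(np.zeros(frame_feats.shape[1]))  # no frames in span
    return np.stack(aligned)

feats = np.random.randn(100, 40)           # e.g., 40-dim acoustic features
times = np.linspace(0.0, 2.0, 100)         # frame timestamps in seconds
words = [(0.0, 0.5), (0.5, 1.2), (1.2, 2.0)]
print(align_to_words(feats, times, words).shape)  # (3, 40)
```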
Unsupervised specificity-guided optimization of image captioning models to encourage meaningful diversity in the generated captions.
Visual Question Answering System
VoiceGAN - Hallucinating Faces from Voices
Homework for CS229 Machine Learning.
Dataset for Visually Indicated Sound Generation by Perceptually Optimized Classification
Package for Multimodal Autoencoders in TensorFlow / Keras
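A multimodal autoencoder typically encodes each modality separately, fuses the codes into a shared bottleneck, and reconstructs every modality from it. A minimal tf.keras sketch of that architecture (layer sizes and names are placeholders, not this package's API):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_multimodal_autoencoder(dim_a=128, dim_b=64, latent=32):
    in_a = layers.Input((dim_a,), name="modality_a")
    in_b = layers.Input((dim_b,), name="modality_b")
    # Modality-specific encoders feed a shared bottleneck code.
    h = layers.Concatenate()([
        layers.Dense(64, activation="relu")(in_a),
        layers.Dense(64, activation="relu")(in_b),
    ])
    z = layers.Dense(latent, activation="relu", name="shared_code")(h)
    # Modality-specific decoders reconstruct each input from the shared code.
    out_a = layers.Dense(dim_a, name="recon_a")(
        layers.Dense(64, activation="relu")(z))
    out_b = layers.Dense(dim_b, name="recon_b")(
        layers.Dense(64, activation="relu")(z))
    model = Model([in_a, in_b], [out_a, out_b])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_multimodal_autoencoder()
model.summary()
```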
PyTorch implementation of HUSE: Hierarchical Universal Semantic Embeddings (https://arxiv.org/pdf/1911.05978.pdf).
Implementation of CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"
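MMTM is a squeeze-and-excitation-style module: it squeezes each modality's feature map by global average pooling, mixes the squeezed vectors through a shared layer, and re-weights each modality's channels with per-modality gates. A minimal sketch following the paper's description (the reduction ratio and layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class MMTM(nn.Module):
    """Multimodal Transfer Module: cross-modal channel recalibration."""
    def __init__(self, ch_a, ch_b, ratio=4):
        super().__init__()
        joint = (ch_a + ch_b) // ratio
        self.shared = nn.Linear(ch_a + ch_b, joint)
        self.excite_a = nn.Linear(joint, ch_a)
        self.excite_b = nn.Linear(joint, ch_b)

    def forward(self, a, b):
        # a: (batch, ch_a, H, W); b: (batch, ch_b, H', W')
        squeeze = torch.cat([a.mean(dim=(2, 3)), b.mean(dim=(2, 3))], dim=1)
        z = torch.relu(self.shared(squeeze))
        ea = 2.0 * torch.sigmoid(self.excite_a(z))   # per-channel gates
        eb = 2.0 * torch.sigmoid(self.excite_b(z))
        return a * ea[:, :, None, None], b * eb[:, :, None, None]

mmtm = MMTM(ch_a=64, ch_b=32)
out_a, out_b = mmtm(torch.randn(2, 64, 8, 8), torch.randn(2, 32, 4, 4))
```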
4th place (top 1%) solution for Shopee Code League 2020 - Product Detection