Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"
Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
Research Code for Multimodal-Cognition Team in Ant Group
Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
LLaVA base model for use with Autodistill.
Official repository of the paper: Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
[ACL ARR Under Review] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"
Streamlit app to chat with images using Multi-modal LLMs.