vision-language-model

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Updated Apr 23, 2024
C++

llm-jp / awesome-japanese-llm

Overview of Japanese LLMs

Updated Apr 28, 2024

PKU-YuanGroup / Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

video-understanding image-understanding large-language-models vision-language-model

Updated Apr 12, 2024
Python

mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

vision-and-language lmm foundation-models vision-language-model llm-agent

Updated Apr 15, 2024
Python

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents handle any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Apr 15, 2024
Python

AlaaLab / InstructCV

[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"

generative-model text-to-image multi-task-learning diffusion-models stable-diffusion vision-language-model

Updated Apr 27, 2024
Python

SunzeY / AlphaCLIP

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

machine-learning deep-learning vision-and-language vision-language vision-transformer vision-language-model

Updated Mar 4, 2024
Jupyter Notebook

huangwl18 / VoxPoser

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

robotics motion-planning robotic-manipulation embodied-ai foundation-models large-language-models vision-language-model

Updated Nov 9, 2023
Python

OpenGVLab / Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

chat chatbot vqa gradio multi-modality large-language-models llms chatgpt vision-language-model

Updated Apr 21, 2024
Python

PJLab-ADG / awesome-knowledge-driven-AD

A curated list of awesome knowledge-driven autonomous driving (continually updated)

autonomous-driving knowledge-driven large-language-models vision-language-model

Updated Apr 12, 2024

VPGTrans / VPGTrans

Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

llm vision-language-model large-scale-language-modeling vl-llm

Updated Oct 13, 2023
Python

Here are 95 public repositories matching this topic...

haotian-liu / LLaVA

QwenLM / Qwen-VL

dvlab-research / MGM

jingyi0000 / VLM_survey

InternLM / InternLM-XComposer

deepseek-ai / DeepSeek-VL

NVlabs / prismer

OpenGVLab / InternVL

roboflow / multimodal-maestro

AlibabaResearch / AdvancedLiterateMachinery

llm-jp / awesome-japanese-llm

PKU-YuanGroup / Chat-UniVi

mbzuai-oryx / groundingLMM

BAAI-Agents / Cradle

AlaaLab / InstructCV

SunzeY / AlphaCLIP

huangwl18 / VoxPoser

OpenGVLab / Multi-Modality-Arena

PJLab-ADG / awesome-knowledge-driven-AD

VPGTrans / VPGTrans
