tisu19021997/bad-mm-video-rag

Multimodal Video/YouTube QA

YouTube link or video file -> transcription + frames -> text embeddings + image embeddings -> vector DB -> RAG with image + text.
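As a rough illustration of the "transcription + frames" step, the sketch below pairs each transcript segment with the frames sampled during it, so that text and images retrieved later can be traced back to the same moment in the video. This is not the repository's code: the `Segment` class, the helper names, and the fixed sampling interval are all assumptions made for the example, and the transcript text is made up.

```python
import math
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment (hypothetical structure for this sketch)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def sample_frame_times(duration, every):
    """Timestamps (in seconds) at which frames would be extracted."""
    count = math.ceil(duration / every)
    return [i * every for i in range(count)]


def frames_for_segment(segment, frame_times):
    """Frame timestamps in [segment.start, segment.end); frame_times must be sorted."""
    lo = bisect_left(frame_times, segment.start)
    hi = bisect_left(frame_times, segment.end)
    return frame_times[lo:hi]


# Made-up transcript for a 10-second clip, with one frame every 5 seconds.
segments = [
    Segment(0.0, 4.0, "Welcome to the video."),
    Segment(4.0, 10.0, "Here is the main demo."),
]
frame_times = sample_frame_times(duration=10.0, every=5.0)
pairs = [(seg.text, frames_for_segment(seg, frame_times)) for seg in segments]
```

In a real run the segments would come from the speech-to-text model and the timestamps would drive actual frame extraction; only the alignment logic is shown here.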

- LLM: Gemini Pro Vision
- Text embedding: BAAI/bge-large-en-v1.5
- Image embedding: OpenAI CLIP (or a comparable image embedding model)
- STT: openai/whisper-large-v3
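Because the text and image embeddings come from different models, their vectors generally have different dimensions and cannot share one index. A minimal sketch of one way to handle that, assuming a separate collection per modality and a plain cosine-similarity search, is shown below. The `Collection` class, the `build_context` helper, the toy vectors, and the file names are all hypothetical; a real setup would use the actual embedding models and a vector database instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Collection:
    """Toy in-memory stand-in for one vector DB collection (assumption)."""

    def __init__(self, dim):
        self.dim = dim
        self.items = []

    def add(self, vector, payload):
        if len(vector) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(vector)}")
        self.items.append((vector, payload))

    def search(self, query, k=1):
        """Return the k payloads most similar to the query vector."""
        scored = [(cosine(query, vec), payload) for vec, payload in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [payload for _, payload in scored[:k]]


def build_context(text_hits, image_hits):
    """Bundle retrieved transcript chunks and frames for a multimodal LLM prompt."""
    return {"text": text_hits, "images": image_hits}


# Toy vectors; real embeddings would come from the text and image models.
text_index = Collection(dim=3)
text_index.add([1.0, 0.0, 0.0], "0s: intro")
text_index.add([0.0, 1.0, 0.0], "30s: demo")

image_index = Collection(dim=2)
image_index.add([1.0, 0.0], "frame_0000.jpg")
image_index.add([0.0, 1.0], "frame_0030.jpg")

# The question is embedded once per modality, then each collection is searched.
context = build_context(
    text_index.search([0.1, 0.9, 0.0], k=1),
    image_index.search([0.2, 0.8], k=1),
)
```

The resulting `context` holds the transcript chunk and frame that would be passed, together with the question, to the multimodal LLM.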

Demo: gradio-app.ipynb
