Simple RAG (Retrieval Augmented Generation) using Vertex AI Generative AI (PaLM 2 models) and the Qdrant vector database, presented at the Lyon Data Science meetup.
- The project was developed and tested using Python 3.10 on macOS:

```bash
python3.10 -m venv ./venv
source ./venv/bin/activate
pip install -r requirements.txt
```
- Start a local Qdrant instance:

```bash
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```
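Once the container is up, you can sanity-check the connection from Python. This is a minimal sketch assuming the `qdrant-client` package is installed and that Qdrant uses the default host and ports from the `docker run` command above:

```python
# Quick connectivity check against the local Qdrant instance.
# Host and port are the defaults from the docker run command above.
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)

# Lists existing collections; succeeds only if Qdrant is reachable.
print(client.get_collections())
```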
This repository contains two scripts:
- `rag_indexing.py` indexes the content of `data/knowledge_base.json` (a list of question/response pairs from the WikiQA dataset) into a Qdrant vector database (see the sketch below).
  - Questions are embedded using a Vertex AI generative embedding model (Gecko).
  - The Qdrant URL can be configured in `constant.py` (default: localhost).
  - You can define the ports to use (HTTP and gRPC) when creating the Qdrant client (defaults are 6333 for HTTP and 6334 for gRPC).
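For illustration, here is a minimal sketch of that indexing flow. It is not the repository's actual script: the collection name, payload layout, model version (`textembedding-gecko@001`), and the JSON structure (a list of `{"question": ..., "response": ...}` objects) are all assumptions.

```python
# Minimal indexing sketch (not the repo's actual script).
# Assumes: a running Qdrant on localhost, Vertex AI credentials configured
# (vertexai.init(project=..., location=...) may be needed in your environment),
# and knowledge_base.json as a list of {"question": ..., "response": ...} objects.
import json

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from vertexai.language_models import TextEmbeddingModel

client = QdrantClient(host="localhost", port=6333)
model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

with open("data/knowledge_base.json") as f:
    kb = json.load(f)

# Gecko embeddings are 768-dimensional.
client.recreate_collection(
    collection_name="knowledge_base",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

# Embed each question and store it with its question/response payload.
points = []
for i, item in enumerate(kb):
    vector = model.get_embeddings([item["question"]])[0].values
    points.append(PointStruct(id=i, vector=vector, payload=item))

client.upsert(collection_name="knowledge_base", points=points)
```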
- `rag_inference.py` performs the following steps (see the sketch after this list):
  - Ask the user for a question.
  - Embed the question using the same model as during indexing.
  - Retrieve the semantically nearest questions from the database.
  - Build a context for the user's question from the responses to the questions retrieved in the previous step.
  - Build a prompt and ask an LLM (here, Bison) for the response.
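As a rough illustration of those steps, here is a hedged sketch, again with assumed names (the `knowledge_base` collection from the indexing sketch, model versions, prompt wording) rather than the repository's exact code:

```python
# Minimal inference sketch (not the repo's actual script).
# Assumes the collection created by the indexing sketch above.
from qdrant_client import QdrantClient
from vertexai.language_models import TextEmbeddingModel, TextGenerationModel

client = QdrantClient(host="localhost", port=6333)
embedder = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
llm = TextGenerationModel.from_pretrained("text-bison@001")

question = input("Your question: ")

# Embed the question with the same model used at indexing time.
query_vector = embedder.get_embeddings([question])[0].values

# Retrieve the nearest stored questions and collect their responses as context.
hits = client.search(
    collection_name="knowledge_base",
    query_vector=query_vector,
    limit=3,
)
context = "\n".join(hit.payload["response"] for hit in hits)

# Build the prompt and ask the LLM (Bison) for the response.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
print(llm.predict(prompt).text)
```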
Note that the scripts can easily be adapted to use another vector DB or LLM (GPT, for example).