AI-powered Enterprise RAG

This project is my side project: an implementation of an AI-powered Enterprise RAG (Retrieval-Augmented Generation) system. It uses a pre-trained model to generate embeddings for books and then uses Elasticsearch to index and search for books via multi-modal search:

  • traditional text search
  • cosine similarity search using embeddings (books are recommended based not just on keywords but on semantics, user preferences, etc., all of which are embedded as a vector)
  • I did not choose a dedicated vector database because Elasticsearch provides vector storage and search capabilities. It is not as capable as a purpose-built vector database, but it is good enough for this project. Milvus is a good alternative if you want a dedicated vector database.
  • For big firms with more resources, a good stack would be: PyTorch + ONNX for model development, FastAPI + Docker for deployment, and Ray + Grafana for the MLOps lifecycle, with pickle for model serialization
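The cosine-similarity ranking idea above can be sketched in plain Python. The vectors, book names, and the `embedding` field name below are illustrative, not taken from this repo:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real book embeddings come from the model.
query = [0.1, 0.9, 0.2]
books = {
    "book_a": [0.1, 0.8, 0.3],
    "book_b": [0.9, 0.1, 0.0],
}
ranked = sorted(books, key=lambda k: cosine_similarity(query, books[k]), reverse=True)

# In Elasticsearch the same ranking is expressed as a script_score query over a
# dense_vector field (field name 'embedding' is an assumption):
es_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
            "params": {"query_vector": query},
        },
    }
}
```

The `+ 1.0` in the Painless script shifts the score into a non-negative range, since Elasticsearch does not allow negative scores.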

If you run this project locally after git clone, the indexing and searching steps use only a small sample dataset, so that an interviewer (or anyone who is interested) can run the code on their machine and see the results quickly. Sharing a Parquet file with 1.5M records and their embeddings would take too long. The online version uses the full dataset.

If you haven't tried ONNX before, please check it out. It is a great way to deploy your models when inference performance in production matters.

Running Requirements

  • Python 3.10.10
  • Docker (>24.0.5 should work)
  • Docker Compose

Installation

# check your python version
# recommend using pyenv to manage python versions
python --version  # should be >= 3.10.10
python -m venv venv
source venv/bin/activate
make install

Running Locally

  1. make onnx: construct onnx model
  2. make elastic-up: start Elasticsearch
  3. make index-books: index books (you might need to run this several times, as Elasticsearch may not be ready yet)
  4. make run: start FastAPI server
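Instead of re-running `make index-books` by hand until Elasticsearch is ready, a small polling helper can wait for it. This is a sketch, not part of the repo; the URL, retry count, and delay are assumptions, and the probe is injectable so it can be tested without a running cluster:

```python
import time
import urllib.request

def wait_for_elasticsearch(probe, retries=10, delay=2.0):
    """Poll `probe` (a zero-arg callable returning True when Elasticsearch
    is up) until it succeeds or `retries` attempts are exhausted."""
    for _ in range(retries):
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused while Elasticsearch is still starting
        time.sleep(delay)
    return False

def http_probe(url="http://localhost:9200/_cluster/health"):
    """Real probe: hit the cluster health endpoint."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status == 200
```

Usage would be `wait_for_elasticsearch(http_probe)` before kicking off indexing.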

Running Tests

make test

Access Swagger Documentation

The port might differ if you already have services running on port 8080.

http://localhost:8080/docs

Access Redocs Documentation

http://localhost:8080/redoc

Deploy app

TODO: Add deployment instructions

Project structure

It uses the fastapi-cookiecutter template. The project structure is as follows:

.
├── app
│   ├── api
│   ├── core
│   ├── __init__.py
│   ├── main.py
│   ├── models
│   ├── __pycache__
│   ├── services
│   └── templates
├── docker-compose.yml
├── Dockerfile
├── Makefile
├── ml
│   ├── data
│   ├── features
│   ├── __init__.py
│   ├── model
│   └── __pycache__
├── notebooks
│   ├── construct_sample_dataset.ipynb
│   └── onnx_runtime.ipynb
├── poetry.lock
├── pyproject.toml
├── README.md
├── search
│   ├── books_embeddings.csv
│   ├── docker-compose.yml
│   └── index_books.py
├── tests
│   ├── __init__.py
│   ├── __pycache__
│   ├── test_api.py
│   ├── test_elastic_search.py
│   └── test_onnx_embedding.py

Data Source

Originally, the data was downloaded from the Goodreads Book Graph Datasets. The authors also provide code to download the data.

I downloaded the data and uploaded it to my Google Cloud Storage bucket. Please let me know if you find that the above links are broken, and I will provide you with the data.

There are many tables in the dataset, but we are only interested in the following tables:

  • books: detailed metadata about 2.36M books
  • reviews: complete 15.7M reviews (~5 GB); 15M records with detailed review text
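The Goodreads Book Graph files are distributed as gzipped JSON-lines, so a file of this size is best streamed record by record rather than loaded whole. A minimal sketch (the field names in the comment are assumptions about the schema, not guaranteed):

```python
import gzip
import json

def iter_records(path, limit=None):
    """Stream records from a gzipped JSON-lines file one at a time,
    so the full multi-GB file never has to fit in memory."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if limit is not None and i >= limit:
                break
            # Each line is one JSON object, e.g. with book_id/title fields.
            yield json.loads(line)
```

Passing a small `limit` is handy for building a sample dataset like the one shipped with this repo.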