This is my side project: an implementation of an AI-powered enterprise RAG (Retrieval-Augmented Generation) system. It uses a pre-trained model to generate embeddings for books, then uses Elasticsearch to index and search them via multi-modal search:
- traditional text search
- 🧮 cosine-similarity search using embeddings (books are recommended not just by keywords but by semantics, user preferences, etc., all of which are embedded as a vector)
- I did not choose a dedicated vector database, since Elasticsearch already provides vector storage and search capabilities. It is not as capable as a purpose-built vector database, but it is good enough for this project. Milvus is a good alternative if you want a vector database.
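The cosine-similarity part can be sketched in plain NumPy (the embeddings and the 4-dimensional toy vectors below are made up for illustration; in the project itself this scoring is delegated to Elasticsearch):

```python
import numpy as np

def top_k_cosine(query_vec: np.ndarray, book_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k books whose embeddings are most similar to the query."""
    # cosine similarity = dot product of L2-normalized vectors
    q = query_vec / np.linalg.norm(query_vec)
    b = book_vecs / np.linalg.norm(book_vecs, axis=1, keepdims=True)
    scores = b @ q
    return np.argsort(scores)[::-1][:k]

# toy 4-dimensional embeddings for 5 books (made up)
book_vecs = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1],
    [0.8, 0.2, 0.1, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.1, 0.9, 0.1, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])
print(top_k_cosine(query, book_vecs))  # indices of the most similar books first
```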
- For big firms with more resources, a stronger stack would be: PyTorch + ONNX for model development, FastAPI + Docker for deployment, and Ray + Grafana for the MLOps lifecycle (rather than shipping models as `pickle` files).
If you run this project locally after `git clone`, the indexing and searching steps use only a small sample dataset, so that an interviewer (or anyone interested in trying it) can run the code on their machine and see results quickly; sharing a parquet file with 1.5M records plus its embeddings would take too long. The online version uses the full dataset.
If you haven't tried ONNX before, please check it out. It is a great way to deploy your models when inference performance in production matters.
- Python 3.10.10
- Docker (>= 24.0.5 should work)
- Docker Compose
# check your Python version
# (pyenv is recommended for managing Python versions)
python --version # should be >= 3.10.10
python -m venv venv
source venv/bin/activate
make install
- `make onnx`: construct the ONNX model
- `make elastic-up`: start Elasticsearch
- `make index-books`: index the books (you might need to run this several times, as Elasticsearch might not be ready yet)
- `make run`: start the FastAPI server
- `make test`: run the tests
The port may differ if you already have services running on port 8080.
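Conceptually, `make index-books` (via `search/index_books.py`) has to create an index with a `dense_vector` mapping and bulk-upload the books with their embeddings. A hedged sketch of that shape (the index name, field names, and embedding dimension below are assumptions, not the project's actual values):

```python
# Sketch of what an index_books-style script has to do; index name, field
# names, and embedding dimension are assumptions, not the project's values.
index_name = "books"
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
            # dense_vector enables Elasticsearch's vector similarity scoring
            "embedding": {"type": "dense_vector", "dims": 384},
        }
    }
}

def bulk_actions(books):
    """Yield one bulk-API action per book."""
    for book_id, book in enumerate(books):
        yield {"_index": index_name, "_id": book_id, "_source": book}

books = [{"title": "A", "description": "toy record", "embedding": [0.0] * 384}]
actions = list(bulk_actions(books))
print(actions[0]["_index"])  # books

# With a live cluster you would then run, roughly:
#   es = Elasticsearch("http://localhost:9200")
#   es.indices.create(index=index_name, body=mapping)
#   helpers.bulk(es, bulk_actions(books))
```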
TODO: Add deployment instructions
The project uses the fastapi-cookiecutter template. The project structure is as follows:
.
├── app
│   ├── api
│   ├── core
│   ├── __init__.py
│   ├── main.py
│   ├── models
│   ├── __pycache__
│   ├── services
│   └── templates
├── docker-compose.yml
├── Dockerfile
├── Makefile
├── ml
│   ├── data
│   ├── features
│   ├── __init__.py
│   ├── model
│   └── __pycache__
├── notebooks
│   ├── construct_sample_dataset.ipynb
│   └── onnx_runtime.ipynb
├── poetry.lock
├── pyproject.toml
├── README.md
├── search
│   ├── books_embeddings.csv
│   ├── docker-compose.yml
│   └── index_books.py
└── tests
    ├── __init__.py
    ├── __pycache__
    ├── test_api.py
    ├── test_elastic_search.py
    └── test_onnx_embedding.py
The data was originally downloaded from the Goodreads Book Graph Datasets; the author also provides code to download it. I downloaded the data and uploaded it to my Google Cloud Storage bucket. Please let me know if you find that the above links are broken, and I will provide you with the data.
There are many tables in the dataset, but we are only interested in the following tables:
- books: detailed metadata about 2.36M books
- reviews: complete set of 15.7M reviews (~5 GB), with 15M records containing detailed review text
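The Goodreads dumps are distributed as gzipped JSON-lines files, which can be streamed record by record without loading everything into memory (the keys shown are illustrative; the snippet builds a tiny in-memory stand-in for a real `*.json.gz` dump):

```python
import gzip
import io
import json

def iter_records(fileobj, limit=None):
    """Stream records from a gzipped JSON-lines file, one dict per line."""
    with gzip.open(fileobj, "rt", encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if limit is not None and i >= limit:
                break
            yield json.loads(line)

# build a tiny in-memory stand-in for a goodreads *.json.gz dump
buf = io.BytesIO()
with gzip.open(buf, "wt", encoding="utf-8") as fh:
    fh.write(json.dumps({"book_id": "1", "title": "Example"}) + "\n")
    fh.write(json.dumps({"book_id": "2", "title": "Another"}) + "\n")
buf.seek(0)

records = list(iter_records(buf, limit=1))
print(records[0]["title"])  # Example
```

Streaming with a `limit` is also how a small sample can be cut from the 2.36M-book dump without holding the whole file in memory.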