A from-scratch, fully local Retrieval-Augmented Generation (RAG) system that processes PDF documents, generates embeddings, and uses a language model to answer queries grounded in the document's content.

The pipeline can:
- Process PDF documents and extract text content.
- Generate embeddings for text chunks using sentence transformers.
- Perform semantic search to find relevant context for queries.
- Generate responses using a language model.
```
├── local_RAG/
│   ├── components/
│   │   ├── Embeddings.py       # Generates embeddings and saves them to a CSV file.
│   │   ├── LLM.py              # Loads and runs the language model.
│   │   ├── PDF_Processing.py   # Processes the PDF into text chunks for embedding.
│   │   ├── Prompter.py         # Builds a dynamic prompt to pass to the LLM.
│   │   ├── RAG.py              # Combines all components into a sequential pipeline.
│   │   └── Semantic_search.py  # Searches the embeddings semantically for a given query.
│   ├── Notebooks/
│   │   └── Local_RAG.ipynb
│   └── main.py                 # Example usage of the whole pipeline.
├── Simple_RAG/
│   └── simple_rag.py           # Simplest RAG pipeline, using Groq.
└── requirements.txt
```
**PDF_Processing.py**
- Extracts text from PDF documents.
- Splits text into sentences and chunks.
- Filters out irrelevant content.
- Adds metadata such as character count, word count, and token count.
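To make the processing concrete, here is a minimal sketch of the kind of pipeline `PDF_Processing.py` describes, assuming PyMuPDF (`fitz`) for extraction; the function name, the naive sentence split, and the token-count heuristic are illustrative, not the module's actual API.

```python
import fitz  # PyMuPDF

def process_pdf(pdf_path: str, sentences_per_chunk: int = 10) -> list[dict]:
    """Extract text page by page, split into sentences, and group into chunks."""
    doc = fitz.open(pdf_path)
    text = " ".join(page.get_text() for page in doc)
    # Naive sentence split; the real module may use a proper sentencizer.
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        chunk = ". ".join(sentences[i : i + sentences_per_chunk])
        chunks.append({
            "text": chunk,
            "char_count": len(chunk),
            "word_count": len(chunk.split()),
            "token_count": len(chunk) // 4,  # rough heuristic: ~4 chars per token
        })
    return chunks
```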
**Embeddings.py**
- Uses SentenceTransformer to generate embeddings for text chunks.
- Saves embeddings to a CSV file for later use.
- Supports GPU acceleration when available.
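A sketch of the embedding step using the sentence-transformers library and the default model listed under Configuration; persisting to CSV via pandas mirrors the behavior described above, but the function name and CSV layout are assumptions.

```python
import pandas as pd
import torch
from sentence_transformers import SentenceTransformer

def embed_chunks(chunks: list[dict], csv_path: str = "embeddings.csv") -> pd.DataFrame:
    """Encode chunk texts and persist them alongside their embeddings."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = SentenceTransformer("all-mpnet-base-v2", device=device)
    texts = [c["text"] for c in chunks]
    embeddings = model.encode(texts, convert_to_numpy=True)
    df = pd.DataFrame(chunks)
    df["embedding"] = [e.tolist() for e in embeddings]
    df.to_csv(csv_path, index=False)
    return df
```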
**Semantic_search.py**
- Performs similarity search on pre-computed embeddings.
- Retrieves the most relevant text chunks for a given query.
- Uses dot-product similarity for ranking results.
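A sketch of dot-product retrieval over the precomputed embeddings; `top_k` and the function signature are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_search(query: str, embeddings: np.ndarray, texts: list[str],
                    model: SentenceTransformer, top_k: int = 5) -> list[str]:
    """Rank chunks by dot-product similarity with the query embedding."""
    query_embedding = model.encode(query, convert_to_numpy=True)
    scores = embeddings @ query_embedding  # dot product against every chunk
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [texts[i] for i in top_indices]
```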
**Prompter.py**
- Creates structured prompts using retrieved context.
- Includes example-based formatting for consistent responses.
- Combines context and query in a template format.
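A sketch of the kind of template `Prompter.py` might assemble; the exact instructions and the one-shot example are assumptions.

```python
def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Combine retrieved context and the user query into a single prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        "Example:\n"
        "Context: Photosynthesis converts light into chemical energy.\n"
        "Question: What does photosynthesis produce?\n"
        "Answer: Chemical energy.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )
```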
**LLM.py**
- Interfaces with the Falcon3-3B-Instruct model.
- Handles tokenization and text generation.
- Supports GPU acceleration for faster inference.
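A sketch of loading and querying the default model through Hugging Face transformers; the generation parameters are illustrative defaults, not the module's actual settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_answer(prompt: str, model_id: str = "tiiuae/Falcon3-3B-Instruct") -> str:
    """Tokenize the prompt, generate a completion, and decode it."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```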
**RAG.py**
- Coordinates all components into a unified system.
- Manages the workflow from query to response.
- Handles embedding generation and storage.
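Chaining the sketches above gives a rough picture of the flow `RAG.py` coordinates; this class reuses the illustrative helpers defined earlier and is not the repo's actual `Local_RAG` implementation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

class LocalRAGSketch:
    """Illustrative end-to-end pipeline; reuses the sketch helpers above."""

    def __init__(self, pdf_path: str, n_chunks: int = 5):
        self.model = SentenceTransformer("all-mpnet-base-v2")
        chunks = process_pdf(pdf_path)                 # PDF_Processing
        df = embed_chunks(chunks)                      # Embeddings
        self.embeddings = np.array(df["embedding"].tolist())
        self.texts = df["text"].tolist()
        self.n_chunks = n_chunks

    def run(self, query: str) -> str:
        context = semantic_search(                     # Semantic_search
            query, self.embeddings, self.texts, self.model, self.n_chunks
        )
        prompt = build_prompt(query, context)          # Prompter
        return generate_answer(prompt)                 # LLM
```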
- Clone the repository:

  ```bash
  git clone https://github.com/BEASTBOYJAY/Local_RAG.git
  cd Local_RAG
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
```python
from components.RAG import Local_RAG

# Initialize the RAG system with your PDF
pdf_path = "path/to/your/document.pdf"
local_rag = Local_RAG(pdf_path=pdf_path)

# Ask a question
query = "What is the main topic of the document?"
response = local_rag.run(query=query)
print(response)
```
- **PDF Text Extraction**: Intelligent text extraction with formatting preservation
- **Smart Chunking**: Sentence-aware text chunking for better context preservation
- **GPU Acceleration**: Automatic GPU usage when available for faster processing
- **Embedding Storage**: Saves embeddings to avoid recomputation
- **Customizable Context**: Adjustable number of context chunks for responses
- **Example-Based Prompting**: Structured prompts with examples for better responses
The system uses several default configurations that can be modified:
- Default embedding model: `all-mpnet-base-v2`
- Default language model: `tiiuae/Falcon3-3B-Instruct`
- Default chunk size: 10 sentences
- Default number of relevant chunks: 5
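If the `Local_RAG` constructor exposes these defaults as parameters (an assumption; check `components/RAG.py` for the real signature), overriding them might look like:

```python
from components.RAG import Local_RAG

# Hypothetical keyword arguments: verify against the actual constructor
# in components/RAG.py before relying on these names.
local_rag = Local_RAG(
    pdf_path="path/to/your/document.pdf",
    embedding_model="all-mpnet-base-v2",
    llm_model="tiiuae/Falcon3-3B-Instruct",
    chunk_size=10,   # sentences per chunk
    n_chunks=5,      # retrieved context chunks per query
)
```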