RoostAI is an intelligent chatbot specifically designed to provide information about the University of South Carolina (USC). It uses advanced RAG (Retrieval-Augmented Generation) techniques to deliver accurate, context-aware responses to university-related queries. Check out a presentation of the application's functionality
roostai/
├── back_end/ # Core chatbot implementation
├── front_end/ # User interface components
├── web_scraping/ # Data collection utilities
├── scripts/ # Utility scripts for setup and maintenance
└── data/ # Vector database and related data
eval/ # Evaluation scripts and tools
-
Prerequisites
- Python 3.9+
- Poetry for dependency management
- HuggingFace API key for LLM access
- Optional: Might need OpenAI, Anthropic, and Google API keys if considering comprehensive evaluations
-
Installation
poetry install
- Environment Setup
Add the following environment variables to your
.env
file:
# HuggingFace API Key
HF_API_KEY=your_huggingface_api_key
# Optional: OpenAI, Anthropic, and Google API Keys
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GOOGLE_API_KEY=your_google_api_key
All commands should be run from the project root directory.
# Web Scraping Pipeline
poetry run python roostai/web_scraping/web_scraper.py
poetry run python roostai/web_scraping/collect_html.py
poetry run python roostai/web_scraping/extract.py
poetry run python roostai/web_scraping/chunker.py
# Database Creation
poetry run python roostai/scripts/data_ingestion.py
# Interactive Mode
poetry run python roostai/back_end/main.py
# With Output Logging
poetry run python roostai/back_end/main.py 2>&1 | tee roostai/back_end/dry-run.out
# FAQ Response Generation
poetry run python eval/get_llm_responses.py
# RAG Response Evaluation
poetry run python eval/ragas_evaluation/get_rag_response.py
- RAG-based retrieval system
- Vector database management
- Query processing and response generation
- Quality assessment and reranking
- User interface for chatbot interaction using Streamlit
- Survey functionality for user feedback
- Response visualization
- Web scraping utilities
- Text chunking and preprocessing
- Data ingestion pipeline
This project is licensed under the BSD-2 Clause License. For more information, see the LICENSE
file.