We often download many documents from the internet and then have to spend a lot of time reading them to understand their content.
This project uses an LLM to help the user analyze and understand these documents and to reach the relevant content faster.
The project is divided into 3 parts:
- Document Data Extraction Algorithms
- Data Analysis Algorithms for Documentation
- Data Retrieval Algorithms for Documentation (RAG + LLM)
- Document Data Extraction (Unstructured Document Preprocessing)
- Because the input documents are complex and include tables and images (charts), I will use several AI models, such as OCR, computer vision models, vision transformers, layout transformers, and embedding models, to extract and analyze document content from bank statements.
- Complex layout / content format analysis by ML model
- Use an advanced rule-based model or a machine learning model to:
- Group and reorganize the data into a user-friendly format. (No experience yet in building rules to group data.)
- Identify common denominators and create headers for each group. (No experience yet.)
- Display only the differences between similar items (e.g., window sizes, owners) as line items below each header.
- Automate the process using AI, enabling the system to self-learn and understand the data structure.
- Extract relevant data from PDFs with different layouts and formats.
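The grouping idea above (shared values become a header, and only the differing values appear as line items) can be sketched with plain rules before any ML model is involved. This is a minimal illustration, not the project's implementation: the function name `group_with_differences`, the field names, and the sample records are all hypothetical.

```python
from collections import defaultdict


def group_with_differences(records, group_key):
    """Group records by `group_key`, then split each group into a shared
    header and per-item lines that keep only the differing fields."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    result = []
    for items in groups.values():
        first = items[0]
        # A field is "common" when every item in the group has the same value.
        common = {
            field: value
            for field, value in first.items()
            if all(field in item and item[field] == value for item in items)
        }
        # Line items keep only the fields that are not part of the header.
        lines = [
            {field: value for field, value in item.items() if field not in common}
            for item in items
        ]
        result.append({"header": common, "lines": lines})
    return result


# Hypothetical sample data, invented for illustration only.
records = [
    {"product": "window", "material": "oak", "size": "60x90", "owner": "Alice"},
    {"product": "window", "material": "oak", "size": "80x120", "owner": "Bob"},
    {"product": "door", "material": "pine", "size": "90x200", "owner": "Alice"},
]

for group in group_with_differences(records, "product"):
    print(group["header"])
    for line in group["lines"]:
        print("   ", line)
```

A group with a single item ends up with every field in the header and an empty line item, which a real display layer would probably want to handle separately.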
- Document Analysis
- Classify the document type
- Summarize the document content
- Intelligently extract data and output structured data
- Vector database: store the document content as embedding vectors in a vector database, which can then be used to find similar documents.
- Retrieval-augmented generation (RAG) with multimodal support: query local documents, using LangChain for question answering over them.
- LLM: the first version uses the Google Gemini API; later versions will try different open LLMs (e.g. LLama3, mi).
- Supported documents: the first version supports only PDF files; later versions will support Word and Excel files, and possibly image-based documents.
- Frontend UI: the first version uses Streamlit; later versions will be full stack with a backend RESTful API.
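To make the vector database and RAG items above concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt flow. It is an illustration under stated assumptions, not the project's code: `embed` is a toy bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` stands in for a real vector database, and the sample chunks are invented. In the actual project these roles would be filled by LangChain, a vector database, and the Gemini API.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (chunk_text, vector) pairs

    def add(self, chunk):
        self._items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vec, item[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Invented sample chunks, standing in for text extracted from a PDF.
store = InMemoryVectorStore()
for chunk in [
    "The statement period is January 2024.",
    "The closing balance is 1,250.00 USD.",
    "Customer support is available on weekdays.",
]:
    store.add(chunk)

question = "What is the closing balance?"
prompt = build_prompt(question, store.search(question, k=1))
print(prompt)  # this prompt would then be sent to the LLM
```

The same store also covers the document-similarity use case: searching with the text of one document returns the stored chunks closest to it.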
- Use requirements.txt to install the package dependencies
- You can set up a virtual environment with venv
- Add your Google API key to the .env file as an environment variable
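As a rough sketch of how the key in the .env file can reach the application, the snippet below reads `KEY=VALUE` lines into the process environment using only the standard library. The variable name `GOOGLE_API_KEY` and both helper functions are assumptions made for illustration; the project may instead rely on a library such as python-dotenv, whose `load_dotenv()` does the same job.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Values already present in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_api_key(name="GOOGLE_API_KEY"):
    """Return the API key, failing early with a clear message if it is missing.

    The default variable name is an assumption, not taken from the project.
    """
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


load_env_file()  # harmless when no .env file is present
```

Under these assumptions the .env file would contain a single line of the form `GOOGLE_API_KEY=your-key-here`, and the app would call `get_api_key()` once at startup.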
- Run the development version: go to the "dev" folder
- Run the demo version: go to the "demo" folder
- Run the application with the command below:
streamlit run apps.py
- Upload your documents to query
- Type a chat prompt to query across the uploaded documents