
GoMate: RAG Framework within Reliable input, Trusted output


AldousShou/GoMate

 
 


GoMate

A configurable, modular RAG framework.


🔥 GoMate Introduction

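GoMate is a configurable, modular Retrieval-Augmented Generation (RAG) framework built around reliable input and trustworthy output, so that users get high-quality, dependable results in retrieval-based question-answering scenarios.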

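At the core of GoMate's design are high configurability and modularity: users can adjust and optimize each component independently to meet the requirements of different application scenarios.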
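To illustrate what "configurable and modular" can mean in practice, here is a minimal sketch (not GoMate's actual code) in which a component is picked by name from a registry and that choice lives entirely in a config object. `REWRITERS`, `PipelineConfig`, and `Pipeline` are hypothetical names, and the word-overlap ranking is a toy stand-in for a real retriever.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical component registry: each entry is an interchangeable query rewriter
REWRITERS: Dict[str, Callable[[str], str]] = {
    "identity": lambda q: q,
    "lowercase": str.lower,
}


@dataclass
class PipelineConfig:
    rewriter: str = "identity"  # which registered rewriter to use
    top_k: int = 2              # how many passages to return


@dataclass
class Pipeline:
    config: PipelineConfig
    corpus: List[str] = field(default_factory=list)

    def run(self, question: str) -> List[str]:
        # Swapping the rewriter only requires a config change, not a code change
        query = REWRITERS[self.config.rewriter](question)
        terms = set(query.split())
        # Toy retriever: rank passages by how many query words they contain
        ranked = sorted(
            self.corpus,
            key=lambda doc: len(terms & set(doc.split())),
            reverse=True,
        )
        return ranked[: self.config.top_k]


corpus = [
    "rcep is a trade agreement",
    "rcep members include china and japan",
    "the summer solstice is in june",
]
result = Pipeline(PipelineConfig(rewriter="lowercase", top_k=1), corpus).run("RCEP Members")
```

Switching `rewriter` to `"identity"` leaves the uppercase query unmatched, so the toy ranking falls back to corpus order and returns the first sentence instead; only the config changed.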

🔨 GoMate Framework

[Figure: framework.png]

✨ Key Features

“Reliable input, Trusted output”

🏗️ Changelog

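  • RAPTOR: implemented a recursive tree retriever
  • Modular parsing for multiple file formats; supported types currently include text, docx, ppt, excel, html, pdf, and md
  • Optimized DenseRetriever: supports index building, incremental appends, and index saving (the saved content includes documents, vectors, and the index)
  • Added BGE reranking to ReRank and HyDE to Rewriter
  • Added BgeJudge to Judge, which determines whether a passage is useful (20240711)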
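HyDE (Hypothetical Document Embeddings), listed above as a Rewriter, searches with an LLM-generated hypothetical answer instead of the raw question. The sketch below assumes a stub `fake_llm` in place of a real model and a toy bag-of-words `embed` in place of a dense encoder; `hyde_search` is a hypothetical name and this is not GoMate's Rewriter API.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder such as BGE
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_search(question: str, generate: Callable[[str], str], corpus: List[str], top_k: int = 1) -> List[str]:
    # 1) Ask the LLM for a hypothetical answer passage
    hypothetical = generate(f"Write a short passage answering: {question}")
    # 2) Search with the embedding of that passage, not of the question
    query_vec = embed(hypothetical)
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]


def fake_llm(prompt: str) -> str:
    # Stub standing in for a real LLM call
    return "RCEP members include China Japan Korea Australia and the ASEAN states"


corpus = [
    "The summer solstice falls in June",
    "Its members include China Japan Korea Australia New Zealand and ASEAN",
]
hits = hyde_search("Which countries are in RCEP?", fake_llm, corpus)
```

With the raw question, this toy overlap score would pick the solstice sentence (the two share only the word "in"), whereas the hypothetical answer shares most of its vocabulary with the relevant passage.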

🚀 Quick Start

Set up the environment

pip install -r requirements.txt

1 Document Parsing

Supported file types currently include: text, docx, ppt, excel, html, pdf, md

from gomate.modules.document.common_parser import CommonParser

parser = CommonParser()
# Sample document: "Summer Solstice customs across regions"
document_path = 'docs/夏至各地习俗.docx'
chunks = parser.parse(document_path)
print(chunks)
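CommonParser handles several formats behind a single `parse()` call. One common way to structure that is a table from file extension to a format-specific parser; the sketch below assumes that design (it is not taken from GoMate's source) and implements only plain-text and Markdown splitting on blank lines. `PARSERS`, `parse_text`, and `parse` are hypothetical names.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def parse_text(path: str) -> List[str]:
    # Split plain text / Markdown into chunks at blank lines
    content = Path(path).read_text(encoding="utf-8")
    return [block.strip() for block in content.split("\n\n") if block.strip()]


# Hypothetical extension -> parser table; docx, ppt, excel, html and pdf
# handlers would be registered here, backed by format-specific libraries
PARSERS: Dict[str, Callable[[str], List[str]]] = {
    ".txt": parse_text,
    ".md": parse_text,
}


def parse(path: str) -> List[str]:
    # Dispatch on the lowercased file extension
    suffix = Path(path).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"Unsupported file type: {suffix}")
    return PARSERS[suffix](path)


# Usage: write a small Markdown file, parse it, then clean up
with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False, encoding="utf-8") as f:
    f.write("# Title\n\nFirst paragraph.\n\nSecond paragraph.\n")
chunks = parse(f.name)
os.remove(f.name)
```

Adding a format then means registering one more entry in the table rather than editing the dispatch logic.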

2 Build the Retriever

import pandas as pd
from tqdm import tqdm

from gomate.modules.retrieval.dense_retriever import DenseRetriever, DenseRetrieverConfig

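retriever_config = DenseRetrieverConfig(
    model_name_or_path="bge-large-zh-v1.5",  # embedding model name or local path
    dim=1024,                                # must match the model's embedding size
    index_dir='dense_cache'                  # directory used when saving the index
)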
config_info = retriever_config.log_config()
print(config_info)

retriever = DenseRetriever(config=retriever_config)

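# Load the first 5 records; each row holds 'positive' and 'negative' lists of passages
data = pd.read_json('docs/zh_refine.json', lines=True)[:5]
print(data)
print(data.columns)

# Flatten the passage lists into a single list of texts to index
documents = [doc for docs in data['positive'] for doc in docs]
documents += [doc for docs in data['negative'] for doc in docs]
retriever.build_from_texts(documents)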

Save the index:

retriever.save_index()
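The changelog says DenseRetriever supports index building, incremental appends, and saving documents, vectors, and the index. Below is a minimal NumPy sketch of those operations, assuming a toy hashing embedding in place of bge-large-zh-v1.5 and brute-force inner-product search in place of whatever index GoMate uses internally. `ToyDenseIndex` and `toy_embed` are hypothetical, and the `{"text", "score"}` result shape is chosen for illustration.

```python
import json
import tempfile
import zlib
from pathlib import Path
from typing import Dict, List

import numpy as np

DIM = 256  # toy embedding size; bge-large-zh-v1.5 produces 1024-dim vectors


def toy_embed(text: str) -> np.ndarray:
    # Hashing bag-of-words vector, L2-normalized (stand-in for a dense encoder)
    vec = np.zeros(DIM, dtype=np.float32)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


class ToyDenseIndex:
    def __init__(self) -> None:
        self.documents: List[str] = []
        self.vectors = np.zeros((0, DIM), dtype=np.float32)

    def build_from_texts(self, texts: List[str]) -> None:
        # Start from an empty index, then embed every text
        self.documents = []
        self.vectors = np.zeros((0, DIM), dtype=np.float32)
        for text in texts:
            self.add_text(text)

    def add_text(self, text: str) -> None:
        # Incremental append: only the new text is embedded
        self.documents.append(text)
        self.vectors = np.vstack([self.vectors, toy_embed(text)])

    def retrieve(self, query: str, top_k: int = 5) -> List[Dict]:
        # Vectors are unit-length, so the inner product equals cosine similarity
        scores = self.vectors @ toy_embed(query)
        order = np.argsort(-scores)[:top_k]
        return [{"text": self.documents[i], "score": float(scores[i])} for i in order]

    def save_index(self, index_dir: str) -> None:
        # Persist both the documents and their vectors
        path = Path(index_dir)
        path.mkdir(parents=True, exist_ok=True)
        np.save(path / "vectors.npy", self.vectors)
        (path / "documents.json").write_text(json.dumps(self.documents, ensure_ascii=False), encoding="utf-8")

    def load_index(self, index_dir: str) -> None:
        path = Path(index_dir)
        self.vectors = np.load(path / "vectors.npy")
        self.documents = json.loads((path / "documents.json").read_text(encoding="utf-8"))


index = ToyDenseIndex()
index.build_from_texts(["rcep is a trade agreement", "the summer solstice is in june"])
index.add_text("rcep members include china and japan")
with tempfile.TemporaryDirectory() as tmp:
    index.save_index(tmp)
    restored = ToyDenseIndex()
    restored.load_index(tmp)
hits = restored.retrieve("rcep members include china and japan", top_k=2)
```

Because `add_text` embeds only the new text, appending does not require re-encoding the existing documents.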

3 Retrieve Documents

# Query: "Which countries does RCEP include?"
result = retriever.retrieve("RCEP具体包括哪些国家")
print(result)
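A reranker such as BGE (imported as `BgeReranker` in the custom RAG example) rescores the retrieved passages jointly with the query and reorders them. The sketch below assumes retrieval hits shaped as dicts with `text` and `score` keys and uses a toy term-coverage scorer where a cross-encoder like bge-reranker-large would go; `rerank` and `term_coverage` are hypothetical names.

```python
from typing import Callable, Dict, List


def rerank(query: str, hits: List[Dict], score_pair: Callable[[str, str], float], k: int = 3) -> List[Dict]:
    # Score each (query, passage) pair jointly, then keep the k highest
    rescored = [{"text": h["text"], "score": score_pair(query, h["text"])} for h in hits]
    rescored.sort(key=lambda h: h["score"], reverse=True)
    return rescored[:k]


def term_coverage(query: str, passage: str) -> float:
    # Toy pair scorer: fraction of query words found in the passage
    q_terms = set(query.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & set(passage.lower().split())) / len(q_terms)


hits = [
    {"text": "RCEP was signed in 2020", "score": 0.91},  # illustrative retrieval scores
    {"text": "RCEP members include China and Japan", "score": 0.88},
]
top = rerank("RCEP members", hits, term_coverage, k=1)
```

The first-stage retrieval score is discarded here; the reranker's pair score alone decides the final order.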

4 LLM Question Answering

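from gomate.modules.generator.llm import GLMChat

chat = GLMChat(path='THUDM/chatglm3-6b')
question = "RCEP具体包括哪些国家"  # "Which countries does RCEP include?"
# Join the passages retrieved in step 3 into one context string
# (assumption: each hit is a dict with a 'text' key)
content = '\n'.join(hit['text'] for hit in result)
print(chat.chat(question, [], content))  # second argument: chat history (empty here)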
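`GLMChat.chat` takes the question and the retrieved content as separate arguments, and how it builds its prompt is not shown here. For reference, a generic way to combine them into one grounded prompt under a character budget is sketched below; `build_prompt` and its template wording are assumptions, not GoMate's template.

```python
from typing import Dict, List


def build_prompt(question: str, hits: List[Dict], max_chars: int = 2000) -> str:
    # Number the passages (best first) and stop once the character budget is used up
    parts: List[str] = []
    used = 0
    for i, hit in enumerate(hits, start=1):
        passage = f"[{i}] {hit['text']}"
        if used + len(passage) > max_chars:
            break
        parts.append(passage)
        used += len(passage)
    context = "\n".join(parts)
    return (
        "Answer the question using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "Which countries does RCEP include?",
    [{"text": "RCEP members include China and Japan"}],
    max_chars=200,
)
```

Numbering the passages lets the model (or a later check) point back to the passage an answer came from.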

5 Add Documents

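# Incrementally append passages to the existing index with add_text
# (this reuses the sample data; pass your own new documents in practice)
for documents in tqdm(data['positive'], total=len(data)):
    for document in documents:
        retriever.add_text(document)
for documents in tqdm(data['negative'], total=len(data)):
    for document in documents:
        retriever.add_text(document)

# Persist the updated index
retriever.save_index()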

🔧 Customizing RAG

Build a custom RAG application:

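import os

from gomate.modules.document.common_parser import CommonParser
from gomate.modules.generator.llm import GLMChat
from gomate.modules.reranker.bge_reranker import BgeReranker
from gomate.modules.retrieval.dense_retriever import DenseRetriever


class RagApplication:
    def __init__(self, config):
        # Store the application config and create the parser, retriever,
        # reranker, and LLM components it describes
        pass

    def init_vector_store(self):
        # Parse the documents and build a fresh vector index
        pass

    def load_vector_store(self):
        # Load a previously saved index instead of rebuilding it
        pass

    def add_document(self, file_path):
        # Parse a single file and append its chunks to the existing index
        pass

    def chat(self, question: str = '', topk: int = 5):
        # Retrieve the topk passages, rerank them, and generate an answer
        pass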

See rag.py for the complete module.
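The skeleton above leaves every method empty; rag.py has the real implementation. To show how the pieces fit together, here is a self-contained toy version with the same method names, assuming word-overlap retrieval in place of DenseRetriever plus BgeReranker and a stub callable in place of GLMChat. `ToyConfig`, `ToyRagApplication`, and `echo_llm` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyConfig:
    docs: List[str]            # initial documents to index
    llm: Callable[[str], str]  # any prompt -> answer function


class ToyRagApplication:
    def __init__(self, config: ToyConfig):
        self.config = config
        self.store: List[str] = []

    def init_vector_store(self) -> None:
        # "Index" the configured documents; a real app would embed them
        self.store = list(self.config.docs)

    def add_document(self, text: str) -> None:
        # Append one more chunk to the existing store
        self.store.append(text)

    def chat(self, question: str = "", topk: int = 5) -> str:
        # Retrieve: rank stored chunks by word overlap with the question
        terms = set(question.lower().split())
        ranked = sorted(self.store, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
        content = "\n".join(ranked[:topk])
        # Generate: hand the question plus retrieved context to the LLM
        prompt = f"Context:\n{content}\n\nQuestion: {question}"
        return self.config.llm(prompt)


def echo_llm(prompt: str) -> str:
    # Stub LLM: returns the first context line instead of generating text
    return prompt.splitlines()[1]


app = ToyRagApplication(ToyConfig(docs=["the summer solstice is in june"], llm=echo_llm))
app.init_vector_store()
app.add_document("rcep members include china and japan")
answer = app.chat("rcep members", topk=1)
```

Because the LLM is just a callable in the config, swapping the stub for a real model client does not touch the retrieval code.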

🌐 Try the RAG Demo

Configure your local model paths:

# Change these to your own settings!!!
from gomate.applications.rag import ApplicationConfig, RagApplication  # assumed location of rag.py
from gomate.modules.reranker.bge_reranker import BgeRerankerConfig
from gomate.modules.retrieval.dense_retriever import DenseRetrieverConfig

app_config = ApplicationConfig()
app_config.docs_path = "./docs/"
app_config.llm_model_path = "/data/users/searchgpt/pretrained_models/chatglm3-6b/"

retriever_config = DenseRetrieverConfig(
    model_name_or_path="/data/users/searchgpt/pretrained_models/bge-large-zh-v1.5",
    dim=1024,
    index_dir='/data/users/searchgpt/yq/GoMate/examples/retrievers/dense_cache'
)
rerank_config = BgeRerankerConfig(
    model_name_or_path="/data/users/searchgpt/pretrained_models/bge-reranker-large"
)

app_config.retriever_config = retriever_config
app_config.rerank_config = rerank_config
application = RagApplication(app_config)
application.init_vector_store()

Then start the web demo:

python app.py

Open 127.0.0.1:7860 in your browser:

[Figure: demo.png]

App backend log:

[Figure: app_logging.png]


Research and Development Team

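This project was completed by the GoMate team at the Key Laboratory of Network Data Science and Technology, advised by researchers Jiafeng Guo (郭嘉丰) and Yixing Fan (范意兴).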

To join the GoMate technical discussion group, add 1185918903 on WeChat.

Languages

  • Python 99.7%
  • Makefile 0.3%