
RAG Embedding Evaluation

Streamlit App

Overview

This project benchmarks the performance of different text processing techniques by evaluating their ability to differentiate related and unrelated queries from a document during embedding search.

Currently tested text processing techniques:

Document side:

Query side:

Evaluation Method

  • For each case, multiple embeddings are generated with each doc-side and each query-side method. Cosine similarity is then calculated across the two sides' embeddings.

  • For each <doc, query> method pair, the maximum similarity for each query is accumulated as that pair's score for the case.

  • We then average the scores of the positive queries and of the negative queries separately. The difference between the average positive-query-to-doc similarity and the average negative-query-to-doc similarity is the overall score for the <doc, query> method pair.

  • The score of the original doc embedding with the original query embedding, shown as <direct, direct>, is subtracted from each score, since a processing approach is expected to differentiate related and unrelated queries better than this baseline (see the sketch after this list).
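
A minimal sketch of this scoring in Python, assuming NumPy vectors and hypothetical helper names (cosine_similarity, max_similarity, pair_score); this is an illustration of the method above, not the project's actual code:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def max_similarity(doc_embeddings, query_embeddings):
    # Max similarity between any doc-side embedding and any query-side
    # embedding produced by one <doc, query> method pair.
    return max(cosine_similarity(d, q)
               for d in doc_embeddings for q in query_embeddings)

def pair_score(doc_embeddings, positive_queries, negative_queries):
    # positive_queries / negative_queries: dict of query name -> embeddings.
    pos = [max_similarity(doc_embeddings, q) for q in positive_queries.values()]
    neg = [max_similarity(doc_embeddings, q) for q in negative_queries.values()]
    # Overall score: average positive similarity minus average negative similarity.
    return sum(pos) / len(pos) - sum(neg) / len(neg)

# The <direct, direct> baseline is then subtracted from every pair's score:
# relative_score = pair_score(...) - baseline_score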

How to use:

  • poetry install
  • poetry run streamlit run
  • Add your own Azure OpenAI key and endpoint in the Streamlit app.
  • Select a dataset.
  • Click Run to start the evaluation. This will take a while; completed results are stored in the results/ folder.
  • Click Show Cases to reveal the current dataset.
  • Click Show Result to reveal previous evaluation results.

Add or modify dataset

Each dataset is added to the cases folder as a JSON file. The JSON file should contain a list of dicts in the following format:

{
    "topic": "topic name should be unique for each case",
    "content": "document content",
    "positive_queries": {
        "pq1 name": "query content",
        "pq2 name": "query content"
    },
    "negative_queries": {
        "nq1 name": "query content"
    }
}
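
A minimal sketch of reading case files in this format, assuming dataset files live directly under a cases/ folder as described above:

import json
from pathlib import Path

# Load every dataset file in the cases/ folder and summarize its cases.
for path in Path("cases").glob("*.json"):
    cases = json.loads(path.read_text(encoding="utf-8"))
    for case in cases:
        print(case["topic"],
              len(case["positive_queries"]), "positive,",
              len(case["negative_queries"]), "negative queries")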
