Release 0.4.0 Proposal Items #318

Open
davidgxue opened this issue Mar 7, 2024 · 0 comments

Items to explore/implement

HTML webpages sometimes contain embedded hyperlinks, e.g. anchor text such as "mylife"

  • We currently strip these links before chunking/vectorizing/inserting the docs into the vector DB
  • We can attempt to keep them so that the bot can answer with links if needed
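One way to keep the links (a sketch, not the current ingestion code) is to rewrite anchor tags as Markdown links during preprocessing instead of stripping them, so the URL survives chunking:

```python
import re

# Matches <a ... href="URL" ...>anchor text</a>
A_TAG = re.compile(r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)

def links_to_markdown(html: str) -> str:
    """Rewrite anchor tags as Markdown links rather than dropping them."""
    return A_TAG.sub(lambda m: f"[{m.group(2)}]({m.group(1)})", html)
```

For example, `see <a href="https://example.com/docs">the docs</a>` becomes `see [the docs](https://example.com/docs)`, which the bot can surface verbatim in an answer.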

Astro Forum docs sometimes have bad formatting during ingestion

  • This issue was addressed for most of the other data sources, but the forum docs have also shown it here and there

Increase the per-document length limit from the current 2.5k tokens back to the previous 4k

  • The reduction was made to cut costs and may have mildly hindered retrieval performance (though this was not directly observed)
  • With the new GPT-4 Turbo model's roughly 6x cheaper input tokens, cost is no longer a major concern

Build out the evaluated_rag DAG to use a judge that scores improvements/degradations relative to the previous answer/reference answer

  • Add a quantitative judge, such as cosine similarity between the generated answer and the reference answer
  • Add an LLM as a judge
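A minimal sketch of the cosine-similarity judge, assuming embedding vectors for the generated and reference answers are already available (the embedding step itself is out of scope here):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors; 1.0 = identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical embeddings score 1.0 and orthogonal ones score 0.0, giving the DAG a simple scalar to compare across runs.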

Explore/experiment with top_k, alpha, and other parameters used for reranking and hybrid search
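To illustrate what alpha and top_k control (a sketch using the common convention of a linear blend of vector and keyword scores, both assumed normalized to [0, 1]; the actual vector DB's hybrid implementation may differ):

```python
def hybrid_rank(candidates, alpha: float, top_k: int):
    """candidates: list of (doc_id, vector_score, keyword_score) tuples.

    alpha = 1.0 -> pure vector search; alpha = 0.0 -> pure keyword (BM25).
    Returns the top_k docs by blended score, highest first.
    """
    scored = [(doc, alpha * v + (1 - alpha) * kw) for doc, v, kw in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
```

Sweeping alpha over a small eval set would show whether questions benefit more from semantic or keyword matching.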

Explore adding additional property in vector db

  • Add a property on each document chunk containing the title of the original page (or something similar), so that smaller chunks better retain the semantic connection to the overall topic of the full document
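A sketch of what the extra chunk property could look like (property names here are hypothetical; the actual vector DB schema may differ):

```python
def chunk_with_context(page_title: str, chunk_text: str) -> dict:
    """Attach the source page title to a chunk so small chunks keep page-level topic."""
    return {
        "content": chunk_text,
        "source_title": page_title,  # hypothetical property name
    }
```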

Train and add an off-topic discussion text classifier (could run before or after QA)

  • We want to be lenient toward on-topic classifications: allow questions through even if they are only loosely related, i.e. tolerate more false positives (favor recall over precision)
  • We can tentatively weight recall 5x higher than precision, i.e. use the F_β metric with β = 5
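The weighting above follows directly from the standard F_β formula; a self-contained sketch:

```python
def f_beta(precision: float, recall: float, beta: float = 5.0) -> float:
    """F_beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With β = 5, a classifier with recall 0.9 and precision 0.2 scores far higher than one with the values swapped, which matches the intent of letting loosely related questions through.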