What is the best practice for initializing the ragbuilder/ragapplication vars and adding loaders / building into Lance?
I create a new RAG builder, and a public ragApplication var from it, as part of server initialization.
I can add additional loaders via an API endpoint, then set Lance and build each time.
Should I be setting the vector DB once, then just calling the builder after adding a loader each time?
What is the best way to build up incremental data in the session to answer questions reliably across a mix of web pages and PDFs?
Problem
When I add URLs it seems to work. When I re-add them they still answer questions, but I notice it becomes much less reliable. I tried adding 2 URLs, then a PDF. Initially the URLs worked and answered questions. When I added the PDF, questions about the PDF were answered, but questions about the original two web pages started getting a lot more "I don't know the answer to the question" responses, as if it had become much dumber.
Is this because of the way I'm adding / building in Lance below, or is it because Lance perhaps isn't the best semantic search store for RAG as more data is added?
As for your specific question, I think it's a combination of multiple things. I am going to assume there is a cache, so the quality of the response depends on the embedding model and the LLM choice. The vector DB choice itself is only marginally relevant, as most vector DBs have similar performance these days.
One thing you should look at is the sources field in the query response. It contains references to the loaded pieces of information that were used to form the response (essentially the items that were picked up from the vector DB and sent to the LLM).
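To make that check concrete, here is a minimal sketch that tallies which origins the retrieved chunks came from. The helper name `countSources` and the response shape (a `sources` array whose entries are either strings or objects with a `source` field) are assumptions for illustration, not the library's documented contract; adjust the field names to whatever your installed version returns.

```javascript
// Hypothetical helper: count how many retrieved chunks came from each origin.
// ASSUMPTION: response.sources is an array of strings or of { source } objects.
function countSources(response) {
  const counts = {};
  for (const entry of response.sources ?? []) {
    const key = typeof entry === 'string' ? entry : entry.source;
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Made-up response object, standing in for the result of a real query.
const sample = {
  sources: [
    { source: 'https://example.com/a' },
    { source: './docs/manual.pdf' },
    { source: './docs/manual.pdf' },
  ],
};
console.log(countSources(sample));
```

If a question about one of the web pages comes back with only PDF entries in this tally, the retrieval step, rather than the LLM, is the likelier place to look.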
Another tool that can help is the logs. The library emits a lot of debug logs, which are hidden by default. These logs can give you a lot of information. To see them, set the env variable DEBUG=embedjs:*
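If setting the variable in the shell is inconvenient, it can also be set from inside the process. This is only a sketch: the helper name `enableDebugLogs` is made up, and it assumes the logging library reads `DEBUG` when it is first loaded, so the call has to run before the RAG library is required.

```javascript
// Hypothetical helper: set the DEBUG namespace filter unless one is already set.
// ASSUMPTION: the logger reads process.env.DEBUG at load time, so call this
// before requiring the RAG library.
function enableDebugLogs(namespaces = 'embedjs:*') {
  if (!process.env.DEBUG) {
    process.env.DEBUG = namespaces;
  }
  return process.env.DEBUG;
}

enableDebugLogs();
// Require the RAG library only after this point.
```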
Thanks. I can investigate this, but the 2 web pages I originally loaded seemed to become less reliable once I loaded a PDF. I will experiment and follow your diagram. Thanks for your detailed response.
I really appreciate this, and I would buy you a pizza or a round of coffees anytime.
I like the way you have structured this project. It is very cool. We all need to get behind you and support it. Regards.
Code
initialization
let ragApplication;
let ragApplicationBuilder;
router.get('/test', async (req, res) => {
    ragApplication = await new RAGApplicationBuilder()
        .setTemperature(0);
    res.status(200).send("test Done");
});
adding pages
router.post('/addURL', async (req, res) => { // Changed to POST
    const { url } = req.body; // Extract URL from request body
    if (!url) {
        return res.status(400).send('URL is required');
    }
    try {
        ragApplicationBuilder
            .addLoader(new WebLoader({ url: url }));
        ragApplication = await ragApplicationBuilder
            .setVectorDb(new LanceDb({ path: './db' }))
            .build();
        res.status(200).send("updated web page");
    } catch (error) {
        console.error('Error adding URL:', error);
        return res.status(500).send('Internal Server Error');
    }
});
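One alternative to rebuilding on every request is to build the application once and reuse it. The sketch below shows only that pattern, using stand-in classes (`FakeBuilder`, `FakeApp`) so it runs on its own. It assumes the built application can accept further loaders through an `addLoader` method; verify that against the version you have installed before relying on it. `createAppProvider` and `demo` are hypothetical names.

```javascript
// Stand-ins so the sketch is runnable; swap in the real builder and loaders.
class FakeApp {
  constructor() { this.loaded = []; }
  // ASSUMPTION: the real application exposes a comparable addLoader method.
  async addLoader(loader) { this.loaded.push(loader.id); }
}
class FakeBuilder {
  constructor() { this.builds = 0; }
  async build() { this.builds += 1; return new FakeApp(); }
}

// Build at most once; later (and concurrent) callers share the same promise.
function createAppProvider(builder) {
  let appPromise = null;
  return () => {
    if (!appPromise) appPromise = builder.build();
    return appPromise;
  };
}

async function demo() {
  const builder = new FakeBuilder();
  const getApp = createAppProvider(builder);
  // Two simulated requests add content; the app is still built only once.
  await (await getApp()).addLoader({ id: 'url-1' });
  await (await getApp()).addLoader({ id: 'pdf-1' });
  const app = await getApp();
  return { builds: builder.builds, loaded: app.loaded };
}
```

In a route handler this would mean calling `getApp()` and adding the new loader to the returned application, instead of calling `setVectorDb(...)` and `build()` again for each URL.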