webLoader loads a previously registered URL only #153
Comments
Loaders are cumulative. When a query is made, the knowledge from all previously loaded data is searched for what is most relevant to the query (using embeddings and vector databases). This data is sent to the LLM, and the source of that data is sent back in the sources array. Adding a new loader does not delete data from other loaders. You can have several hundred loaders. For example, if you were to build an Elon Musk app, you would add multiple sources of data from different web pages and YouTube videos about Elon Musk. If you want to clear all loaded data, then you can invoke the delete methods available within the |
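To make the cumulative behaviour concrete, here is a minimal toy sketch. It is not the library's implementation: `ToyKnowledgeBase` and its methods are hypothetical names, and a crude word-overlap score stands in for embedding similarity. It only illustrates why a query can still return an older source when a newer loader contributed no data.

```typescript
// Toy illustration only: not embedJs internals. All names here are hypothetical.
type Chunk = { text: string; source: string };

class ToyKnowledgeBase {
    private chunks: Chunk[] = [];

    // Adding a loader appends chunks; nothing already stored is removed.
    addLoader(source: string, texts: string[]): number {
        for (const text of texts) this.chunks.push({ text, source });
        return texts.length;
    }

    // Rank every stored chunk by word overlap with the question
    // (a crude stand-in for embedding similarity) and return the best k.
    query(question: string, k = 1): Chunk[] {
        const words = new Set(question.toLowerCase().split(/\W+/).filter(Boolean));
        const score = (c: Chunk) =>
            c.text.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length;
        return [...this.chunks].sort((a, b) => score(b) - score(a)).slice(0, k);
    }
}

const kb = new ToyKnowledgeBase();
kb.addLoader('https://www.forbes.com/profile/elon-musk', ['Elon Musk leads Tesla and SpaceX']);
// A loader whose page could not be fetched contributes zero chunks.
kb.addLoader('https://platform.openai.com/docs/guides/prompt-engineering', []);

const sources = kb.query('How do I write a GPT Prompt?').map((c) => c.source);
console.log(sources);
```

In this sketch the only stored chunk comes from the first loader, so it is returned as the nearest match even though it is not relevant to the question.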
Understood. Thank you for the clarification. However, something is not working correctly. See my code and output below:
const ragApplication = await new RAGApplicationBuilder()
    .setModel(new OpenAi({ model: "gpt-4o-mini" }))
    .setEmbeddingModel(new OpenAiEmbeddings())
    .setVectorDb(new PineconeDb({
        projectName: 'medicalinfo',
        namespace: 'ns1',
        indexSpec: {
            serverless: {
                cloud: 'aws',
                environment: 'us-east-1'
            },
        },
    }))
    .setQueryTemplate("Only include information provided to you, do not make up answers. If the information is not available, state that you do not know.")
    .build();
await ragApplication.addLoader(new WebLoader({ urlOrContent: 'https://platform.openai.com/docs/guides/prompt-engineering' }));
const res = await ragApplication.query('How do I write a GPT Prompt?');
The output I receive is:
{
    id: 'f8333958-9e9e-469d-a856-87753184f977',
    timestamp: 2024-11-04T00:20:06.024Z,
    content: 'I do not know.',
    actor: 'AI',
    sources: [
        {
            source: 'https://www.forbes.com/profile/elon-musk',
            loaderId: 'WebLoader_8cf46026cabf9b05394a2658bd1fe890'
        }
    ],
    tokenUse: { inputTokens: 2012, outputTokens: 5 }
}
Why is it referencing only the source as |
I just double checked my Pinecone DB and it appears the application has stopped uploading vectors to the DB. This is likely the issue. |
I will take a look at it. This should work from the sample code you posted. |
So, I took a look at it. This is not an error on the application's end. The URL Getting around TLS fingerprinting and blocking is possible but not easy. I will explore if this needs to be supported in a later version. For now, the application gets a 403 error and skips recording the page. You can see more details on what the application is doing internally by enabling the debug logs. |
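As a rough illustration of the "403, so the page is skipped" behaviour described above, the following sketch checks a URL's HTTP status before handing it to a loader. It is an assumption-laden example, not part of the library: `canIngest` and `StatusFetcher` are hypothetical names, and the fetcher is injected so the logic can be exercised with stubs rather than real network calls.

```typescript
// Hypothetical pre-flight check; not part of the library's API.
type StatusFetcher = (url: string) => Promise<{ ok: boolean; status: number }>;

// Returns true only when the URL answers with a successful HTTP status.
async function canIngest(url: string, fetchStatus: StatusFetcher): Promise<boolean> {
    try {
        const res = await fetchStatus(url);
        if (!res.ok) {
            // For example, a 403 from a site that blocks automated clients.
            console.warn(`Skipping ${url}: HTTP ${res.status}`);
            return false;
        }
        return true;
    } catch (err) {
        console.warn(`Skipping ${url}: request failed (${String(err)})`);
        return false;
    }
}

// Stubs standing in for a site that rejects the request and one that accepts it.
const blocked: StatusFetcher = async () => ({ ok: false, status: 403 });
const allowed: StatusFetcher = async () => ({ ok: true, status: 200 });
```

On Node 18 or later, the built-in `fetch` could be passed as `(u) => fetch(u)`. A plain request may be treated differently by the server than the library's own HTTP client, so a check like this is only a hint, not a guarantee that ingestion will succeed.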
Thank you for the answer to this question. That makes perfect sense. |
🐛 Describe the bug
My original code included:
That works fine. But I then changed to a new URL to ingest:
The output I receive is still referencing the old, original WebLoader
source: https://www.forbes.com/profile/elon-musk
as indicated in the output of running a new prompt. What should happen instead is that a new WebLoader is instantiated with a reference to the new URL:
https://platform.openai.com/docs/guides/prompt-engineering