[BUG] When upserting through Document Upsert/Refresh API it always creates a new document no matter what, and metadata is ignored #3717

Open
elijenisolsen opened this issue Dec 16, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

elijenisolsen commented Dec 16, 2024

The gist of it
When we try to upsert a txt file with a given, correct docId, the API always creates a new document and never overwrites the old one. Using the refresh link just throws a generic 500 error.

To Reproduce
Steps to reproduce the behaviour in Postman:

  1. First create a document loader with a Pinecone connection, and set it up.
  2. Find your document store ID and your document ID (Google it if you don't know how).
  3. Use these to form your upsert link, which will be something like:
    https:///api/v1/document-store/upsert/#your-document-store-id#
    Add this as a request in Postman, and remember to make it a POST.
  4. If required, add an Authorization header with a Bearer token to your request.
  5. Add the following form data:
    files: binary (type needs to be file)
    metadata: {"source":"#your-key#"} (type is text)
    docId: #your-document-id# (type is text)
  6. See error: The response should show that the file was uploaded and that the document with the provided ID was overwritten, and the metadata should be visible in the page content. Instead, a new document is created every time, which is not the advertised behaviour, and the metadata is ignored every time.
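
For anyone who prefers a script to Postman, the steps above can be sketched in Python roughly as follows. This is only a sketch under assumptions: it uses the third-party `requests` library, the form field names (`files`, `metadata`, `docId`) come from step 5, and `BASE_URL`, `STORE_ID`, `DOC_ID`, `build_upsert_form`, `upsert_url` and `send_upsert` are hypothetical names chosen for illustration, not part of Flowise.

```python
import json

# Placeholder values (assumptions): replace with your own instance details.
BASE_URL = "https://your-flowise-host"
STORE_ID = "your-document-store-id"
DOC_ID = "your-document-id"


def build_upsert_form(doc_id, metadata):
    """Build the text fields of the multipart form described in step 5."""
    return {
        "docId": doc_id,
        # The metadata goes into a text field, so it is serialized to a JSON string.
        "metadata": json.dumps(metadata),
    }


def upsert_url(base_url, store_id):
    """Build the upsert endpoint URL described in step 3."""
    return f"{base_url.rstrip('/')}/api/v1/document-store/upsert/{store_id}"


def send_upsert(file_path, token=None):
    """POST the file plus form fields and return the requests.Response."""
    # Third-party dependency; imported here so the helpers above work without it.
    import requests

    headers = {"Authorization": f"Bearer {token}"} if token else {}
    with open(file_path, "rb") as fh:
        return requests.post(
            upsert_url(BASE_URL, STORE_ID),
            headers=headers,
            files={"files": fh},  # field name taken from step 5
            data=build_upsert_form(DOC_ID, {"source": "your-key"}),
        )
```

Calling `send_upsert("example.txt", token="...")` should reproduce the request from the steps. According to this report, each call creates a new document instead of overwriting the one identified by `docId`.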

I understand there is such a thing as a JSON-only request, but in this case we are upserting a file. I have also tried making a JSON-only POST with the content of the file in the content attribute, upserting plain text instead of a file, but the behaviour is the same.

Example of a JSON-only POST that gets exactly the same response and behaviour from the upsert API:
{
  "metadata": {
    "source": "#your-key#"
  },
  "docId": "#your-document-id#",
  "content": "#your-JSON-content#"
}

This behaves exactly the same way.
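
The JSON-only variant can be sketched with the Python standard library as well. This is a minimal sketch, assuming the JSON body is posted to the same upsert endpoint as the file upload (the report implies this but does not spell it out). `build_json_upsert` is a hypothetical helper name, and the base URL and IDs are placeholders.

```python
import json
import urllib.request


def build_json_upsert(base_url, store_id, doc_id, content, metadata, token=None):
    """Build (but do not send) a JSON-only upsert request."""
    # Body shape copied from the JSON example in this report.
    body = {"metadata": metadata, "docId": doc_id, "content": content}
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        # Assumption: same endpoint as the multipart upsert.
        f"{base_url.rstrip('/')}/api/v1/document-store/upsert/{store_id}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# To actually send it against your own instance:
# req = build_json_upsert("https://your-flowise-host", "your-document-store-id",
#                         "your-document-id", "plain text", {"source": "your-key"})
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the exact body and headers before hitting the server.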

Expected behavior
I expect the document being upserted to overwrite the original document rather than create a new one every time, and I expect the metadata not to be ignored.

Setup

  • Install Postman if you don't have it already
  • Flowise version: latest
  • OS: Windows 11 (Doesn't matter)
  • Browser: no browser needed, just use Postman
@HenryHengZJ
Contributor

That's the current design: it will always use the existing config and create a new document. We can add an option in the body like { overrideExisting: true } in the future.

HenryHengZJ added the enhancement (New feature or request) label Dec 18, 2024
@briansoegaard

@HenryHengZJ I recognize @elijenisolsen's issue here and have the same concern.

Please clarify: using a Document Store, how do I upsert a document via the API and specify its metadata in the API call? With the good old API, I can do this by adding body_data = { "metadata": '{"source": "something"}'} as a parameter in my requests.post() call, but I can't figure out how to do it with a Document Loader in a Document Store.
Thanks 🙏
