Using batches for update document with a new function in ChromaDB (langchain-ai#6561)

https://github.com/hwchase17/langchain/blob/2a4b32dee24c22159805f643b87eece107224951/langchain/vectorstores/chroma.py#L355-L375

Currently, the `update_document` function takes only a single document and its ID. However, Chroma can update multiple documents at once when given a list of IDs and a list of documents.

If we changed `update_document` itself, both document_id and document could be `Union[str, List[str]]`, but we would then need a type check, because the `embed_documents` and `update` functions take lists for the text and document_ids variables. I believe that writing a new function is the best option.

I update the Chroma vectorstore with refreshed information from my website every 20 minutes. Updating all changed pieces of information in a single batched call, instead of one call per document, would significantly reduce the update time in such use cases.

In my case I update a total of 8810 chunks. Updating these 8810 chunks one at a time with the current function takes a total of 8.5 minutes. If we process the inputs in batches and update them collectively, all 8810 chunks can be updated in just 1 minute. This significantly reduces the time it takes for users of actively used chatbots to access up-to-date information.

I can add an integration test and a documentation example for the new `update_document_batch` function.

@hwchase17 [berkedilekoglu](https://twitter.com/berkedilekoglu)
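To illustrate the batching idea described above, here is a minimal sketch, not the code from this commit. It assumes a collection object whose `update` method accepts lists of ids, embeddings, and documents, as the message describes for Chroma. `FakeCollection`, `fake_embed_documents`, the `batched` helper, and the `batch_size` parameter are hypothetical stand-ins so that the example runs without Chroma or an embedding model installed.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def batched(
    ids: Sequence[str], texts: Sequence[str], batch_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield aligned (ids, texts) slices of at most batch_size items."""
    if len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield list(ids[start:end]), list(texts[start:end])


class FakeCollection:
    """Hypothetical in-memory stand-in for a vector store collection."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.update_calls = 0

    def update(
        self, ids: List[str], embeddings: List[List[float]], documents: List[str]
    ) -> None:
        # One call updates many documents at once.
        self.update_calls += 1
        for doc_id, text in zip(ids, documents):
            self.docs[doc_id] = text


def fake_embed_documents(texts: List[str]) -> List[List[float]]:
    """Hypothetical embedding function: one 1-d vector per text."""
    return [[float(len(text))] for text in texts]


def update_document_batch(
    collection: FakeCollection,
    embed_documents: Callable[[List[str]], List[List[float]]],
    ids: Sequence[str],
    texts: Sequence[str],
    batch_size: int = 1000,  # assumed value, not taken from the commit
) -> None:
    """Embed and update documents batch by batch instead of one at a time."""
    for batch_ids, batch_texts in batched(ids, texts, batch_size):
        embeddings = embed_documents(batch_texts)
        collection.update(
            ids=batch_ids, embeddings=embeddings, documents=batch_texts
        )


# Usage with the chunk count mentioned in the message.
collection = FakeCollection()
chunk_ids = [str(i) for i in range(8810)]
chunk_texts = [f"chunk {i}" for i in range(8810)]
update_document_batch(collection, fake_embed_documents, chunk_ids, chunk_texts)
print(collection.update_calls)  # 9 update calls instead of 8810
```

With a batch size of 1000, the 8810 chunks need 9 embedding calls and 9 update calls rather than 8810 of each, which is where a saving of the kind reported above would come from. The actual speedup depends on the embedding backend and the store.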