[Bug]: "Collection is not created" message despite being returned by list_collections method #2207

Closed
hoosengold opened this issue May 15, 2024 · 6 comments · Fixed by #2208
Labels
bug (Something isn't working), good first issue (Good for newcomers)

Comments

@hoosengold

What happened?

I am trying to get an existing ChromaDB collection with the get_or_create_collection method of a PersistentClient object but I get 'Collection "collection_name" is not created.'.

Here's a snippet of the source code:

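import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="/path/to/collection/directory")
print(f"Available collections: {client.list_collections()}")    # <- this returns the collection that I want to get
embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-mpnet-base-v2", device="cuda")
# Pass the embedding function by keyword; the second positional parameter is not embedding_function
client.get_or_create_collection("test_collection", embedding_function=embedding_function)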

I tracked down the source of the message and it is */chromadb/api/segment.py, line 189.

I also tried to reproduce the message in a copy of the project, changing only the version of the chromadb Python package inside a pipenv environment. With version 0.4.24 the message does not appear, even though everything else is the same (collection directory path, collection name and embedding function). I've asserted the values of all parameters and they are always correct. The message appears when I upgrade to the latest version, 0.5.0.

It is possible to perform all actions with the collection such as creating one, inserting and deleting data and querying chunks from the collection despite the above message.

Expected behaviour: the message should not appear if the collection exists or is being created and there are no errors.

Versions

Chroma 0.5.0
Python 3.10.12
Ubuntu 22.04 (WSL 2)

Relevant log output

No response

hoosengold added the bug label on May 15, 2024
@tazarov
Contributor

tazarov commented May 15, 2024

Hey @hoosengold, thanks for pointing this out. This is just an info-level log message used for tracking the behavior of the get_or_create_collection() function semantics. It just lets you know that the call did not create a new collection but returned an existing one. I can see how this exact phrasing can be misleading and interpreted as an error.
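Since the message is logged at info level, it can be filtered with Python's standard logging module until the wording is improved. The following is a minimal sketch, not a confirmed recipe: the logger name `chromadb.api.segment` is an assumption inferred from the file path reported above, and the messages are emitted by the demo itself rather than by Chroma.

```python
import io
import logging

# Assumed logger name, inferred from the path chromadb/api/segment.py.
# It is not confirmed anywhere in this thread.
LOGGER_NAME = "chromadb.api.segment"


def demo() -> str:
    """Return the text that reaches the handler before and after raising the level."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    logger = logging.getLogger(LOGGER_NAME)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output away from the root logger

    logger.setLevel(logging.INFO)
    logger.info("Collection Test is not created.")  # passes the INFO threshold

    logger.setLevel(logging.WARNING)
    logger.info("Collection Test is not created.")  # dropped: below WARNING
    logger.warning("a real warning")  # still passes

    logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

In an application, the only line that matters is the `setLevel(logging.WARNING)` call on the relevant logger; the rest exists to make the effect observable.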

Let us improve on the messaging.

@hoosengold
Author

Hi @tazarov, thank you for the fast reply!

> It just lets you know that the call did not create a new collection but returned an existing one.

It seems like the message is always displayed, not just when getting the collection but also when creating it. That means that the created bool variable is always set to False by the _sysdb.create_collection() method for some reason. I hope that this does not indicate a more complex problem.
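To make the semantics under discussion concrete, here is a toy model of a get-or-create call that reports a `created` flag. `InMemorySysDB` and `describe_get_or_create` are hypothetical stand-ins written for illustration, not Chroma's actual `_sysdb` code; the sketch only shows when the flag would be expected to be `True` and when an "already exists" message would be expected.

```python
from typing import Dict, Tuple


class InMemorySysDB:
    """Hypothetical stand-in for a system database (not Chroma's implementation)."""

    def __init__(self) -> None:
        self._collections: Dict[str, dict] = {}

    def create_collection(
        self, name: str, get_or_create: bool = False
    ) -> Tuple[dict, bool]:
        """Return (collection, created); created is False when it already existed."""
        existing = self._collections.get(name)
        if existing is not None:
            if not get_or_create:
                raise ValueError(f"Collection {name} already exists")
            return existing, False
        collection = {"name": name}
        self._collections[name] = collection
        return collection, True


def describe_get_or_create(sysdb: InMemorySysDB, name: str) -> str:
    """Return the message a caller might log for one get-or-create call."""
    _, created = sysdb.create_collection(name, get_or_create=True)
    if created:
        return f"Collection {name} created."
    return f"Collection {name} already exists, returning existing collection."


if __name__ == "__main__":
    db = InMemorySysDB()
    print(describe_get_or_create(db, "Test"))  # first call creates
    print(describe_get_or_create(db, "Test"))  # second call returns the existing one
```

If the flag came back `False` on the very first call for a name, that would point to the bookkeeping behind `create_collection` rather than to the log wording.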

However, thanks again for the quick response and fixing the info message!

@tazarov
Contributor

tazarov commented May 15, 2024

Thanks for this insight, @hoosengold. I will have a look in sysdb and fix that if necessary.

@tazarov
Contributor

tazarov commented May 15, 2024

@hoosengold,

I just tested with:

import chromadb

client = chromadb.HttpClient()

client.get_or_create_collection("Test")

Results:

INFO:     [15-05-2024 13:57:01] Application startup complete.
INFO:     [15-05-2024 13:57:01] Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO:     [15-05-2024 13:57:09] ::1:50449 - "GET /api/v1/tenants/default_tenant HTTP/1.1" 200
INFO:     [15-05-2024 13:57:09] ::1:50449 - "GET /api/v1/databases/default_database?tenant=default_tenant HTTP/1.1" 200
INFO:     [15-05-2024 13:57:09] ::1:50450 - "POST /api/v1/collections?tenant=default_tenant&database=default_database HTTP/1.1" 200
INFO:     [15-05-2024 13:57:17] ::1:50453 - "GET /api/v1/tenants/default_tenant HTTP/1.1" 200
INFO:     [15-05-2024 13:57:17] ::1:50453 - "GET /api/v1/databases/default_database?tenant=default_tenant HTTP/1.1" 200
DEBUG:    [15-05-2024 13:57:17] Collection Test already exists, returning existing collection.
INFO:     [15-05-2024 13:57:17] ::1:50454 - "POST /api/v1/collections?tenant=default_tenant&database=default_database HTTP/1.1" 200

As expected, the message was logged only when get_or_create_collection() was called on an existing collection. The initial creation of the collection (the POST /api/v1/collections request at 13:57:09) did not trigger the message.

@hoosengold
Author

@tazarov

Thank you for the provided logs. I reproduced your test with a PersistentClient (with a basic FastAPI implementation) and it works as expected and described by you. Here's a snippet of the code:

import os

import chromadb

client = chromadb.PersistentClient(path=os.path.join(os.getcwd(), "chroma"))
client.get_or_create_collection("Test")

Logs:

INFO:     Application startup complete.
INFO: Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
INFO:     127.0.0.1:53838 - "POST /api/v1/test HTTP/1.1" 200 OK
INFO: Collection Test is not created.
INFO:     127.0.0.1:33750 - "POST /api/v1/test HTTP/1.1" 200 OK

When I test it with the code from the initial issue, I get the info message regardless of whether the collection exists, so I guess the problem is somewhere else. I will update this comment (or write a new one) if I find where the problem is.
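One way to narrow this down would be to record whether the collection exists immediately before the call and compare that with whether the message appears. The helper below is a minimal sketch under two assumptions: `list_collections()` may return either objects with a `name` attribute or plain name strings depending on the Chroma version, and `FakeClient` is a hypothetical stand-in so the helper runs without a database.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def collection_exists(client: Any, name: str) -> bool:
    """Return True if the client lists a collection with the given name."""
    for entry in client.list_collections():
        # Accept both collection-like objects (with .name) and plain strings.
        if getattr(entry, "name", entry) == name:
            return True
    return False


class FakeClient:
    """Hypothetical stand-in for a Chroma client, for demonstration only."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names: List[str] = list(names)

    def list_collections(self) -> List[SimpleNamespace]:
        return [SimpleNamespace(name=n) for n in self._names]


if __name__ == "__main__":
    print(collection_exists(FakeClient(["Test"]), "Test"))
    print(collection_exists(FakeClient([]), "Test"))
```

With a real client, the result of `collection_exists(client, "test_collection")` taken right before `get_or_create_collection` tells you whether the call should have created anything.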

Thank you for the fast replies and for your time!

@sanketkedia
Contributor

Closing this for bug hygiene. Feel free to open a new issue if you find another problem.

3 participants