What are dimensions? #98

Open
RomanSteinberg opened this issue Nov 22, 2024 · 4 comments

RomanSteinberg commented Nov 22, 2024

Describe the bug
I have the standard documents table in Supabase:

create table documents (
  id bigserial primary key,
  content text, -- corresponds to Document.pageContent
  metadata jsonb, -- corresponds to Document.metadata
  embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed
);

My vecs code:

import vecs

# connection_uri: the Postgres connection string for the Supabase project
client = vecs.create_client(connection_uri)
collection = client.get_or_create_collection(name='documents', dimension=1536)

This code raises a MismatchedDimension error in the Collection._create_if_not_exists method. The error says that 3 is not equal to 1536.
That method executes this query:

select
    relname as table_name,
    atttypmod as embedding_dim
from
    pg_class pc
    join pg_attribute pa
        on pc.oid = pa.attrelid
where
    pc.relnamespace = 'vecs'::regnamespace
    and pc.relkind = 'r'
    and pa.attname = 'vec'
    and not pc.relname ^@ '_'
    and pc.relname = 'documents'

and it returns 3. But my table contains no 3 anywhere. To get the size of a stored vector, one would use:

SELECT vector_dims(embedding) AS embedding_dim
FROM documents
LIMIT 1;

Maybe I do not understand what a dimension means for a vecs collection. Could you please explain?

To Reproduce
Steps to reproduce the behavior:

  1. Install vecs 0.4.4
  2. Create the table shown above in Supabase.
  3. Run client.get_or_create_collection (code above).

Expected behavior
I expect the collection to be obtained without error.

Versions:

  • PostgreSQL: 15.6 [supabase.com]
  • vecs version: 0.4.4

olirice (Collaborator) commented Dec 2, 2024

You are correct that the dimension should be 1536 in this case, but vecs creates the table for you; you should not be writing any SQL directly.

I'd suggest trying a different collection name that doesn't already exist and seeing if that resolves the issue.

RomanSteinberg (Author) commented Dec 3, 2024

> you are correct that the dimension should be 1536 in this case but vecs creates the table for you, you should not be writing any SQL directly.
>
> I'd suggest trying a different collection name that doesn't already exist and seeing if that resolves the issue

@olirice please look at my code sample. I chose the collection documents, but the library ignores my existing documents table.

I investigated a little more: the library always queries the vecs schema. Reference.

olirice (Collaborator) commented Dec 3, 2024

> Library always query schema vecs

Correct: the vecs library creates and manages all of its tables in the vecs schema of the Postgres instance. That is intended behavior.
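That rule is visible in the WHERE clause of the catalog query quoted in the issue. As a minimal pure-Python sketch (the CatalogRow type, the embedding_dim_for helper, and the sample rows are hypothetical illustrations, not vecs APIs), the same filter shows why a hand-written public.documents table with an embedding column is never matched, while a leftover vecs.documents table declared as vector(3) yields 3. It assumes, as pgvector does, that atttypmod for a vector(n) column holds the declared dimension n.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CatalogRow:
    """Hypothetical stand-in for one joined pg_class/pg_attribute row."""
    schema: str     # namespace of the relation (pc.relnamespace)
    relname: str    # table name (pc.relname)
    relkind: str    # 'r' means an ordinary table
    attname: str    # column name (pa.attname)
    atttypmod: int  # for a vector(n) column, the declared dimension n


def embedding_dim_for(rows: Iterable[CatalogRow], name: str) -> Optional[int]:
    """Mirror the WHERE clause of the catalog query in plain Python."""
    for row in rows:
        if (
            row.schema == "vecs"                 # only the vecs schema is searched
            and row.relkind == "r"               # ordinary tables only
            and row.attname == "vec"             # the vector column must be named "vec"
            and not row.relname.startswith("_")  # same as: not relname ^@ '_'
            and row.relname == name
        ):
            return row.atttypmod
    return None  # no matching table in the vecs schema


catalog = [
    # The hand-written table: wrong schema and wrong column name, so it never matches.
    CatalogRow("public", "documents", "r", "embedding", 1536),
    # A table left behind by an earlier call with dimension=3.
    CatalogRow("vecs", "documents", "r", "vec", 3),
]

print(embedding_dim_for(catalog, "documents"))      # 3
print(embedding_dim_for(catalog[:1], "documents"))  # None
```

The public.documents row never influences the result, which is why its vector(1536) declaration cannot satisfy the dimension check.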

> look into my code sample please

The issue is that you manually ran this SQL:

create table documents (
  id bigserial primary key,
  content text, -- corresponds to Document.pageContent
  metadata jsonb, -- corresponds to Document.metadata
  embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed
);

You can skip that step entirely; vecs creates the table it needs automatically.

If this query

select
    relname as table_name,
    atttypmod as embedding_dim
from
    pg_class pc
    join pg_attribute pa
        on pc.oid = pa.attrelid
where
    pc.relnamespace = 'vecs'::regnamespace
    and pc.relkind = 'r'
    and pa.attname = 'vec'
    and not pc.relname ^@ '_'
    and pc.relname = 'documents'

returns 3, then your vecs.documents table was inadvertently created with

embedding vector(3)

I suspect what happened is that you:

  • manually created the documents table
  • ran collection = client.get_or_create_collection(name='documents', dimension=3) once
    • which created an entirely new table named documents in the vecs schema (vecs.documents)
  • updated the Python call to collection = client.get_or_create_collection(name='documents', dimension=1536)
    • and that call now fails because the existing vecs.documents table has vector(3) as its dimension

But it's hard to know for sure without access to the instance to debug.
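The suspected sequence of events can be modeled in memory. This is a minimal sketch, not vecs code: FakeVecsSchema and MismatchedDimension below are hypothetical stand-ins, and the error message format is invented. It assumes only what the thread describes: the first call creates the table with the requested dimension, and a later call with a different dimension fails against the table that already exists.

```python
class MismatchedDimension(Exception):
    """Hypothetical stand-in for the error vecs raises on a dimension clash."""


class FakeVecsSchema:
    """In-memory model of the tables in the vecs schema (illustration only)."""

    def __init__(self) -> None:
        self.tables: dict[str, int] = {}  # collection name -> declared dimension

    def get_or_create_collection(self, name: str, dimension: int) -> int:
        existing = self.tables.get(name)
        if existing is None:
            # First call: "create" vecs.<name> with a vector(dimension) column.
            self.tables[name] = dimension
            return dimension
        if existing != dimension:
            # Later call: the existing table's dimension wins, so the call fails.
            raise MismatchedDimension(f"{existing} != {dimension}")
        return existing


schema = FakeVecsSchema()
schema.get_or_create_collection("documents", dimension=3)  # creates vecs.documents
try:
    schema.get_or_create_collection("documents", dimension=1536)
except MismatchedDimension as exc:
    print("fails:", exc)  # fails: 3 != 1536

schema.tables.clear()  # plays the role of dropping the vecs schema
print(schema.get_or_create_collection("documents", dimension=1536))  # 1536
```

Clearing the model's tables corresponds to the drop-and-recreate fix; once the stale vector(3) table is gone, the 1536-dimension call succeeds.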

To resolve that, open your Supabase project and run the following (note: this deletes any existing vecs data):

drop schema vecs cascade;

Once you've done that, you can start over with

client = vecs.create_client(connection_uri)
collection = client.get_or_create_collection(name='documents', dimension=1536)

RomanSteinberg (Author) commented Dec 3, 2024

@olirice I see that this is the expected behavior. OK, but my database is already populated. It was created following the Supabase guides, with the documents table shown above in the public schema. I also expect to add relations to other tables in the public schema in the future. So it seems the vecs library is not the right choice for me. Am I right?

PS: A couple of days ago I switched to the supabase library for all the queries I need, so I already have a workaround.
