RFC: Proposal to Update vecs Python Client to Include Latest pgvector Functionalities #93

Open
Muhtasham opened this issue Oct 3, 2024 · 5 comments

@Muhtasham

Summary

This RFC proposes adding support for the latest pgvector features to the vecs Python client: new vector types (halfvec, sparsevec), enhanced indexing capabilities, and additional vector functions (binary_quantize, hamming_distance, etc.).

Rationale

Recent pgvector releases added new vector types, improved indexing, and new functions, none of which are currently exposed by the vecs client. Integrating them would bring vecs to feature parity with pgvector, enabling more compact storage, additional similarity metrics, and extended vector operations, and so supporting a broader range of use cases.

Design

Proposed Additions

  1. Vector Types:

    • halfvec: Half-precision (16-bit float) vectors for reduced storage and faster operations.
    • sparsevec: Sparse vectors that store only non-zero values to optimize memory usage.
  2. Indexing Enhancements:

    • bit Type Indexing: Add support for indexing vectors stored as bit type.
    • L1 Distance with HNSW: Add support for using L1 distance with HNSW indexing for similarity searches.
  3. New Functions:

    • binary_quantize: Converts a vector into a bit vector, setting each bit to 1 where the element is positive and 0 otherwise.
    • hamming_distance: Calculates the Hamming distance between two bit vectors.
    • jaccard_distance: Computes the Jaccard distance between two bit vectors.
    • l2_normalize: Normalizes a vector to unit L2 norm.
    • subvector: Extracts a subvector from the main vector.
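
To make the intended semantics of the functions listed above concrete, here is a minimal pure-Python sketch. It is only a reference for discussion, not the pgvector implementation or a proposed vecs API. The 1-based `start` in `subvector`, the handling of the zero vector in `l2_normalize`, and the empty-union case in `jaccard_distance` are assumptions of this sketch.

```python
import math


def _same_length(a, b):
    # Both operands must have the same number of dimensions.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def binary_quantize(vec):
    """Map each element to a bit: 1 if strictly positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(a, b):
    """Count the positions where two bit vectors differ."""
    _same_length(a, b)
    return sum(1 for x, y in zip(a, b) if x != y)


def jaccard_distance(a, b):
    """1 - |a AND b| / |a OR b| for two bit vectors."""
    _same_length(a, b)
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        # Assumption: two all-zero bit vectors are treated as maximally distant.
        return 1.0
    return 1.0 - both / either


def l1_distance(a, b):
    """Sum of absolute element-wise differences (taxicab distance)."""
    _same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        # Assumption: the zero vector is returned unchanged.
        return list(vec)
    return [x / norm for x in vec]


def subvector(vec, start, count):
    """Return `count` elements beginning at `start` (assumed 1-based, as in SQL)."""
    if start < 1 or count < 1:
        raise ValueError("start and count must be positive")
    return list(vec[start - 1:start - 1 + count])
```

For example, `binary_quantize([1.0, -2.0, 0.0, 3.5])` yields `[1, 0, 0, 1]`, and `subvector([1, 2, 3, 4, 5], 1, 3)` yields `[1, 2, 3]`.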

Examples

For instance, creating a halfvec vector (proposed API, not yet available in vecs):

from vecs import halfvec
vec = halfvec([1.0, 2.0, 3.0])
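
The storage trade-offs behind halfvec and sparsevec can be illustrated with the standard library alone. The sketch below is not vecs or pgvector code: it only compares the per-element payload of 32-bit and 16-bit floats (ignoring any on-disk headers) and builds a sparse text literal in the `{index:value,...}/dimensions` form with 1-based indices, which is how I understand pgvector's sparsevec text format. All function names are hypothetical.

```python
import struct


def pack_float32(values):
    """Pack values as little-endian 32-bit floats (4 bytes per element)."""
    return struct.pack(f"<{len(values)}f", *values)


def pack_float16(values):
    """Pack values as little-endian 16-bit floats (2 bytes per element)."""
    return struct.pack(f"<{len(values)}e", *values)


def unpack_float16(buf):
    """Decode a buffer of little-endian 16-bit floats back to Python floats."""
    count = len(buf) // 2
    return list(struct.unpack(f"<{count}e", buf))


def to_sparsevec_text(values):
    """Render only the non-zero elements as '{index:value,...}/dimensions'."""
    nonzero = ",".join(
        f"{i}:{v}" for i, v in enumerate(values, start=1) if v != 0
    )
    return "{" + nonzero + "}/" + str(len(values))


if __name__ == "__main__":
    vec = [1.0, 2.0, 3.0]
    print(len(pack_float32(vec)), len(pack_float16(vec)))  # payload sizes in bytes
    print(unpack_float16(pack_float16([0.1])))  # half precision loses some accuracy
    print(to_sparsevec_text([0.0, 1.5, 0.0, 0.0, 2.0]))
```

The half-precision payload is half the size of the single-precision one, at the cost of rounding values such as 0.1 more coarsely; whether that loss is acceptable depends on the embedding model and the recall target.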
@Muhtasham Muhtasham changed the title Proposal to Update vecs Python Client to Include Latest pgvector Functionalities RFC: Proposal to Update vecs Python Client to Include Latest pgvector Functionalities Oct 3, 2024
@olirice
Collaborator

olirice commented Oct 9, 2024

Opened a PR to support l1 distance

The refactor for halfvec support is more significant but we're interested in supporting that too

at this point I don't think we'll add support for sparsevec or bit. If the use cases for those vector types take off we'll revisit that decision

@Muhtasham
Author

Muhtasham commented Oct 9, 2024 via email

@olirice
Collaborator

olirice commented Oct 9, 2024

MaxSim

could you provide a reference? I don't see any references to MaxSim in the pgvector docs

@Muhtasham
Author

@olirice here is a reference in pgvector

Having support like Qdrant or Vespa would be nice. Happy to help with the implementation if you guide me.

@olirice
Collaborator

olirice commented Oct 21, 2024

multi-vector queries would be a good stand-alone feature request if you'd like to open a new issue for it

this is the first I've seen of it so would be happy to leave it open for a few weeks and see what feedback looks like
