draft: competition 2 BaseANN changes #111

Draft: wants to merge 6 commits into main

sourcesync
Collaborator

@sourcesync sourcesync commented May 11, 2023

This is a draft PR to get ideas/feedback around possible changes to BaseANN for the second competition.

Assumptions:

  • each object in a dataset could have a dense vector or a sparse vector (or both)
  • each object could also have a set of scalars associated with it
  • new delete() and insert() virtual methods are needed for the competition's streaming task
  • range_query() won't change for the second competition
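The data model assumed above could be captured with a record type like the one below. This is a minimal sketch only: HybridRecord, its field names, the kind property, and the {dimension: value} encoding of sparse vectors are hypothetical choices for illustration, not anything defined in this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

Scalar = Union[int, float, str]


@dataclass
class HybridRecord:
    """Hypothetical record for one dataset object (not part of the PR)."""

    dense: Optional[List[float]] = None        # dense vector, if the object has one
    sparse: Optional[Dict[int, float]] = None  # sparse vector as {dimension: value}
    scalars: Dict[str, Scalar] = field(default_factory=dict)  # per-object metadata

    def __post_init__(self) -> None:
        # Per the assumptions, an object carries a dense vector, a sparse vector, or both.
        if self.dense is None and self.sparse is None:
            raise ValueError("a record needs a dense vector, a sparse vector, or both")

    @property
    def kind(self) -> str:
        # Report which vector types this object carries.
        if self.dense is not None and self.sparse is not None:
            return "hybrid"
        return "dense" if self.dense is not None else "sparse"


rec = HybridRecord(dense=[0.1, 0.2], sparse={7: 0.5}, scalars={"year": 2020})
print(rec.kind)  # hybrid
```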

Possible Approaches:

  • augment query() with two additional parameters, meta_filter and sparse_vector, which default to an empty set and None, respectively. This assumes the X parameter of query() is still a dense vector.
  • create a new virtual method called hybrid_query() which exposes three parameters: dense_vector, sparse_vector and meta_filter.
  • add a delete() call with an I parameter containing a list of dataset indices to delete
  • add insert() with an X parameter containing a batch of vectors to add. Each element of X could be a hybrid of a dense vector, a sparse vector, and scalars.
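A minimal sketch of how the proposed methods might sit together on BaseANN, with a toy brute-force subclass to exercise them. All signatures, the dict-per-object format for insert(), the return value of query(), and the reading of meta_filter as a set of required tags are assumptions for illustration; this is not the repository's actual BaseANN.

```python
from typing import AbstractSet, Dict, List, Optional, Sequence


class BaseANN:
    """Hypothetical sketch of the proposed additions (not the repository's BaseANN)."""

    def query(self, X: Sequence[Sequence[float]], k: int,
              meta_filter: AbstractSet[str] = frozenset(),
              sparse_vector: Optional[Dict[int, float]] = None) -> List[List[int]]:
        # Approach 1: X is still a batch of dense queries; the new parameters
        # default to an empty filter and no sparse vector.
        raise NotImplementedError

    def hybrid_query(self, dense_vector: Optional[Sequence[float]],
                     sparse_vector: Optional[Dict[int, float]],
                     meta_filter: AbstractSet[str], k: int) -> List[int]:
        # Approach 2: one method that receives all three query components.
        raise NotImplementedError

    def insert(self, X: Sequence[dict]) -> None:
        # Add a batch of objects; each element may mix dense, sparse and scalar data.
        raise NotImplementedError

    def delete(self, I: Sequence[int]) -> None:
        # Remove the objects whose dataset indices are listed in I.
        raise NotImplementedError


class BruteForceDense(BaseANN):
    """Toy exact search over dense vectors with a tag filter (hypothetical)."""

    def __init__(self) -> None:
        self.dense: Dict[int, List[float]] = {}  # dataset index -> dense vector
        self.tags: Dict[int, frozenset] = {}     # dataset index -> metadata tags
        self.next_index = 0

    def insert(self, X):
        for obj in X:
            self.dense[self.next_index] = list(obj["dense"])
            self.tags[self.next_index] = frozenset(obj.get("tags", ()))
            self.next_index += 1

    def delete(self, I):
        for i in I:
            self.dense.pop(i, None)
            self.tags.pop(i, None)

    def query(self, X, k, meta_filter=frozenset(), sparse_vector=None):
        if sparse_vector is not None:
            raise NotImplementedError("this toy index only handles dense queries")
        results = []
        for q in X:
            # Keep only objects whose tags include every tag in meta_filter.
            candidates = [i for i in self.dense if meta_filter <= self.tags[i]]
            # Rank by squared Euclidean distance and keep the k closest.
            candidates.sort(key=lambda i: sum((a - b) ** 2 for a, b in zip(q, self.dense[i])))
            results.append(candidates[:k])
        return results


index = BruteForceDense()
index.insert([{"dense": [0.0, 0.0], "tags": {"red"}},
              {"dense": [1.0, 0.0], "tags": {"blue"}},
              {"dense": [0.0, 2.0], "tags": {"red"}}])
index.delete([0])
print(index.query([[0.0, 0.0]], k=2))                      # [[1, 2]]
print(index.query([[0.0, 0.0]], k=2, meta_filter={"red"}))  # [[2]]
```

Keeping the new parameters optional (approach 1) means existing dense-only algorithms keep working unchanged, whereas a separate hybrid_query() (approach 2) makes hybrid support an explicit opt-in.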

(@harsha-simhadri asked me to kick-start this, but if anything has already been done here, I'm happy to close this PR.)

@sourcesync sourcesync changed the title competition 2 BaseANN changes draft: competition 2 BaseANN changes May 11, 2023
@harsha-simhadri
Owner

We need to add insert and delete too

@harsha-simhadri
Owner

To be more specific, let's use batch_insert() and batch_delete().
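A hypothetical sketch of the batched variants. The ids argument to batch_insert(), the StreamingIndexSketch class, and the run_stream() driver are illustrative assumptions, not agreed signatures.

```python
from typing import Dict, Iterable, List, Sequence


class StreamingIndexSketch:
    """Hypothetical stand-in showing batch_insert()/batch_delete() signatures."""

    def __init__(self) -> None:
        self.live: Dict[int, List[float]] = {}  # dataset index -> vector

    def batch_insert(self, X: Sequence[Sequence[float]], ids: Sequence[int]) -> None:
        # One call adds a whole batch; ids are the dataset indices of the rows of X.
        if len(X) != len(ids):
            raise ValueError("X and ids must have the same length")
        for vec, i in zip(X, ids):
            self.live[i] = list(vec)

    def batch_delete(self, ids: Iterable[int]) -> None:
        # One call removes a whole batch of dataset indices; unknown ids are ignored.
        for i in ids:
            self.live.pop(i, None)


def run_stream(index, data, batch_size):
    """Insert data in batches, then delete the first batch again (illustrative only)."""
    for start in range(0, len(data), batch_size):
        ids = list(range(start, min(start + batch_size, len(data))))
        index.batch_insert([data[i] for i in ids], ids)
    index.batch_delete(range(0, min(batch_size, len(data))))


idx = StreamingIndexSketch()
run_stream(idx, [[float(i)] for i in range(5)], batch_size=2)
print(sorted(idx.live))  # [2, 3, 4]
```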

@sourcesync
Collaborator Author

@mdouze @ingberam I went ahead and prototyped the new/updated candidate virtual methods in BaseANN. See the Files Changed tab for the proposals.

@maumueller
Collaborator

@sourcesync Thanks so much for going ahead with this! As I understand it, there is no hybrid query, only a dense query, add and remove (possibly with metadata), and a sparse query. Given these different settings, I think it makes more sense to have BaseDenseANN and BaseSparseANN as subclasses of BaseANN. Also, there will probably be different runners for the different scenarios, since I imagine participants will target single scenarios (and may want to use their own runner rather than provide a wrapper).

My plan was to merge the baselines next week and then refactor the code around them. It would be great if we could join forces @sourcesync
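One possible shape for the BaseDenseANN/BaseSparseANN split, sketched with Python's abc module. The class bodies, the fit(data) signature, ToySparse, and pick_runner() are hypothetical; they only illustrate how separate base classes could let each runner target one scenario.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class BaseANN(ABC):
    """Hypothetical common base holding what every track shares."""

    @abstractmethod
    def fit(self, data) -> None:
        """Build the index from the given data (signature assumed for this sketch)."""


class BaseDenseANN(BaseANN):
    """Dense track: k-NN queries on dense vectors, plus optional streaming updates."""

    @abstractmethod
    def query(self, X: Sequence[Sequence[float]], k: int) -> List[List[int]]:
        """Return the ids of the k nearest neighbours for each dense query in X."""

    def insert(self, X, ids) -> None:
        # Streaming support is optional; algorithms that support it override this.
        raise NotImplementedError

    def delete(self, ids) -> None:
        raise NotImplementedError


class BaseSparseANN(BaseANN):
    """Sparse track: queries are sparse vectors encoded as {dimension: value}."""

    @abstractmethod
    def query(self, X: Sequence[Dict[int, float]], k: int) -> List[List[int]]:
        """Return the ids of the top-k inner-product matches for each sparse query."""


class ToySparse(BaseSparseANN):
    """Exact inner-product search over sparse vectors, for illustration only."""

    def fit(self, data: Sequence[Dict[int, float]]) -> None:
        self.data = list(data)

    def query(self, X, k):
        results = []
        for q in X:
            # Only dimensions present in both vectors contribute to the inner product.
            scores = [sum(v * q.get(d, 0.0) for d, v in doc.items()) for doc in self.data]
            ranked = sorted(range(len(self.data)), key=lambda i: -scores[i])
            results.append(ranked[:k])
        return results


def pick_runner(algo: BaseANN) -> str:
    # Each scenario gets its own runner; dispatch on the base class.
    if isinstance(algo, BaseSparseANN):
        return "sparse"
    if isinstance(algo, BaseDenseANN):
        return "dense"
    raise TypeError(f"unsupported algorithm type: {type(algo).__name__}")
```

Because query() is abstract in each subclass, an algorithm that targets only the sparse scenario never has to stub out dense or streaming methods.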

@sourcesync
Collaborator Author

Awesome, @Martin Aumüller. Yeah, I think using different base classes is a great choice, especially if "hybrid" is not part of this competition.

( Note that if you already have a branch / PR going, I'm quite happy to close this draft PR...)
