Dynamic "Groups" / "Interests" #154

Open
brianfeister opened this issue Sep 27, 2024 · 0 comments

brianfeister commented Sep 27, 2024

The idea is to explore using the /embed endpoint exposed by Marqo: we pass documents in a request and get a defined embedding output back.

(Related: #146)

The discussion below is about "hard-coded" categories (meaning we coerce all events into them). It would be interesting to instead pass in an array of documents (range-queried by geo bounds) and get back "dynamic tags", where we ask the API to sort the documents into exactly 10 categories.

This would be non-deterministic, and the categories would adapt to the available content, so you're not "spraying and praying" by clicking "Live music" in the UI. If there is no live music, it will not show up as a category. If a category is present (optionally coerced into our defined categories), it will show in the UI.

Again, the conversation below was about tagging / post-processing events after they have been inserted.

From Robertson Taylor, Sales Engineer at Marqo (conversation here: https://meetnear.slack.com/archives/C07KQCLMQG7/p1726530661764979?thread_ts=1726527232.633069&cid=C07KQCLMQG7):

There's a hacky way of doing this that might make more sense than spinning up a second index to do the tagging.
Marqo exposes an endpoint /embed that lets you embed an array of documents. You could do the following.

  1. Every minute, run a search with a filter to find uncategorized items. The starting point is just a random []float32 of the embedding dimension. Importantly, expose_facets should be true.
  2. The service should cache the embeddings of each category. E.g. /embed [ "dancing", "sports", "singing", ... ]
  3. For each item returned by the search, find the closest embedding (compared with cosine similarity) in the embedded categories array
  4. After every calculation is complete, call update_documents with the result batch.
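
Step 3 above can be sketched in Go roughly as follows. This is only an illustration under assumptions: the function names `cosineSimilarity` and `nearestCategory` are made up for this sketch, and the 3-dimensional vectors stand in for real /embed output, whose dimension depends on the model. Fetching the items and calling update_documents are left out.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ, the vectors are empty,
// or either vector has zero norm.
func cosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// nearestCategory returns the name and score of the category whose
// cached embedding is most similar to item. names[i] labels embeddings[i].
// Hypothetical helper, not part of any Marqo client.
func nearestCategory(item []float32, names []string, embeddings [][]float32) (string, float64) {
	best, bestScore := "", math.Inf(-1)
	for i, emb := range embeddings {
		if i >= len(names) {
			break
		}
		if s := cosineSimilarity(item, emb); s > bestScore {
			best, bestScore = names[i], s
		}
	}
	return best, bestScore
}

func main() {
	// Toy vectors standing in for the cached category embeddings.
	names := []string{"dancing", "sports", "singing"}
	embeddings := [][]float32{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}

	// Toy vector standing in for the embedding of one uncategorized item.
	item := []float32{0.1, 0.9, 0.2}

	name, score := nearestCategory(item, names, embeddings)
	fmt.Printf("category=%s score=%.3f\n", name, score)
}
```

With these toy vectors the item lands in "sports". A real service would probably also want a minimum score below which an item stays uncategorized, but that threshold is a design choice the conversation does not cover.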

Notes:

  1. We choose a random starting point to reduce odds of working on the same item if two batches wind up running at the same time. There's no issue with the updates, just inefficiency.
  2. This could probably run in a Lambda function.
  3. It seems like Go has some reasonable packages for working with vector
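
The random starting point from note 1 could be generated along these lines. This is a minimal sketch: `randomQueryVector` is a hypothetical helper, and the dimension of 384 is an assumed placeholder that has to match the index's actual embedding dimension. How the vector is passed to the search call is not shown.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomQueryVector returns dim pseudo-random components in [-1, 1).
// Each run that uses a different seed starts its search from a different
// point, which lowers the odds that two overlapping batches pick up the
// same items.
func randomQueryVector(seed int64, dim int) []float32 {
	rng := rand.New(rand.NewSource(seed))
	v := make([]float32, dim)
	for i := range v {
		v[i] = rng.Float32()*2 - 1
	}
	return v
}

func main() {
	// Assumed dimension; replace with the dimension of the index's model.
	const embeddingDim = 384

	v := randomQueryVector(time.Now().UnixNano(), embeddingDim)
	fmt.Println(len(v), v[:4])
}
```

Seeding from the clock keeps concurrent batches from sharing a starting point in practice; as the note says, a collision only costs some duplicated work.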