Dynamic "Groups" / "Interests" #154

brianfeister · 2024-09-27T16:05:01Z

The idea here is to explore using the /embed endpoint exposed by Marqo to pass documents in a request and get their embeddings back.

(Related: #146)

The discussion below is about "hard-coded" categories (meaning we coerce all events into them), but it would be interesting if we instead passed in an array of documents (range-queried by geo bounds) and got back "dynamic tags" by asking the API to sort the documents into exactly 10 categories.

This would be non-deterministic, and the categories would adapt to the available content, so you're not "spraying and praying" by clicking "Live music" in the UI. If there is no live music, it will not show up as a category. If a category is present (optionally coerced into our defined categories), it will show in the UI.
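As a rough illustration of the "only show what exists" behaviour, here is a minimal Go sketch. It assumes each event in the geo-bounded result set has already been given a category label by some earlier step; the `CategoryCount` type, the `visibleCategories` function, and the cap parameter are hypothetical names for this example, not part of Marqo or the existing codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// CategoryCount is a hypothetical pairing of a category label with the
// number of events assigned to it.
type CategoryCount struct {
	Name  string
	Count int
}

// visibleCategories tallies the labels assigned to events and returns only
// the categories that have at least one event, ordered by count (descending)
// and then by name. At most limit categories are returned.
func visibleCategories(labels []string, limit int) []CategoryCount {
	counts := make(map[string]int)
	for _, label := range labels {
		if label == "" {
			continue // event has no category yet
		}
		counts[label]++
	}

	out := make([]CategoryCount, 0, len(counts))
	for name, n := range counts {
		out = append(out, CategoryCount{Name: name, Count: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Name < out[j].Name
	})

	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	// No event is labelled "live music", so it never appears in the result.
	labels := []string{"sports", "dancing", "sports", "singing", "dancing", "sports"}
	for _, c := range visibleCategories(labels, 10) {
		fmt.Printf("%s: %d\n", c.Name, c.Count)
	}
}
```

Note that this caps the list at 10 rather than forcing exactly 10 categories; how the labels themselves get produced is the open question in this issue.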

Again, the conversation below was about tagging / post-processing events after they are inserted.

From Robertson Taylor, Sales Engineer at Marqo (conversation here: https://meetnear.slack.com/archives/C07KQCLMQG7/p1726530661764979?thread_ts=1726527232.633069&cid=C07KQCLMQG7):

There's a hacky way of doing this that might make more sense than spinning up a second index to do the tagging.
Marqo exposes an endpoint /embed that lets you embed an array of documents. You could do the following.

1. Every minute, run a search with a filter that finds uncategorized items. The starting point for the query is just a random []float32 vector of the embedding dimension. Importantly, expose_facets should be true so the results include each item's embedding.

2. The service should cache the embedding of each category, e.g. /embed [ "dancing", "sports", "singing", ... ]

3. For each item returned by the search, find the closest embedding (compared with cosine similarity) in the embedded categories array.

4. After every calculation is complete, call update_documents with the result batch.
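The matching step above can be sketched in Go as follows. This is a minimal sketch, not the implementation Robertson described: it assumes the category embeddings have already been fetched from /embed and cached, and that each search result carries its own embedding. The `Category` type and the `cosineSimilarity` and `closestCategory` functions are hypothetical names, and the 3-dimensional vectors are toy stand-ins for real embeddings.

```go
package main

import (
	"fmt"
	"math"
)

// Category pairs a label with its cached embedding (hypothetical type).
type Category struct {
	Name      string
	Embedding []float32
}

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 if the lengths differ or either vector has zero magnitude.
func cosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// closestCategory returns the name of the category whose embedding has the
// highest cosine similarity to item, along with that similarity.
// With no categories it returns "" and negative infinity.
func closestCategory(item []float32, categories []Category) (string, float64) {
	bestName, bestScore := "", math.Inf(-1)
	for _, c := range categories {
		if s := cosineSimilarity(item, c.Embedding); s > bestScore {
			bestName, bestScore = c.Name, s
		}
	}
	return bestName, bestScore
}

func main() {
	// Toy vectors; real ones would come from the cached /embed response.
	categories := []Category{
		{Name: "dancing", Embedding: []float32{1, 0, 0}},
		{Name: "sports", Embedding: []float32{0, 1, 0}},
		{Name: "singing", Embedding: []float32{0, 0, 1}},
	}
	name, score := closestCategory([]float32{0.1, 0.9, 0.2}, categories)
	fmt.Printf("closest category: %s (similarity %.3f)\n", name, score)
}
```

Returning the similarity alongside the name leaves room for a threshold, so that an item which is not close to any category could be left uncategorized instead of being forced into the nearest one.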

Notes:

We choose a random starting point to reduce odds of working on the same item if two batches wind up running at the same time. There's no issue with the updates, just inefficiency.
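A random starting vector could be generated along these lines. This is a sketch under assumptions: the source only asks for a random []float32 of the embedding dimension, so the uniform range [-1, 1), the `randomQueryVector` name, and the dimension of 384 are all illustrative choices; the real dimension depends on the model configured for the index.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomQueryVector returns a vector of length dim whose components are drawn
// uniformly from [-1, 1) using a generator seeded with seed.
func randomQueryVector(dim int, seed int64) []float32 {
	rng := rand.New(rand.NewSource(seed))
	v := make([]float32, dim)
	for i := range v {
		v[i] = rng.Float32()*2 - 1
	}
	return v
}

func main() {
	const dim = 384 // assumed embedding dimension, for illustration only

	// Seeding from the clock makes it unlikely that two concurrent batches
	// start from the same point.
	v := randomQueryVector(dim, time.Now().UnixNano())
	fmt.Println(len(v), v[:3])
}
```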

This could probably run in a Lambda function.

It seems like Go has some reasonable packages for working with vectors.

brianfeister mentioned this issue Sep 27, 2024

Decide on categories and images strategy #146

Closed
