feature/onboarding #3881

sywhb · 2024-05-01T03:31:52Z

No description provided.

vercel · 2024-05-01T03:31:55Z

The latest updates on your projects.

Name	Status	Updated (UTC)
omnivore-demo	✅ Ready	May 7, 2024 7:11am
omnivore-prod	✅ Ready	May 7, 2024 7:11am

jacksonh · 2024-05-07T09:32:39Z

packages/api/src/entity/library_item.ts

- // embedding?: number[]
+ // typeorm does not support vector type, so we store it as a string
+ @Column('text')
+ embedding?: string


Do you think we should use a separate table for these? I know we talked about it with some other attributes.
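
For reference, a minimal sketch of what storing the embedding as text could look like on the application side. It assumes the text uses the bracketed form that pgvector accepts (for example `[0.1,0.2,0.3]`); the helper names `toVectorLiteral`, `fromVectorLiteral`, and `embeddingTransformer` are hypothetical and not part of this PR. The last object is only shaped like a TypeORM column transformer (`to`/`from`), so it could be passed as the `transformer` column option if that route is taken.

```typescript
// Hypothetical helper: serialize an embedding to a bracketed text literal.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error('embedding must contain only finite numbers')
  }
  return `[${embedding.join(',')}]`
}

// Hypothetical helper: parse the bracketed text literal back into numbers.
function fromVectorLiteral(text: string): number[] {
  const inner = text.trim().replace(/^\[/, '').replace(/\]$/, '')
  if (inner.trim() === '') {
    return []
  }
  return inner.split(',').map((part) => {
    const value = Number(part)
    if (part.trim() === '' || !Number.isFinite(value)) {
      throw new Error(`invalid vector component: "${part}"`)
    }
    return value
  })
}

// Assumption: an object with `to`/`from` functions, the shape TypeORM
// expects for a column transformer. Null and undefined pass through.
const embeddingTransformer = {
  to: (value?: number[] | null): string | null =>
    value == null ? null : toVectorLiteral(value),
  from: (value?: string | null): number[] | undefined =>
    value == null ? undefined : fromVectorLiteral(value),
}
```

With something like this, the entity property could stay typed as `number[]` while the column remains `text`, whichever table it ends up in.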

jacksonh · 2024-05-07T09:34:57Z

packages/api/src/entity/library_item.ts

@@ -204,4 +204,7 @@ export class LibraryItem {

 @Column('text')
 highlightAnnotations?: string[]
+
+ @Column('timestamptz')
+ sharedAt?: Date


Thinking out loud, but I also wonder if this could be is_discover_item. It's not as flexible for future use, but maybe more explicit. I'm not sure which is better; just a thought.

sywhb · 2024-05-07

I think this and the comment above raise a similar question: should we have a join table for things like discover_item and item_embedding?

There are several considerations:

search performance for inner joins:
Probably not so bad for one-to-one relations like discover_item and item_embedding. I think we can benchmark the query.

frequently updated columns:
I agree with your thought and it has been a big concern for us. It's worth moving these columns to a separate table.

optimization:
Instead of running the cosine similarity query on the fly, we could pre-generate the topics or some other features and store them in another table or index.

naming:
I think if we want to be more explicit, we'd be better off with a separate discover_item table that links to the libraryItem table, which also normalizes the schema and reduces the size of the table.

Having considered all these concerns, I think we should:

1. Test and benchmark the search query against join tables (probably only one-to-one relations) in a cloned DB

2. If the result of (1) is not bad, create two additional separate tables: discover_item and item_embedding

3. Create another job to generate topics when the embedding is updated and stored in the table
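
To make the "pre-generate the topics" idea concrete, here is a rough sketch of assigning topics to an item by cosine similarity, done once when the embedding changes rather than at query time. The `TopicEmbedding` shape and the `matchTopics` function are hypothetical; the 0.55 default is borrowed from the "increase cosine similarity threshold to 0.55" commit in this PR and may not be the right value for a job like this.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error('vectors must be non-empty and of equal length')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // Treat a zero vector as similar to nothing instead of dividing by zero.
  if (normA === 0 || normB === 0) {
    return 0
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical shape for a topic row that carries its own embedding.
interface TopicEmbedding {
  topic: string
  embedding: number[]
}

// Return the names of all topics whose similarity to the item embedding
// meets the threshold. The result could be stored by the update job.
function matchTopics(
  itemEmbedding: number[],
  topics: TopicEmbedding[],
  threshold = 0.55
): string[] {
  return topics
    .filter((t) => cosineSimilarity(itemEmbedding, t.embedding) >= threshold)
    .map((t) => t.topic)
}
```

A job triggered on embedding updates could call `matchTopics` and write the result to the topic table, so the search query only needs a plain join.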

@jacksonh

Podginator · 2024-05-08

Hi @sywhb, I also mentioned this on Discord, but a lot of what you've talked about here is already done as part of my Discover update.

The optimisation part is already done as part of the import in discover, see:
https://github.com/omnivore-app/omnivore/blob/main/packages/db/migrations/0168.do.add_discover_feeds.sql

The discover feeds already contain the topics and embeddings. There's also a separate table for discover items, which stores their embeddings and whether a user has saved a story; the latter is also used to calculate the popularity score in Discover.

The calculation of the embeddings is done via https://github.com/omnivore-app/omnivore/blob/main/packages/discover/src/index.ts. This deliberately slows down the embedding process so that you never exceed the RPS that the API supports.
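
As an illustration of that throttling idea (this is not the code from the linked file, just a minimal sketch), the loop below processes items one at a time and spaces the calls so that no more than `maxPerSecond` start within a second. `mapWithRateLimit` and its parameters are hypothetical names.

```typescript
// Resolve after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Call `fn` for each item sequentially, leaving at least
// 1000 / maxPerSecond milliseconds between the starts of two calls.
async function mapWithRateLimit<T, R>(
  items: T[],
  maxPerSecond: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  if (maxPerSecond <= 0) {
    throw new Error('maxPerSecond must be positive')
  }
  const minIntervalMs = 1000 / maxPerSecond
  const results: R[] = []
  let lastStart = -Infinity // the first call never waits
  for (const item of items) {
    const wait = lastStart + minIntervalMs - Date.now()
    if (wait > 0) {
      await sleep(wait)
    }
    lastStart = Date.now()
    results.push(await fn(item))
  }
  return results
}
```

A caller would pass the function that requests an embedding as `fn` and set `maxPerSecond` below the provider's limit; a sequential loop trades throughput for simplicity.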

sywhb · 2024-05-09

Thank you @Podginator! Taking a look now.

jacksonh · 2024-05-07T09:36:15Z

packages/api/src/resolvers/article/index.ts


 // We allow the backend to use the ID instead of a slug to fetch the article
 // query against id if slug is a uuid
 slug.match(/^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$/i)
- ? qb.andWhere('libraryItem.id = :id', { id: slug })
- : qb.andWhere('libraryItem.slug = :slug', { slug })
+ ? qb.where('libraryItem.id = :id', { id: slug })


What does this change do?

Oh I see, we are removing the uid check above. It's a little scary to fully rely on RLS for that.

sywhb · 2024-05-07

The motive is to use the slug index to improve query performance. I think RLS should be quite reliable, but alternatively we could update the index to cover both the user_id and slug columns.
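
A small sketch of that alternative, keeping an explicit user filter next to the slug or id lookup so the query does not depend on RLS alone. The UUID pattern is the one from the diff; `buildItemFilter`, `ItemFilter`, and the `libraryItem.userId` property name are assumptions made for illustration and may not match the actual entity.

```typescript
// Same UUID check as in the resolver diff.
const UUID_PATTERN = /^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$/i

// Hypothetical shape: a WHERE clause plus its named parameters.
interface ItemFilter {
  clause: string
  params: Record<string, string>
}

// Build the lookup condition: match by id when the input looks like a UUID,
// otherwise by slug. When a user id is given, add an explicit ownership
// condition as a second line of defense alongside RLS.
function buildItemFilter(slugOrId: string, userId?: string): ItemFilter {
  const byId = UUID_PATTERN.test(slugOrId)
  const clauses = [byId ? 'libraryItem.id = :id' : 'libraryItem.slug = :slug']
  const params: Record<string, string> = byId
    ? { id: slugOrId }
    : { slug: slugOrId }
  if (userId !== undefined) {
    // Assumption: the entity exposes the owner as `userId`.
    clauses.push('libraryItem.userId = :userId')
    params.userId = userId
  }
  return { clause: clauses.join(' AND '), params }
}
```

The result could be passed to the query builder as `qb.where(filter.clause, filter.params)`. With an index on (user_id, slug), the slug branch should still be able to use the index while keeping the ownership check, though that would need to be confirmed with a benchmark.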

vercel bot deployed to Preview – omnivore-demo May 1, 2024 03:36 View deployment

vercel bot deployed to Preview – omnivore-prod May 1, 2024 03:36 View deployment

jacksonh approved these changes May 2, 2024

sywhb force-pushed the feature/onboarding branch from e7a071e to 71dcfc2 Compare May 2, 2024 08:10

vercel bot deployed to Preview – omnivore-prod May 2, 2024 08:18 View deployment

vercel bot deployed to Preview – omnivore-demo May 2, 2024 08:18 View deployment

vercel bot deployed to Preview – omnivore-prod May 2, 2024 08:33 View deployment

vercel bot deployed to Preview – omnivore-demo May 2, 2024 08:34 View deployment

vercel bot deployed to Preview – omnivore-prod May 2, 2024 08:38 View deployment

vercel bot deployed to Preview – omnivore-demo May 2, 2024 08:38 View deployment

sywhb marked this pull request as ready for review May 2, 2024 08:53

sywhb requested a review from satindar as a code owner May 2, 2024 08:53

vercel bot deployed to Preview – omnivore-demo May 3, 2024 08:55 View deployment

vercel bot deployed to Preview – omnivore-prod May 3, 2024 08:55 View deployment

sywhb force-pushed the feature/onboarding branch from 0d4e0c7 to 6928f18 Compare May 5, 2024 01:34

vercel bot deployed to Preview – omnivore-demo May 5, 2024 01:39 View deployment

vercel bot deployed to Preview – omnivore-prod May 5, 2024 01:39 View deployment

vercel bot deployed to Preview – omnivore-prod May 5, 2024 05:56 View deployment

vercel bot deployed to Preview – omnivore-demo May 5, 2024 05:56 View deployment

vercel bot deployed to Preview – omnivore-demo May 5, 2024 07:00 View deployment

vercel bot deployed to Preview – omnivore-prod May 5, 2024 07:00 View deployment

sywhb force-pushed the feature/onboarding branch from ffe938d to d84ff22 Compare May 6, 2024 06:04

vercel bot deployed to Preview – omnivore-demo May 6, 2024 06:10 View deployment

vercel bot deployed to Preview – omnivore-prod May 6, 2024 06:10 View deployment

vercel bot deployed to Preview – omnivore-prod May 6, 2024 09:19 View deployment

vercel bot deployed to Preview – omnivore-demo May 6, 2024 09:20 View deployment

vercel bot deployed to Preview – omnivore-demo May 6, 2024 09:40 View deployment

vercel bot deployed to Preview – omnivore-prod May 6, 2024 09:40 View deployment

vercel bot deployed to Preview – omnivore-demo May 6, 2024 11:28 View deployment

vercel bot deployed to Preview – omnivore-prod May 6, 2024 11:29 View deployment

vercel bot deployed to Preview – omnivore-prod May 7, 2024 03:35 View deployment

vercel bot deployed to Preview – omnivore-demo May 7, 2024 03:35 View deployment

sywhb added 14 commits May 7, 2024 15:03

fix sql error in existing discover code

fd931de

add shared_at column to the library_item entity

ae279af

if the user is a discover user, we want to share the item to the disc…

29b790c

…over folder

create policy to allow select shared library item

90b4c61

update discover api

446447b

fix permission issue

f219ddc

update get article api to show discover item

019c707

update migration version

b7e10cb

update save discover item api

522a620

find discover items by topic

d589434

add topic table

ec5325c

add topic entity

238d625

add update embedding job

1ff3870

increase cosine similarity threshold to 0.55

7867b99

sywhb force-pushed the feature/onboarding branch from 994e66b to 7867b99 Compare May 7, 2024 07:03

vercel bot deployed to Preview – omnivore-demo May 7, 2024 07:11 View deployment

vercel bot deployed to Preview – omnivore-prod May 7, 2024 07:11 View deployment

jacksonh reviewed May 7, 2024

sywhb requested a review from jacksonh May 7, 2024 10:47

sywhb closed this May 21, 2024

sywhb commented May 1, 2024

vercel bot commented May 1, 2024 • edited

jacksonh May 7, 2024

jacksonh May 7, 2024 • edited

sywhb May 7, 2024 • edited

Podginator May 8, 2024 • edited

sywhb May 9, 2024

jacksonh May 7, 2024

jacksonh May 7, 2024

sywhb May 7, 2024
