
feat(webhook): Implement webhook functionality for bookmark events #852

Merged
merged 5 commits into hoarder-app:main
Jan 19, 2025

Conversation

hanguofeng
Contributor

  • Added WebhookWorker to handle webhook requests.
  • Integrated webhook triggering in crawlerWorker after video processing.
  • Updated main worker initialization to include WebhookWorker.
  • Enhanced configuration to support webhook URLs, token, and timeout.
  • Documented webhook configuration options in the documentation.
  • Introduced zWebhookRequestSchema for validating webhook requests.

@MohamedBassem
Collaborator

Hey, thanks a lot for the PR. I'll be able to review this later today!

Collaborator

@MohamedBassem left a comment

Overall, looks good to me! Left a couple of small comments, and then we can merge it!

Great job navigating the codebase without much assistance :)

{
concurrency: 1,
pollIntervalMs: 1000,
timeoutSecs: serverConfig.inference.jobTimeoutSec,
Collaborator

Reusing the inference timeout here is probably not intentional; let's maybe use the webhook.timeout / 1000 server setting instead?

Contributor Author

I changed it to:

timeoutSecs: serverConfig.webhook.timeout / 1000 * (serverConfig.webhook.retryTimes + 1) * serverConfig.webhook.urls.length + 1,

This accounts for the retry count, the number of URLs, and the per-request timeout, plus 1 extra second for other work.
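The calculation above can be sketched as a small helper (the function name and parameters are illustrative, not the PR's actual code; they mirror the hypothetical serverConfig.webhook settings):

```typescript
// Sketch of the worker-timeout computation discussed above.
// Each URL may be attempted (retryTimes + 1) times, each attempt bounded
// by timeoutMs; add 1 second of headroom for everything else.
function computeWebhookTimeoutSecs(
  timeoutMs: number,
  retryTimes: number,
  urlCount: number,
): number {
  return (timeoutMs / 1000) * (retryTimes + 1) * urlCount + 1;
}
```

For example, a 5-second timeout with 3 retries and 2 URLs yields a 41-second worker timeout.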


for (const url of webhookUrls) {
try {
const response = await Promise.race([
Collaborator

I think instead of Promise.race, we can use AbortSignal.timeout here so that the fetch itself gets cancelled as well.
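The suggested pattern looks roughly like this (a sketch with placeholder names; AbortSignal.timeout requires Node 17.3+):

```typescript
// Cancel the fetch itself when the deadline passes, instead of racing a
// separate timer promise. AbortSignal.timeout() aborts the underlying
// request after the given number of milliseconds.
async function callWebhook(url: string, timeoutMs: number): Promise<Response> {
  return fetch(url, {
    method: "POST",
    signal: AbortSignal.timeout(timeoutMs),
  });
}
```

Unlike Promise.race, this actually tears down the in-flight HTTP request rather than just ignoring its result.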

Contributor Author

Good suggestion, I improved this.


if (!response.ok) {
logger.error(`Webhook call to ${url} failed with status: ${response.status}`);
}
Collaborator

What should we do if the webhook response is a non-success status code? In that case, do we want to retry or not? I think we probably should?

Contributor Author

I added retry logic and also a config, WEBHOOK_RETRY_TIMES.
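Retry logic of this shape can be sketched as follows (the sender is injected purely so the sketch is self-contained and testable without a network; this is not the PR's actual code):

```typescript
// Retry a webhook call up to `retryTimes` extra attempts, treating both
// thrown errors (network failure, timeout) and non-2xx responses as
// failures, as discussed in the review.
async function postWithRetry(
  send: () => Promise<{ ok: boolean }>,
  retryTimes: number,
): Promise<boolean> {
  for (let attempt = 0; attempt <= retryTimes; attempt++) {
    try {
      const res = await send();
      if (res.ok) return true;
    } catch {
      // network error or abort: fall through to the next attempt
    }
  }
  return false;
}
```

With WEBHOOK_RETRY_TIMES set to 3, each URL gets at most 4 attempts before the failure is logged and the worker moves on.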

```
Authorization: Bearer <WEBHOOK_TOKEN>
```
- The webhook will be triggered with the job id (used for idempotent), bookmark id, the user id, the url and the operation in JSON format in the body.
Collaborator

Suggested change
- The webhook will be triggered with the job id (used for idempotent), bookmark id, the user id, the url and the operation in JSON format in the body.
- The webhook will be triggered with the job id (used for idempotence), bookmark id, the user id, the url and the operation in JSON format in the body.

Contributor Author

Good, changed.

Contributor Author

I also made some other doc improvements.

Comment on lines 59 to 61
WEBHOOK_URLS: z.preprocess(
(val) => typeof val === "string" ? val.split(",") : val,
z.array(z.string().url())
Collaborator

Hmmm, not sure if the use of preprocess here is right. It seems more correct to do:

z.string().transform(val => val.split(",")).pipe(z.array(z.string().url()))
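The idea is split first, then validate each piece. A plain-TypeScript sketch of the same split-then-validate behaviour (without zod, so it stands alone; the function name is hypothetical):

```typescript
// Mirrors z.string().transform(s => s.split(",")).pipe(z.array(z.string().url())):
// split a comma-separated string, then require every piece to be a valid URL.
function parseWebhookUrls(raw: string): string[] {
  const urls = raw.split(",").map((s) => s.trim());
  for (const u of urls) {
    // The URL constructor throws a TypeError on an invalid URL, playing the
    // role of z.string().url() rejecting a bad entry.
    new URL(u);
  }
  return urls;
}
```

The advantage of transform/pipe over preprocess is that the input is statically known to be a string before splitting, rather than an unknown that happens to be coerced.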

Contributor Author

Yes, I changed it and added a default value.

// Webhook worker
export const zWebhookRequestSchema = z.object({
bookmarkId: z.string(),
url: z.string(),
Collaborator

I don't think the url should be included here. We have bookmarks of other types, and you're already fetching the bookmark in the worker anyways.

Contributor Author

Yes, I removed it and also removed it from triggerWebhookWorker.

queueDB,
{
defaultJobArgs: {
numRetries: 1,
Collaborator

I feel like we probably should retry on failure? Maybe 3 retries?

Contributor Author

I changed it to 3

Comment on lines 750 to 752
// Trigger a webhook
await triggerWebhookWorker(bookmarkId, url, "create");

Collaborator

This handles the case for URLs though. I think we should also trigger the webhook for non URL bookmarks which don't go through the crawling workflows. You'll need to trigger those from packages/trpc/routers/bookmarks.ts (at the end of the createBookmark handler).

Contributor Author

Considering that the question of which events to support needs more debate, I've constrained this to "crawled"; for now we only support the event that signals crawl completion. This leaves room to grow and keeps the current change from becoming unnecessarily complex, as @mkarliner is suggesting.

@mkarliner Jan 17, 2025

I've been thinking about this the last few days.
May I suggest an event of 'edited' or 'changed'?
Here is the use case:
I intend to use these events to post a stream of my bookmarks to social media, in particular the Fediverse/Mastodon, through the ActivityPub protocol. I don't actually want an unfiltered stream to be posted; I want to curate it at some level.
So, with a 'changed' event, I can manually change a bookmark in some way that indicates to my bot that this is a bookmark I want to share, and it will pick up the change and publish it.

It's worth noting that there are now Fediverse-Threads-Bluesky bridges, so this solution would get Hoarder bookmarks to a potentially large audience.

Alternatively, if the event specified which list the bookmark was added to, that would also work.

"Authorization": `Bearer ${webhookToken}`,
},
body: JSON.stringify({
jobId, bookmarkId, userId: bookmark.userId, url: bookmark.link.url, operation: job.data.operation
Collaborator

Let's include bookmark type here as well.

Contributor Author

Yes, added
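After that change the request body looks roughly like this (a hypothetical sketch: field names are inferred from the excerpt above, not verified against the final code, and the literal example values are made up):

```typescript
// Hypothetical shape of the webhook payload after the review comments:
// jobId for idempotence, the bookmark and user ids, the newly added
// bookmark type, and the operation that fired the webhook.
interface WebhookPayload {
  jobId: string;
  bookmarkId: string;
  userId: string;
  type: string; // e.g. "link"
  operation: string; // e.g. "crawled"
}

const example: WebhookPayload = {
  jobId: "job-1",
  bookmarkId: "bm-1",
  userId: "user-1",
  type: "link",
  operation: "crawled",
};

// The payload is serialized as JSON for the request body.
const body = JSON.stringify(example);
```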

@MohamedBassem
Collaborator

A more high level question. Do we want to only trigger the webhook after the crawling? Or should we trigger it once on create, and once after it gets crawled? We can also add a configuration setting for the events people want to subscribe to.

@mkarliner

A more high level question. Do we want to only trigger the webhook after the crawling? Or should we trigger it once on create, and once after it gets crawled? We can also add a configuration setting for the events people want to subscribe to.

To further open the can of worms, what are the events in the life cycle of a bookmark, and should tags or lists also have life cycles? I'm not suggesting these should be done now, but at least asking the question now leaves further expansion options open.

- Changed webhook operation type from "create" to "crawled" in crawlerWorker and documentation.
- Enhanced webhook retry logic in WebhookWorker to support multiple attempts.
- Updated Docker configuration to include new webhook environment variables.
- Improved validation for webhook configuration in shared config.
- Adjusted zWebhookRequestSchema to reflect the new operation type.
- Updated documentation to clarify webhook configuration options and usage.
@hanguofeng
Contributor Author

Thank you for your professional review. I have revised the code and updated the branch in the PR accordingly.

Regarding your question about "what are the events in the life cycle of a bookmark, and should tags or lists also have life cycles?", here are my thoughts:

First, let me clarify the background of this PR. It stems from my own use case. I use Hoarder as a read-it-later tool and Obsidian as my note-taking tool. Since Obsidian isn't always convenient as a web clipper, I use Hoarder as an alternative. However, I also want to sync the content of my saved URLs to Obsidian. That's why I designed this webhook mechanism to receive signals and sync to Obsidian using specific methods.
I'm looking forward to the day when Hoarder implements features like "offline reading" and "vector search". These would largely replace the functionalities I currently use Obsidian for in my workflow.
Ultimately, processing collected notes, webpages, etc., with large language models for intelligent search and content creation is a common use case. If Hoarder's roadmap includes this, then the webhook capability doesn't need to be too robust. However, if the roadmap doesn't include this, using this mechanism to open up a community ecosystem would require further development.

@MohamedBassem
Collaborator

Sorry that this is taking long to review; I'll be able to have a look at it this weekend. @mkarliner your suggestion about more events makes a lot of sense now that we have the infra to do it. I think we can land this PR with the current events and follow up with more PRs to add more events. Shouldn't be too hard.

@mkarliner

Sorry that this is taking long to review; I'll be able to have a look at it this weekend. @mkarliner your suggestion about more events makes a lot of sense now that we have the infra to do it. I think we can land this PR with the current events and follow up with more PRs to add more events. Shouldn't be too hard.

Excellent. Looking forward to it.

@MohamedBassem MohamedBassem merged commit b9cce5d into hoarder-app:main Jan 19, 2025
4 checks passed
@MohamedBassem
Collaborator

Thanks a lot @hanguofeng for the high-quality PR! I did some minor modifications before landing.

  • If I get time before the next release (especially with the ongoing trademark drama), I might make this functionality configurable per user, such that every user can configure their own webhooks from the UI.
  • Regarding obsidian, someone actually just submitted an obsidian sync plugin to the obsidian plugin store (Export/sync to Obsidian #622).
  • Vector search and offline reading are both planned. In fact I already have a non-polished PR for embeddings and using a vector database (WIP: Bookmark embeddings #834).
  • @mkarliner I'll follow up with another PR to add created/edited events!

@MohamedBassem
Collaborator

Ended up pushing the changes to make this configurable by user, and added the created/edited lifecycle hooks.

@erik-nilcoast

This is great, I'll integrate this into my RSS aggregator! Thanks.
