feat(webhook): Implement webhook functionality for bookmark events #852
Conversation
hanguofeng
commented
Jan 9, 2025
- Added WebhookWorker to handle webhook requests.
- Integrated webhook triggering in crawlerWorker after video processing.
- Updated main worker initialization to include WebhookWorker.
- Enhanced configuration to support webhook URLs, token, and timeout.
- Documented webhook configuration options in the documentation.
- Introduced zWebhookRequestSchema for validating webhook requests.
Hey, thanks a lot for the PR. I'll be able to review this later today!
Overall, looks good to me! Left a couple of small comments, and then we can merge it!
Great job navigating the codebase without much assistance :)
apps/workers/webhookWorker.ts (Outdated)
```ts
{
  concurrency: 1,
  pollIntervalMs: 1000,
  timeoutSecs: serverConfig.inference.jobTimeoutSec,
```
Reusing the inference timeout here is probably not intentional; let's maybe use the `webhook.timeout / 1000` server setting?
I changed it to:
```ts
timeoutSecs: serverConfig.webhook.timeout / 1000 * (serverConfig.webhook.retryTimes + 1) * serverConfig.webhook.urls.length + 1,
```
This accounts for the retry count, the number of URLs, and the per-request timeout, and adds 1 second of slack for other work.
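For illustration, the formula above can be written as a small pure helper. This is a sketch: `computeWorkerTimeoutSecs` and its parameter names are hypothetical, not part of the PR.

```typescript
// Hypothetical helper mirroring the formula above: per-URL timeout in ms
// converted to seconds, times attempts (retries + 1), times the number of
// configured webhook URLs, plus 1 second of slack for other work.
function computeWorkerTimeoutSecs(
  webhookTimeoutMs: number,
  retryTimes: number,
  urlCount: number,
): number {
  return (webhookTimeoutMs / 1000) * (retryTimes + 1) * urlCount + 1;
}

// e.g. a 5s timeout, 2 retries, and 3 URLs:
// (5000 / 1000) * (2 + 1) * 3 + 1 = 46 seconds
```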
apps/workers/webhookWorker.ts (Outdated)
```ts
for (const url of webhookUrls) {
  try {
    const response = await Promise.race([
```
I think instead of `Promise.race`, we can use `AbortSignal.timeout` here so that the fetch itself gets cancelled as well.
Good suggestion, I improved this.
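A minimal sketch of the `AbortSignal.timeout` approach (the function name and parameters are illustrative, not the PR's actual code; requires Node 17.3+ for `AbortSignal.timeout`):

```typescript
// Sketch: pass AbortSignal.timeout to fetch so the underlying request is
// aborted when the deadline passes, instead of racing an unrelated timer
// that leaves the request running in the background.
async function callWebhook(
  url: string,
  payload: unknown,
  timeoutMs: number,
): Promise<Response> {
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    // On timeout, fetch rejects with a TimeoutError DOMException.
    signal: AbortSignal.timeout(timeoutMs),
  });
}
```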
```ts
if (!response.ok) {
  logger.error(`Webhook call to ${url} failed with status: ${response.status}`);
}
```
What should we do if the webhook response is a non-success status code? In that case, do we want to retry or not? I think we probably should?
I added retry logic and also a config `WEBHOOK_RETRY_TIMES`.
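The retry behaviour could look roughly like this. A sketch under assumptions: `withRetries` is a hypothetical helper name, and the caller is expected to throw on a non-success status so those responses are retried too.

```typescript
// Sketch of retry logic: run `attempt` up to retryTimes + 1 times,
// rethrowing the last error if every attempt fails.
async function withRetries<T>(
  attempt: () => Promise<T>,
  retryTimes: number,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= retryTimes; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```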
docs/docs/03-configuration.md (Outdated)
```
Authorization: Bearer <WEBHOOK_TOKEN>
```
- The webhook will be triggered with the job id (used for idempotent), bookmark id, the user id, the url and the operation in JSON format in the body.
Suggested change: `(used for idempotent)` → `(used for idempotence)`
Good, changed.
And also some other doc improvements
packages/shared/config.ts (Outdated)
```ts
WEBHOOK_URLS: z.preprocess(
  (val) => typeof val === "string" ? val.split(",") : val,
  z.array(z.string().url())
```
Hmmm, not sure if the use of `preprocess` here is right. It seems more correct to do:
```ts
z.string().transform(val => val.split(",")).pipe(z.array(z.string().url()))
```
Yes, I changed it and added a default value.
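For illustration, here is a plain-TypeScript analogue of the zod pipeline discussed above: split the env var on commas, then validate each entry as a URL, with a default when unset. `parseWebhookUrls` is a hypothetical name; the real config still uses zod.

```typescript
// Mirrors z.string().transform(v => v.split(",")).pipe(z.array(z.string().url()))
// without zod: split, trim, and validate each URL, falling back to a default.
function parseWebhookUrls(
  raw: string | undefined,
  fallback: string[] = [],
): string[] {
  if (raw === undefined) return fallback; // default when the env var is unset
  const urls = raw.split(",").map((u) => u.trim());
  for (const u of urls) {
    new URL(u); // throws TypeError on an invalid URL, like z.string().url()
  }
  return urls;
}
```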
packages/shared/queues.ts (Outdated)
```ts
// Webhook worker
export const zWebhookRequestSchema = z.object({
  bookmarkId: z.string(),
  url: z.string(),
```
I don't think the `url` should be included here. We have bookmarks of other types, and you're already fetching the bookmark in the worker anyways.
Yes, I removed it, and also removed it from `triggerWebhookWorker`.
packages/shared/queues.ts (Outdated)
```ts
queueDB,
{
  defaultJobArgs: {
    numRetries: 1,
```
I feel like we probably should retry on failure? Maybe 3 retries?
I changed it to 3.
apps/workers/crawlerWorker.ts (Outdated)
```ts
// Trigger a webhook
await triggerWebhookWorker(bookmarkId, url, "create");
```
This handles the case for URLs though. I think we should also trigger the webhook for non-URL bookmarks, which don't go through the crawling workflows. You'll need to trigger those from `packages/trpc/routers/bookmarks.ts` (at the end of the `createBookmark` handler).
Considering that the question of which events to support needs more debate and consideration, I've constrained it to "crawled". This means we only support the event that signals crawl completion for now. This preserves room to scale and prevents the current work from becoming unnecessarily complex, as @mkarliner said.
Been thinking about this the last few days.
May I suggest an event of 'edited' or 'changed'.
Here is the use case:
I intend to use these events to post a stream of my bookmarks to social media, in particular the Fediverse/Mastodon, through the ActivityPub protocol. I don't actually want an unfiltered stream to be posted; I want to curate it at some level.
So, with a 'changed' event, I can manually change a bookmark in some way that indicates to my bot that this is a bookmark I want to share, and it will pick up the change and publish it.
It's worth noting that there are now Fediverse - Threads - Bluesky bridges, so this solution would get Hoarder bookmarks to a potentially large audience.
Alternatively, if the event specified which list the bookmark was added to, that would also work.
apps/workers/webhookWorker.ts (Outdated)
```ts
"Authorization": `Bearer ${webhookToken}`,
},
body: JSON.stringify({
  jobId, bookmarkId, userId: bookmark.userId, url: bookmark.link.url, operation: job.data.operation
```
Let's include bookmark type here as well.
Yes, added.
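Putting the review points together, the request body might look like this. A sketch only: the field values and the set of type strings are assumptions, not taken from the PR.

```typescript
// Hypothetical shape of the webhook request body after adding the
// bookmark type; `url` would only be meaningful for link bookmarks.
interface WebhookPayload {
  jobId: string;      // used for idempotence on the receiver side
  bookmarkId: string;
  userId: string;
  type: string;       // e.g. "link" (assumed value)
  operation: string;  // e.g. "crawled"
}

const examplePayload: WebhookPayload = {
  jobId: "job-123",
  bookmarkId: "bm-456",
  userId: "user-789",
  type: "link",
  operation: "crawled",
};

// The worker would POST this as the JSON body of the webhook request.
const body = JSON.stringify(examplePayload);
```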
A more high-level question: do we want to only trigger the webhook after the crawling, or should we trigger it once on create and once after it gets crawled? We can also add a configuration setting for the events people want to subscribe to.
To further open the can of worms: what are the events in the life cycle of a bookmark, and should tags or lists also have life cycles? I'm not suggesting these should be done now, but asking the question now leaves further expansion options open.
- Changed webhook operation type from "create" to "crawled" in crawlerWorker and documentation.
- Enhanced webhook retry logic in WebhookWorker to support multiple attempts.
- Updated Docker configuration to include new webhook environment variables.
- Improved validation for webhook configuration in shared config.
- Adjusted zWebhookRequestSchema to reflect the new operation type.
- Updated documentation to clarify webhook configuration options and usage.
Thank you for your professional review. I have revised the code and updated the branch in the PR accordingly.

Regarding your question about "what are the events in the life cycle of a bookmark, and should tags or lists also have life cycles?", here are my thoughts. First, let me clarify the background of this PR: it stems from my own use case. I use Hoarder as a read-it-later tool and Obsidian as my note-taking tool. Since Obsidian isn't always convenient as a web clipper, I use Hoarder as an alternative. However, I also want to sync the content of my saved URLs to Obsidian. That's why I designed this webhook mechanism: to receive signals and sync to Obsidian using specific methods.
Sorry that this is taking long to review; I'll be able to have a look at it this weekend. @mkarliner your suggestion about more events makes a lot of sense now that we have the infra to do it. I think we can land this PR with the current events and follow up with more PRs to add more events. Shouldn't be too hard.
Excellent. Looking forward to it.
Thanks a lot @hanguofeng for the high-quality PR! I did some minor modifications before landing.
Ended up pushing the changes to make this configurable by user, and added the
This is great, I'll integrate this into my RSS aggregator! Thanks.