
Dynamically extending job liveness #702

Open
ddorian opened this issue Nov 15, 2022 · 4 comments
Labels
Issue contains: Exploration & Design decisions 🤯 (We don't know how this will be implemented yet)
Issue contains: Some Python 🐍 (This issue involves writing some Python code)
Issue contains: Some SQL 🐘 (This feature requires changing the SQL model)
Issue type: Feature ⭐️ (Add a new feature that didn't exist before)

Comments

@ddorian

ddorian commented Nov 15, 2022

So there's "Retry stalled jobs", where you set a RUNNING_JOBS_MAX_TIME and retry jobs that are stalled: https://procrastinate.readthedocs.io/en/stable/howto/retry_stalled_jobs.html.

Now, what happens if your jobs are dynamic in nature and can take a very long time?

Say you're converting videos on a video-uploading website.
Someone uploads a 1-minute video, someone else uploads a 24-hour video.

You need a way to "extend" the lifetime of the 24-hour video's job while also keeping the ability to retry it if it failed or stalled somehow.

For this you'd need a thread that extends a timestamp in the database every, say, 30 seconds.
Stalled jobs would then only be those where too much time has passed since the last extension.
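Roughly what I have in mind, as a sketch (the last_heartbeat column and the helper are made up, they're not part of procrastinate today):

```python
# Sketch only: a background thread that keeps a made-up
# procrastinate_jobs.last_heartbeat column fresh while the real work runs.
import threading


def run_with_heartbeat(connection, job_id, work, interval=30):
    stop = threading.Event()

    def beat():
        # Event.wait() returns False on timeout, True once stop is set
        while not stop.wait(interval):
            with connection.cursor() as cur:
                cur.execute(
                    "UPDATE procrastinate_jobs SET last_heartbeat = now() WHERE id = %s",
                    (job_id,),
                )
            connection.commit()

    heartbeat = threading.Thread(target=beat, daemon=True)
    heartbeat.start()
    try:
        return work()  # e.g. the 24-hour video conversion
    finally:
        stop.set()
        heartbeat.join()
```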

Makes sense? Or maybe there's another way?

@ewjoachim
Member

ewjoachim commented Nov 15, 2022

I think it could make sense to store something in the database, but I'm not exactly sure whether procrastinate should be in charge of that, and what exactly it would entail. Do you think it's something you could be doing on your side? Rather than calling get_stalled_jobs, use list_jobs and have your own logic determine whether tasks are stalled or not?
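Something like this could live entirely in your code (a rough sketch: the heartbeat timestamps would be yours to maintain, since procrastinate doesn't store one; list_jobs / retry_job usage mirrors the retry_stalled_jobs howto, but double-check the JobManager docs):

```python
# Sketch of user-side stalled-job detection, outside procrastinate.
import datetime

STALL_AFTER = datetime.timedelta(minutes=10)


def find_stalled_jobs(app, last_heartbeats):
    """`last_heartbeats` maps job id -> datetime of the last heartbeat you recorded."""
    now = datetime.datetime.now(datetime.timezone.utc)
    for job in app.job_manager.list_jobs(status="doing"):
        beat = last_heartbeats.get(job.id)
        if beat is None or now - beat > STALL_AFTER:
            yield job  # then e.g. app.job_manager.retry_job(job)
```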

I'm always a bit hesitant to add new mechanisms, especially when they need threads and such.

@ddorian
Author

ddorian commented Nov 16, 2022

> I think it could make sense to store something in the database, but I'm not exactly sure whether procrastinate should be in charge of that,

I think that's part of the job queue's responsibility.

> and what exactly it would entail.

Just keeping a timestamp on the job, extending it while the job runs, and comparing it when doing list_jobs.

There are two ways: either keep the job locked somehow (I think that's what Celery/RabbitMQ do), which can be done by keeping a transaction open (but that's a heavy, long-lived transaction), or extend the time (what SQS does). Both options are sketched below.

> I'm always a bit hesitant to add new mechanisms, especially when they need threads and such.

With async it shouldn't be heavy; I've previously used gevent, which also isn't heavy.
Or you could use a single thread to extend all active jobs in the current process, which shouldn't be heavy either.
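At the SQL level the two options would look roughly like this (last_heartbeat is a made-up column, not part of procrastinate's schema):

```python
# Option 1: keep the row locked for the whole job by holding a transaction open
# (what I think Celery/RabbitMQ-style brokers effectively do); heavy for long jobs.
LOCK_SQL = "SELECT * FROM procrastinate_jobs WHERE id = %(job_id)s FOR UPDATE"
# ... run the job inside that transaction and commit at the end ...

# Option 2: SQS-style extension. The worker periodically bumps a timestamp,
# and "stalled" means the timestamp got too old.
EXTEND_SQL = """
    UPDATE procrastinate_jobs
       SET last_heartbeat = now()
     WHERE id = %(job_id)s
"""
STALLED_SQL = """
    SELECT id FROM procrastinate_jobs
     WHERE status = 'doing'
       AND last_heartbeat < now() - %(max_idle)s::interval
"""
```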

@caire-bear

caire-bear commented May 9, 2023

I'd also like this feature, for parsing a large file in a job.

In NSQ you would touch a message so it wouldn't time out and get requeued by the message broker. It was up to the client to keep the heartbeat going to extend the time to process the message; if the broker didn't hear back within some amount of time, it would requeue the message. I think we used a light async periodic callback (tornado.ioloop.PeriodicCallback) to do it. Doing something like this would require adding some concept of a timeout timestamp to each procrastinate job that gets extended with each touch, and also defining a timeout for each queue.
https://github.com/nsqio/nsq/blob/1362af17d50b7129b47c0291e7f2e0b7eef2bb62/nsqd/channel.go#L333

A similar mechanism in procrastinate may be helpful, so that a long-running job that's still making progress can keep working, while retry_stalled_jobs still keeps other queues timely. A rough sketch of the client side is below.
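For illustration, the client-side part could be as small as this, with plain asyncio instead of tornado (touch here stands in for whatever would extend the job's timeout; procrastinate has no such API today):

```python
# Sketch of an NSQ-style "touch" loop: run the job while periodically calling
# `touch()`, which would push the job's timeout timestamp forward.
import asyncio


async def run_with_touch(work, touch, interval=30):
    async def touch_loop():
        while True:
            await asyncio.sleep(interval)
            await touch()  # hypothetical: whatever extends the job's timeout

    touch_task = asyncio.create_task(touch_loop())
    try:
        return await work()
    finally:
        touch_task.cancel()
```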

@ewjoachim
Member

I believe this may be similar to what's discussed in #740: using a heartbeat on jobs rather than a timeout to evaluate whether a job is dead. Also, a worker might still be sending heartbeats while the job is stuck in an infinite loop, and in that case we'd still need a timeout. There's a mechanism for customizing retries; maybe there should be a similar one for customizing timeouts.

ewjoachim added the labels Issue contains: Some SQL 🐘, Issue type: Feature ⭐️, Issue contains: Exploration & Design decisions 🤯 and Issue contains: Some Python 🐍 on Jan 6, 2024