Support performing an action when a job is aborted #838
Aborting a job will send a SIGTERM to the outermost process. Can you use the Shell Plugin and provide a shell wrapper that traps SIGTERM and acts on it? Example:

```shell
#!/bin/bash

# Define a function to handle the SIGTERM signal
cleanup() {
    echo "Caught SIGTERM signal. Running cleanup..."
    # Add your custom cleanup commands here
    # For example: stop services, clean up temporary files, etc.
    echo "Cleanup done."
    exit 0
}

# Trap the SIGTERM signal
trap cleanup SIGTERM

# Run your job here
/path/to/my/script.sh
```
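One caveat with a wrapper that runs the job in the foreground: per bash's documented behavior, a trapped signal received while a foreground command is running is not acted on until that command completes, whereas the `wait` builtin returns immediately when a trapped signal arrives. So backgrounding the job and `wait`-ing on it lets the handler fire right away. A minimal sketch of that pattern — here `sleep 30` stands in for the real job, and the self-signal line exists only so the demo can show the trap firing:

```shell
#!/bin/bash
# Background the job and wait on it, so a trapped SIGTERM is
# handled immediately instead of after the job finishes.
result=$(
  bash -c '
    cleanup() {
      echo "Caught SIGTERM; stopping child $child"
      kill "$child" 2>/dev/null      # forward the signal to the job
      wait "$child" 2>/dev/null      # reap it
      echo "Cleanup done."
      exit 0
    }
    trap cleanup SIGTERM
    sleep 30 > /dev/null &           # stand-in for the real job command
    child=$!
    ( sleep 1; kill -TERM $$ ) &     # demo only: SIGTERM ourselves after 1s
    wait "$child"
  '
)
echo "$result"
```

In real use you would replace the `sleep` with the actual job command and delete the self-signal line; the `kill`/`wait` pair in the handler forwards the abort to the job and reaps it before exiting.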
That works! Thanks for the advice. EDIT: Oops. See below.
Sorry. I spoke too soon. It doesn't actually cancel the ZFS scrub. Here is the job log after I aborted the job:
Here is the script:
I notice that the SIGTERM message in the log is not the same as the one in the script. Is something preempting it?
Ah, I think I see the problem:
So, Cronicle gives the child 10 seconds to shut down after sending the SIGTERM. If it does not die, it sends a SIGKILL (which cannot be trapped). You can increase the timeout in the configuration here: https://github.com/jhuckaby/Cronicle/blob/master/docs/Configuration.md#child_kill_timeout
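For reference, `child_kill_timeout` is a top-level key in Cronicle's `conf/config.json` (the value is in seconds; 60 below is an arbitrary example, and the exact file location depends on your install):

```json
{
  "child_kill_timeout": 60
}
```

The value should be comfortably larger than the time your cleanup handler needs, since the SIGKILL that follows cannot be trapped.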
Summary
Support performing an action when a job is aborted.
Steps to reproduce the problem
When a job is aborted, there are cases where actions beyond simply terminating the task need to be taken. I ran into this with a job that runs a ZFS scrub with the -w option, which waits for the asynchronous scrub to finish before returning. When I abort the job, the scrub command terminates, but the scrub itself keeps running. To actually stop the scrub, one must issue a separate command for that purpose.
There may also be jobs that need additional cleanup or recovery actions when they are voluntarily aborted.
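Combining the trap wrapper from the comments above with an explicit cancel command would cover this ZFS case: `zpool scrub -s` stops an in-progress scrub, and `-w` makes the scrub command block until completion. A hedged sketch, assuming a placeholder pool named `tank`, and with `zpool` stubbed out as a shell function purely so the demo can run without ZFS — drop the stub on a real system:

```shell
#!/bin/bash
# On SIGTERM, cancel the in-progress scrub before exiting, instead of
# only killing the (blocked) "zpool scrub -w" wrapper process.
result=$(
  bash -c '
    # Demo stub: prints the command; the -w case sleeps to imitate
    # "zpool scrub -w" blocking until the scrub finishes.
    zpool() { echo "zpool $*"; case " $* " in *" -w "*) sleep 30 >/dev/null;; esac; }

    POOL=tank                      # placeholder pool name
    cleanup() {
      kill "$child" 2>/dev/null    # stop the blocked "zpool scrub -w"
      zpool scrub -s "$POOL"       # -s cancels the scrub itself
      echo "Scrub cancelled."
      exit 0
    }
    trap cleanup SIGTERM
    zpool scrub -w "$POOL" &       # waits for the scrub to finish
    child=$!
    ( sleep 1; kill -TERM $$ ) &   # demo only: simulate the abort
    wait "$child"
  '
)
echo "$result"
```

With the stub and the self-signal line removed, this is the shape of a Shell Plugin wrapper that makes an abort actually stop the scrub rather than just the waiting process.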
Your Setup
Just a single server.
Operating system and version?
Linux Mint 22
Node.js version?
v20.17.0
Cronicle software version?
Version 0.9.59
Are you using a multi-server setup, or just a single server?
Single
Are you using the filesystem as back-end storage, or S3/Couchbase?
Filesystem
Can you reproduce the crash consistently?
Log Excerpts