This repository has been archived by the owner on Sep 18, 2020. It is now read-only.
Currently update-operator reboots nodes as soon as updates are available. #82 tracks adding support for a user-configured maintenance window. On top of that, even inside a maintenance window there could be situations where reboots should be temporarily paused (e.g. when some critical/unplanned outage is happening).
This can currently be done by setting a reboot-paused annotation on specific nodes; however, this is a manual operation and doesn't scale well cluster-wide.
It would be nice to let CLUO know about any existing AlertManager in the cluster and check for specific active alerts before proceeding. @brancz suggested that we could:
- take a ConfigMap with critical alerts that should cluster-wide pause reboots (and inotify-watch to hot-reload it)
- reach the AM on its in-cluster public read-only endpoint and check for non-silenced critical alerts before setting reboot-ok
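As a rough illustration of the two steps above, here is a minimal Python sketch of the check; it is not taken from CLUO. It assumes the Alertmanager v2 HTTP API (`GET /api/v2/alerts`), a hypothetical in-cluster service URL, and a hypothetical ConfigMap payload format with one alert name per line. All function names are made up for this example, and hot-reloading via inotify is left out.

```python
import json
import urllib.request

# Assumption: hypothetical in-cluster Alertmanager service address.
ALERTMANAGER_URL = "http://alertmanager.monitoring.svc:9093"


def parse_pause_alerts(text):
    """Parse a hypothetical ConfigMap payload: one alert name per line,
    with '#' starting a comment. Returns a set of alert names."""
    names = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.add(name)
    return names


def fetch_alerts(base_url=ALERTMANAGER_URL, timeout=5.0):
    """Fetch active, non-silenced, non-inhibited alerts.
    Assumes the Alertmanager v2 API and its query parameters."""
    url = (base_url.rstrip("/")
           + "/api/v2/alerts?active=true&silenced=false&inhibited=false")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def blocking_alerts(alerts, pause_names):
    """Return the sorted names of active, non-silenced alerts that are
    listed in pause_names."""
    blocking = set()
    for alert in alerts:
        status = alert.get("status", {})
        # Re-check state locally in case the server-side filter was not applied.
        if status.get("state") != "active" or status.get("silencedBy"):
            continue
        name = alert.get("labels", {}).get("alertname")
        if name in pause_names:
            blocking.add(name)
    return sorted(blocking)


def reboots_allowed(alerts, pause_names):
    """True when no configured critical alert is currently firing."""
    return not blocking_alerts(alerts, pause_names)
```

In this sketch the operator would call `reboots_allowed(fetch_alerts(), parse_pause_alerts(configmap_text))` right before setting reboot-ok, and skip the node for this cycle if it returns `False`. Whether an unreachable AlertManager should block or permit reboots is a policy choice this sketch does not make.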
For clarity, this should be completely orthogonal to maintenance window configuration.