
Failed Pod Controller

A Kubernetes controller that watches for pods in certain failure states and removes them.

Why?

kubernetes/kubernetes#99986

Since the introduction of Graceful node shutdown, pods that were running on a node which was shut down are left in the cluster in a Terminated state.

This causes issues such as prometheus/prometheus#10257 and can lead to alerts firing unnecessarily.

Deployment

Use Kustomize, or deploy the Docker image yourself:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: failed-pod-controller

resources:
  - github.com/playerdata/failed-pod-controller?ref=main
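
With the kustomization above saved as kustomization.yaml, the controller can be deployed using kubectl's built-in Kustomize support, e.g. kubectl apply -k . (or kustomize build . | kubectl apply -f -).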

Configuration

Available Env Vars:

Name                Description
CONTROLLER_DRY_RUN  If truthy, won't actually delete any pods
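
Environment variables can be set on the controller's Deployment, for example with a Kustomize strategic-merge patch. The sketch below assumes the Deployment and its container are both named failed-pod-controller, which may differ from the upstream manifests:

# kustomization.yaml (continued): enable dry-run mode via a strategic-merge patch.
# The Deployment and container names here are assumptions.
patches:
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: failed-pod-controller
      spec:
        template:
          spec:
            containers:
              - name: failed-pod-controller
                env:
                  - name: CONTROLLER_DRY_RUN
                    value: "true"   # any truthy value prevents deletions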