Tips for Kubernetes
When you are developing and would like to remove all running or failed workflow pods, such as reana-run-batch-..., it is not sufficient to remove only the pods. These runtime pods are created from Kubernetes jobs, so the proper way to delete all running workflows is to delete the jobs:
$ kubectl delete jobs --all
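If deleting all jobs is too aggressive, e.g. because the namespace also contains non-REANA jobs, one can restrict the deletion to REANA runtime jobs by filtering on their name prefix (a sketch assuming the reana-run- naming convention for batch and job pods):
$ for job in $(kubectl get jobs --no-headers | awk '$1 ~ /^reana-run-/ {print $1;}'); do kubectl delete job "$job"; done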
In multi-cluster deployments, it sometimes happens that one would like to update an image on all the nodes. For example, a user might have re-pushed johndoe/myanalysis:latest, so that some nodes have the "old" image and some nodes have the "new" one. (The proper way to handle this is to use semantic versioning of images, see What's wrong with the Docker :latest tag?.)
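For example, instead of repeatedly overwriting :latest, one can build and push explicitly versioned tags (the 1.0.1 tag below is purely illustrative):
$ docker build -t johndoe/myanalysis:1.0.1 .
$ docker push johndoe/myanalysis:1.0.1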
When such a mixed-image situation does happen, the following one-liner can delete the image from all the nodes, so that the next time the workflow is run, the nodes will re-pull the latest image:
$ for node in $(kubectl get nodes --no-headers -l reana.io/system=runtimejobs | awk '{print $1;}'); do ssh -q -i ~/.ssh/reana.pem -o StrictHostKeyChecking=no core@$node 'sudo docker rmi johndoe/myanalysis:latest'; done
This also helps when we would like to prefetch a certain image to all the nodes in advance, e.g. during benchmarking:
$ for node in $(kubectl get nodes --no-headers -l reana.io/system=runtimejobs | awk '{print $1;}'); do ssh -q -i ~/.ssh/reana.pem -o StrictHostKeyChecking=no core@$node 'sudo docker pull johndoe/myanalysis:42'; done
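If direct SSH access to the nodes is not available, a short-lived DaemonSet can achieve the same prefetching effect, since the kubelet will pull the image on every matching node; the prefetch-myanalysis name below is just an illustrative placeholder:
$ cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prefetch-myanalysis
spec:
  selector:
    matchLabels:
      app: prefetch-myanalysis
  template:
    metadata:
      labels:
        app: prefetch-myanalysis
    spec:
      nodeSelector:
        reana.io/system: runtimejobs
      containers:
        - name: prefetch
          image: johndoe/myanalysis:42
          command: ["sleep", "infinity"]
EOF
$ # once all prefetch pods are Running, the image is cached on every node and the DaemonSet can be removed:
$ kubectl delete daemonset prefetch-myanalysis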
Note that some Kubernetes clusters do not use Docker as the container technology, but Podman. In these cases, crictl can be used:
$ for node in $(kubectl get nodes --no-headers -l reana.io/system=runtimejobs | awk '{print $1;}'); do ssh -q -i ~/.ssh/reana.pem -o StrictHostKeyChecking=no core@$node 'sudo crictl rmi johndoe/myanalysis:latest'; done
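To check beforehand which nodes actually have the image cached, one can list the images with crictl using the same SSH loop pattern (the image name is the same illustrative example as above):
$ for node in $(kubectl get nodes --no-headers -l reana.io/system=runtimejobs | awk '{print $1;}'); do echo "==> $node" && ssh -q -i ~/.ssh/reana.pem -o StrictHostKeyChecking=no core@$node 'sudo crictl images | grep myanalysis'; done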
If REANA is not starting user jobs due to insufficient memory, even though the cluster seems little used, the cause may be a large amount of buffered/cached memory on the cluster nodes.
For example:
$ kubectl top nodes
NAME                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
reana-aaabbbcccddd-node-82   214m         2%     7312Mi          52%
But:
[core@reana-aaabbbcccddd-node-82 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           14Gi       2.1Gi       1.4Gi       5.0Mi        10Gi        11Gi
Swap:            0B          0B          0B
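To check quickly whether buffered/cached memory is building up on all runtime nodes at once, the same SSH loop pattern as above can be reused (assuming the same core user and reana.pem key as in the earlier examples):
$ for node in $(kubectl get nodes --no-headers -l reana.io/system=runtimejobs | awk '{print $1;}'); do echo "==> $node" && ssh -q -i ~/.ssh/reana.pem -o StrictHostKeyChecking=no core@$node 'free -h'; done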
When this happens, one can force a clean-up of the OS page cache by running sudo sysctl vm.drop_caches=3 on all nodes, so that the cache is flushed and Kubernetes sees all the available memory again:
$ for node in $(kubectl get nodes --show-labels -o wide | grep -v Disabled | grep runtime | awk '{print $1;}'); do echo "==> $node" && ssh -q -i ~/.ssh/reana.pem -o StrictHostKeyChecking=no core@$node "sudo sysctl vm.drop_caches=3"; done