K8sToolbox is a versatile toolkit designed for managing Kubernetes clusters, providing essential debugging tools and utilities for cluster administrators and developers.
K8sToolbox is an all-in-one toolkit for managing and troubleshooting Kubernetes clusters. It bundles debugging, diagnostic, and operational tools for cluster administrators, DevOps engineers, and developers. From a single place, they can manage workloads, diagnose issues, and keep their Kubernetes environments performing well.
Equipped with a rich collection of utilities that include advanced network diagnostics, automated health checks, resource monitoring, and recovery mechanisms, K8sToolbox empowers teams to proactively identify and resolve cluster-related challenges. Whether you need to aggregate logs, test pod connectivity, manage resources, or troubleshoot network policies, K8sToolbox offers a unified solution that significantly reduces complexity while enhancing productivity.
K8sToolbox integrates essential third-party utilities such as kubectl, stern, k9s, and mc (MinIO Client), providing a consistent, command-driven way to interact with Kubernetes clusters. It packages these capabilities into a single, easy-to-use image. This simplifies debugging and day-to-day cluster management, including on large, multi-node environments.
Tools included:
curl, iproute2, iputils, netcat-openbsd, tcpdump, bind-tools, traceroute, iperf3, jq, strace, htop, iftop, net-tools, rsync, openssl, gpg, vim, nano, busybox-extras, mariadb-client, postgresql-client, redis, mongodb-tools, helm, socat, ncdu, bash, ca-certificates, conntrack-tools, ethtool, iptables, less, mtr, openssh-client, psmisc, tcptraceroute, ngrep, yq, kubectl, stern, k9s, mc, nmap, screen, tmux
K8sToolbox is perfect for:
- Cluster Troubleshooting: Quickly diagnose issues in your cluster, such as resource contention, network issues, or failed pods.
- Maintenance: Clean up stale or unused resources like completed jobs and old replicasets to keep your cluster healthy.
- Automation: Automate tasks like scaling deployments, resource usage checks, and more.
- Debugging Network Policies: Validate network connectivity and ensure your network policies are properly configured.
- Log Aggregation: Collect and analyze logs from multiple namespaces and pods to understand the state of your cluster and applications.
With K8sToolbox, you can:
- Execute health checks, manage stuck resources, aggregate logs, and perform network diagnostics.
- Run custom scripts directly from your local machine or inside a Kubernetes pod using kubectl exec. For detailed usage instructions, refer to Using K8sToolbox Scripts.
K8sToolbox/
│
├── docker/
│ └── Dockerfile # Docker image definition for building K8sToolbox
│
├── manifests/
│ ├── debug-daemon.yaml # DaemonSet manifest for deploying K8sToolbox on all nodes
│ └── debug-pod.yaml # Pod manifest for running a standalone K8sToolbox instance
│
├── scripts/ # Collection of helpful Kubernetes management scripts
│ ├── aggregate_logs.sh
│ ├── auto_recover.sh
│ ├── auto_scaling.sh
│ ├── backup_restore.sh
│ ├── clean_stale_resources.sh
│ ├── connectivity_test.sh
│ ├── delete_stuck_crds.sh
│ ├── delete_stuck_namespace.sh
│ ├── healthcheck.sh
│ ├── network_diag.sh
│ ├── resource_usage.sh
│ ├── restart_failed_pods.sh
│ ├── snapshot_audit.sh
│ └── test_network_policy.sh
│
├── .gitignore
├── CONTRIBUTING.md # Guidelines for contributing to K8sToolbox
├── LICENSE # License details (Apache License 2.0)
├── go.mod # Go module definition
├── main.go # Main Golang utility file for K8sToolbox
└── README.md # Documentation (you're reading this!)
- Docker installed to build and run the K8sToolbox Docker image.
- A Kubernetes cluster, with kubectl configured to interact with it.
- Permissions: Ensure you have sufficient permissions to run commands like kubectl exec and kubectl apply.
To build the Docker image for K8sToolbox, run the following command in the root directory of the project:
docker build -t k8stoolbox:latest -f docker/Dockerfile .
This will create a Docker image named k8stoolbox (tagged latest) that you can use locally or push to a container registry.
You can deploy K8sToolbox either as a standalone Pod or as a DaemonSet that covers all nodes.
To deploy a standalone K8sToolbox pod, use the following command:
kubectl apply -f https://raw.githubusercontent.com/narmidm/K8sToolbox/refs/heads/master/manifests/debug-pod.yaml
This creates a pod named k8stoolbox-debug in the default namespace, which can be used for one-off debugging and troubleshooting tasks.
To deploy K8sToolbox on all nodes, use the DaemonSet manifest:
kubectl apply -f https://raw.githubusercontent.com/narmidm/K8sToolbox/refs/heads/master/manifests/debug-daemon.yaml
This creates a DaemonSet that runs a K8sToolbox pod on every node, so the toolkit is available on each node of the cluster.
There are two primary ways to use K8sToolbox:
- Local Execution: Running scripts directly from the local system.
- Kubernetes Shell Execution: Executing commands inside a running K8sToolbox pod using kubectl exec.
You can run the scripts in the /scripts directory locally if you have kubectl configured and connected to your Kubernetes cluster.
For detailed usage instructions on the scripts, please refer to Using K8sToolbox Scripts.
Examples:
- Backup and Restore Resources:
./scripts/backup_restore.sh backup default
./scripts/backup_restore.sh restore default
- Clean Stale Resources:
./scripts/clean_stale_resources.sh default
- Test Network Policies:
./scripts/test_network_policy.sh default <source_pod> <target_pod>
- Aggregate Logs:
./scripts/aggregate_logs.sh default kube-system
You can also execute the scripts inside a running K8sToolbox pod by using kubectl exec. This is useful when you need to troubleshoot issues within the cluster itself.
First, find the name of the K8sToolbox pod:
kubectl get pods -n default -l app=k8stoolbox
Then use kubectl exec to run commands:
- Execute a Health Check:
kubectl exec -it <k8stoolbox-pod-name> -- /usr/local/bin/healthcheck default
- Run Resource Cleanup:
kubectl exec -it <k8stoolbox-pod-name> -- /usr/local/bin/clean_stale_resources default
- Ping Between Pods to Test Network Policy:
kubectl exec -it <k8stoolbox-pod-name> -- /usr/local/bin/test_network_policy default <source_pod> <target_pod>
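When these calls are scripted, the pod lookup and the exec can be combined. The following is a minimal sketch. It assumes the pod carries the app=k8stoolbox label used in the lookup above. toolbox_pod and toolbox_exec are hypothetical helper names and are not part of K8sToolbox. The sketch omits -it, because scripted calls usually have no terminal attached.

```bash
# Hypothetical helpers (not shipped with K8sToolbox): find the K8sToolbox pod
# by label and run a command inside it.

# toolbox_pod <namespace>: print the name of the first pod labelled app=k8stoolbox.
toolbox_pod() {
  kubectl get pods -n "$1" -l app=k8stoolbox \
    -o jsonpath='{.items[0].metadata.name}'
}

# toolbox_exec <namespace> <command> [args...]: run a command in that pod.
# -it is omitted so the helper also works without an attached terminal.
toolbox_exec() {
  local ns pod
  ns="$1"
  shift
  pod="$(toolbox_pod "$ns")"
  if [ -z "$pod" ]; then
    echo "no k8stoolbox pod found in namespace $ns" >&2
    return 1
  fi
  kubectl exec -n "$ns" "$pod" -- "$@"
}
```

For example, `toolbox_exec default /usr/local/bin/healthcheck default` runs the health check shown above without first copying the pod name by hand.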
The /scripts directory contains several useful scripts for Kubernetes management:
- aggregate_logs.sh: Aggregates logs from all pods in specified namespaces.
- auto_recover.sh: Automatically recovers failed pods and sends alerts.
- auto_scaling.sh: Automatically scales deployments based on resource usage.
- backup_restore.sh: Backs up and restores Kubernetes resources in a specified namespace.
- clean_stale_resources.sh: Cleans up completed jobs, old replicasets, and orphaned persistent volumes.
- connectivity_test.sh: Tests network connectivity between pods or services.
- delete_stuck_crds.sh: Deletes CRDs that are stuck by removing finalizers.
- delete_stuck_namespace.sh: Deletes namespaces that are stuck due to finalizers.
- healthcheck.sh: Performs health checks on pods and nodes in a namespace.
- network_diag.sh: Provides advanced network diagnostics, including capturing traffic.
- resource_usage.sh: Monitors CPU and memory usage for nodes and pods.
- restart_failed_pods.sh: Restarts all failed pods in a given namespace.
- snapshot_audit.sh: Takes a snapshot of the cluster state for auditing purposes.
- test_network_policy.sh: Tests network connectivity between pods to validate network policies.
For convenience, all scripts are symlinked to /usr/local/bin in the Docker image without their .sh extension. This lets you call them without specifying the full path. For example:
auto_recover default
backup_restore backup default
clean_stale_resources default
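You can reproduce the same layout on a workstation. The sketch below shows one possible way to create such links. It may differ from what the project's Dockerfile actually does, and link_scripts is a hypothetical helper.

```bash
# link_scripts <script_dir> <bin_dir>: for every <name>.sh in <script_dir>,
# create the symlink <bin_dir>/<name> so it can be called without path or extension.
link_scripts() {
  local src_dir bin_dir script name
  src_dir="$(cd "$1" && pwd)" || return 1   # absolute path keeps the links valid
  bin_dir="$2"
  mkdir -p "$bin_dir" || return 1
  for script in "$src_dir"/*.sh; do
    [ -e "$script" ] || continue            # the glob matched nothing
    chmod +x "$script"
    name="$(basename "$script" .sh)"        # healthcheck.sh -> healthcheck
    ln -sf "$script" "$bin_dir/$name"
  done
}
```

For example, `link_scripts ./scripts "$HOME/.local/bin"` links every script into a user-writable directory.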
This section provides detailed information on the utility scripts in the K8sToolbox repository, located under the /scripts directory. Each script is designed to help with common Kubernetes cluster management and troubleshooting tasks. For each script, you will find how to run it, either directly or through the symlinks set up in the Docker image.
You can execute these commands from your local machine or within a Kubernetes pod, depending on your setup.
If you have cloned the repository and have kubectl configured to interact with your cluster, you can run the scripts directly by using:
./scripts/<script_name>.sh [arguments]
Alternatively, where the symlinks are installed (as in the K8sToolbox Docker image), you can run each script without specifying the full path:
<script_name> [arguments]
Ensure that /usr/local/bin is included in your $PATH so that the symlinks are found.
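The sketch below checks this and prepends /usr/local/bin to PATH when it is missing. The function names are hypothetical.

```bash
# path_has_dir <dir>: succeed if <dir> is one of the colon-separated $PATH entries.
path_has_dir() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# ensure_local_bin_on_path: prepend /usr/local/bin to $PATH if it is absent.
ensure_local_bin_on_path() {
  path_has_dir /usr/local/bin || PATH="/usr/local/bin:${PATH}"
  export PATH
}
```

Define and call ensure_local_bin_on_path in the current shell, for example from ~/.bashrc, so that the change to PATH persists.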
Below is a list of all available scripts, with detailed descriptions and examples of how to use them:
aggregate_logs.sh
Aggregates logs from all pods within a specified namespace. This script is useful when you need a combined view of application logs.
Usage: aggregate_logs <namespace>
Example:
aggregate_logs default
This command will aggregate logs from all pods in the default namespace and print them to the console.
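Conceptually, log aggregation is a loop over the pods of each namespace. The sketch below uses the hypothetical name aggregate_logs_sketch and is not the repository's script. It accepts several namespaces, matching the earlier `./scripts/aggregate_logs.sh default kube-system` example, and prints a header before each pod's logs.

```bash
# aggregate_logs_sketch <namespace> [<namespace> ...]
aggregate_logs_sketch() {
  local ns pod
  for ns in "$@"; do
    # Space-separated names of all pods in the namespace.
    for pod in $(kubectl get pods -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
      echo "===== ${ns}/${pod} ====="
      # --all-containers includes every container of multi-container pods;
      # a failure (e.g. a pod still starting) is reported and the loop continues.
      kubectl logs -n "$ns" "$pod" --all-containers=true \
        || echo "could not fetch logs for ${ns}/${pod}" >&2
    done
  done
}
```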
auto_recover.sh
Automatically recovers failed pods and restarts them as needed. This script can be run to automate pod recovery.
Usage: auto_recover <namespace>
Example:
auto_recover kube-system
This command will automatically recover any failed pods in the kube-system namespace by restarting them.
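Pods stuck in CrashLoopBackOff usually still report the Running phase, so a recovery step typically inspects container waiting reasons instead of the pod phase. The hypothetical sketch below deletes such pods so that their controller recreates them. It leaves out the alerting that auto_recover.sh also performs, and it is not the script's actual logic.

```bash
# auto_recover_sketch <namespace>: delete pods with a container waiting in
# CrashLoopBackOff. Pods owned by a Deployment/ReplicaSet/StatefulSet are
# recreated by their controller; bare pods are only removed.
auto_recover_sketch() {
  local ns pod
  ns="$1"
  # jsonpath prints one line per pod: "<name> <waiting reasons...>"
  for pod in $(kubectl get pods -n "$ns" \
      -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
      | awk '/CrashLoopBackOff/ {print $1}'); do
    echo "recovering ${ns}/${pod}"
    kubectl delete pod -n "$ns" "$pod"
  done
}
```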
auto_scaling.sh
Adjusts deployment scaling based on resource usage. This script can help automate horizontal scaling.
Usage: auto_scaling <namespace> <deployment_name> <desired_replicas>
Example:
auto_scaling default my-app 5
This command will scale the deployment named my-app in the default namespace to 5 replicas.
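Scaling a deployment to a fixed replica count corresponds to kubectl scale. A minimal, hypothetical sketch with input validation follows. For scaling driven by actual resource usage, Kubernetes' own HorizontalPodAutoscaler, which you can create with `kubectl autoscale`, is the built-in alternative.

```bash
# auto_scaling_sketch <namespace> <deployment_name> <desired_replicas>
auto_scaling_sketch() {
  local ns deploy replicas
  ns="$1"; deploy="$2"; replicas="$3"
  # Reject empty or non-numeric replica counts before calling the API.
  case "$replicas" in
    ''|*[!0-9]*)
      echo "desired_replicas must be a non-negative integer, got: '$replicas'" >&2
      return 1 ;;
  esac
  kubectl scale deployment "$deploy" -n "$ns" --replicas="$replicas"
}
```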
backup_restore.sh
Backs up and restores Kubernetes resources in a namespace. Useful for disaster recovery scenarios.
Usage:
backup_restore backup <namespace>
backup_restore restore <namespace>
Example:
backup_restore backup default
This command will back up all resources in the default namespace. Use restore instead of backup to restore the resources.
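At its simplest, a namespace backup is a YAML export, and a restore is `kubectl apply` of that export. In the sketch below, backup_namespace and restore_namespace are hypothetical names. It lists ConfigMaps and Secrets explicitly because `kubectl get all` does not include them. Keep three caveats in mind:
- A raw export contains cluster-assigned fields (status, uid, resourceVersion).
- It also contains controller-owned objects such as pods and ReplicaSets. A production restore usually filters these out.
- Secrets are written to the file only base64-encoded.

```bash
# backup_namespace <namespace> [file]: export common resources to a YAML file.
backup_namespace() {
  local ns file
  ns="$1"; file="${2:-backup-$1.yaml}"
  # "all" covers workloads and services; ConfigMaps and Secrets must be named explicitly.
  kubectl get all,configmaps,secrets -n "$ns" -o yaml > "$file"
}

# restore_namespace <namespace> [file]: re-apply a previously exported file.
restore_namespace() {
  local ns file
  ns="$1"; file="${2:-backup-$1.yaml}"
  kubectl apply -n "$ns" -f "$file"
}
```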
clean_stale_resources.sh
Cleans up old Kubernetes resources such as completed jobs and replicasets.
Usage: clean_stale_resources <namespace>
Example:
clean_stale_resources default
This command will clean up stale resources (e.g., completed jobs, old replicasets) in the default namespace.
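Below is a rough sketch of the two cleanups mentioned above. It uses a hypothetical function name and assumes that "old replicasets" means ReplicaSets scaled down to zero by Deployment rollouts. Be aware that kubectl rollout undo relies on those ReplicaSets as revisions. A Deployment's spec.revisionHistoryLimit is the built-in way to cap how many are kept.

```bash
# clean_stale_resources_sketch <namespace>
clean_stale_resources_sketch() {
  local ns rs
  ns="$1"
  # Completed Jobs: Jobs support the status.successful field selector
  # (this matches Jobs with exactly one successful completion).
  kubectl delete jobs -n "$ns" --field-selector=status.successful=1
  # Old ReplicaSets: those a Deployment has scaled down to zero replicas.
  for rs in $(kubectl get replicasets -n "$ns" \
      -o jsonpath='{range .items[?(@.spec.replicas==0)]}{.metadata.name}{" "}{end}'); do
    kubectl delete replicaset -n "$ns" "$rs"
  done
}
```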
connectivity_test.sh
Tests network connectivity between two pods. Useful for validating network policies.
Usage: connectivity_test <namespace> <source_pod> <target_pod>
Example:
connectivity_test default pod-a pod-b
This command will test network connectivity from pod-a to pod-b in the default namespace.
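One common way to test pod-to-pod reachability is to resolve the target pod's IP and ping it from the source pod. The hypothetical sketch below assumes the source pod's image includes ping. The actual script may use a different probe.

```bash
# connectivity_test_sketch <namespace> <source_pod> <target_pod>
connectivity_test_sketch() {
  local ns src dst ip
  ns="$1"; src="$2"; dst="$3"
  # Look up the target pod's cluster IP.
  ip="$(kubectl get pod "$dst" -n "$ns" -o jsonpath='{.status.podIP}')"
  if [ -z "$ip" ]; then
    echo "could not determine the IP of pod $dst" >&2
    return 1
  fi
  # Send three ICMP echo requests from inside the source pod.
  kubectl exec -n "$ns" "$src" -- ping -c 3 "$ip"
}
```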
delete_stuck_crds.sh
Deletes Custom Resource Definitions (CRDs) that are stuck due to finalizers.
Usage: delete_stuck_crds <crd_name>
Example:
delete_stuck_crds my-crd
This command will forcefully delete the CRD named my-crd that is stuck due to finalizers.
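Clearing a stuck CRD usually comes down to emptying its metadata.finalizers list. A minimal sketch with a hypothetical function name follows. Use this with care. Kubernetes places the customresourcecleanup.apiextensions.k8s.io finalizer on CRDs, and that finalizer removes the CRD's custom resources. Bypassing it skips that cleanup.

```bash
# delete_stuck_crd_sketch <crd_name>
delete_stuck_crd_sketch() {
  local crd
  crd="$1"
  # A JSON merge patch replaces the finalizers list with an empty one.
  kubectl patch crd "$crd" --type=merge -p '{"metadata":{"finalizers":[]}}'
  # If the CRD was already being deleted it disappears after the patch;
  # otherwise this issues the delete.
  kubectl delete crd "$crd" --ignore-not-found
}
```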
delete_stuck_namespace.sh
Deletes namespaces that are stuck in the Terminating state.
Usage: delete_stuck_namespace <namespace>
Example:
delete_stuck_namespace my-namespace
This command will delete the my-namespace namespace if it is stuck in the Terminating state.
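Unlike a CRD's finalizers, a namespace's finalizers live in spec.finalizers. They can only be cleared through the namespace's finalize subresource, so a plain patch does not help. The sketch below shows a commonly used approach with the jq that ships in the image; the function name is hypothetical. A namespace usually gets stuck because some resources inside it cannot finish their own cleanup. Forcing finalization can leave those objects orphaned.

```bash
# delete_stuck_namespace_sketch <namespace>
delete_stuck_namespace_sketch() {
  local ns
  ns="$1"
  # Read the namespace, empty spec.finalizers with jq, and PUT the result
  # to the namespace's finalize subresource.
  kubectl get namespace "$ns" -o json \
    | jq '.spec.finalizers = []' \
    | kubectl replace --raw "/api/v1/namespaces/${ns}/finalize" -f -
}
```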
healthcheck.sh
Performs health checks on all pods within a namespace.
Usage: healthcheck <namespace>
Example:
healthcheck kube-system
This command will perform health checks on all pods in the kube-system namespace and report any issues found.
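A basic health check can combine a pod phase filter with each node's Ready condition. The sketch below is hypothetical. A pod in CrashLoopBackOff usually still reports the Running phase, so a phase filter alone will not flag it.

```bash
# healthcheck_sketch <namespace>
healthcheck_sketch() {
  local ns
  ns="$1"
  echo "Pods in ${ns} not in the Running or Succeeded phase:"
  # Field selectors accept != and combine comma-separated terms with AND.
  kubectl get pods -n "$ns" --field-selector='status.phase!=Running,status.phase!=Succeeded'
  echo "Node Ready conditions:"
  # One line per node: "<name>\t<True|False|Unknown>"
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}
```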
network_diag.sh
Provides advanced network diagnostics, including traffic capture between pods.
Usage: network_diag <namespace> <source_pod> <target_pod>
Example:
network_diag default pod-a pod-b
This command will perform network diagnostics between pod-a and pod-b in the default namespace, including capturing traffic if needed.
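You can approximate traffic capture between two pods by running tcpdump inside the source pod, filtered on the target pod's IP. This hypothetical sketch makes two assumptions:
- The source pod's image contains tcpdump, as the K8sToolbox image does.
- The pod has the capabilities that packet capture typically needs, such as NET_RAW.

The packet_count parameter is an illustrative addition.

```bash
# network_diag_sketch <namespace> <source_pod> <target_pod> [packet_count]
network_diag_sketch() {
  local ns src dst count ip
  ns="$1"; src="$2"; dst="$3"; count="${4:-20}"
  ip="$(kubectl get pod "$dst" -n "$ns" -o jsonpath='{.status.podIP}')"
  if [ -z "$ip" ]; then
    echo "could not determine the IP of pod $dst" >&2
    return 1
  fi
  # Capture up to $count packets to or from the target IP on all interfaces
  # of the source pod; -nn disables host and port name resolution.
  kubectl exec -n "$ns" "$src" -- tcpdump -i any -nn -c "$count" host "$ip"
}
```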
resource_usage.sh
Monitors CPU and memory usage for nodes and for the pods within a namespace.
Usage: resource_usage <namespace>
Example:
resource_usage default
This command will monitor and display the CPU and memory usage of the cluster's nodes and of all pods in the default namespace.
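`kubectl top` provides this data, provided the cluster runs metrics-server or another implementation of the Metrics API. A hypothetical sketch:

```bash
# resource_usage_sketch <namespace>
resource_usage_sketch() {
  local ns
  ns="$1"
  # Nodes are cluster-scoped, so node usage is not filtered by namespace.
  echo "Node usage (cluster-wide):"
  kubectl top nodes
  echo "Pod usage in ${ns}, highest CPU first:"
  kubectl top pods -n "$ns" --sort-by=cpu
}
```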
restart_failed_pods.sh
Restarts all failed pods in a given namespace.
Usage: restart_failed_pods <namespace>
Example:
restart_failed_pods kube-system
This command will restart all failed pods in the kube-system namespace.
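You can approximate restarting failed pods by deleting pods in the Failed phase so that their controllers recreate them. A minimal, hypothetical sketch follows. Pods in CrashLoopBackOff typically report Running, so this selector does not match them.

```bash
# restart_failed_pods_sketch <namespace>: delete every pod in phase Failed.
# Controller-owned pods are recreated; bare pods are only removed.
restart_failed_pods_sketch() {
  if [ -z "${1:-}" ]; then
    echo "usage: restart_failed_pods_sketch <namespace>" >&2
    return 1
  fi
  kubectl delete pods -n "$1" --field-selector=status.phase=Failed
}
```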
snapshot_audit.sh
Takes a snapshot of the cluster state for auditing purposes.
Usage: snapshot_audit <namespace>
Example:
snapshot_audit default
This command will take a snapshot of the cluster state for the default namespace, which can be used for auditing purposes.
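A snapshot for auditing can be as simple as exporting selected resource kinds into a timestamped directory. The kinds listed and the function name are assumptions for illustration. They are not necessarily what snapshot_audit.sh collects.

```bash
# snapshot_audit_sketch <namespace>: write one YAML file per resource kind into
# a timestamped directory (UTC) and print the directory name.
snapshot_audit_sketch() {
  local ns dir kind
  ns="$1"
  dir="snapshot-${ns}-$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$dir" || return 1
  for kind in pods deployments services configmaps events; do
    kubectl get "$kind" -n "$ns" -o yaml > "${dir}/${kind}.yaml"
  done
  echo "$dir"
}
```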
test_network_policy.sh
Tests network policies by attempting connections between source and target pods.
Usage: test_network_policy <namespace> <source_pod> <target_pod>
Example:
test_network_policy default pod-a pod-b
This command will test the network policies by attempting to connect from pod-a to pod-b in the default namespace.
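NetworkPolicy rules are written for TCP, UDP, and SCTP ports, and how ICMP is treated depends on the network plugin. A TCP probe to the port a policy governs is therefore a more direct test than ping. The sketch below is hypothetical, and its extra port argument is an assumption that the original script does not take. It also assumes the source pod has an nc that supports -z and -w, such as netcat-openbsd. A failed probe cannot tell a policy drop apart from a port with nothing listening.

```bash
# test_network_policy_sketch <namespace> <source_pod> <target_pod> <port>
test_network_policy_sketch() {
  local ns src dst port ip
  ns="$1"; src="$2"; dst="$3"; port="$4"
  ip="$(kubectl get pod "$dst" -n "$ns" -o jsonpath='{.status.podIP}')"
  if [ -z "$ip" ]; then
    echo "could not determine the IP of pod $dst" >&2
    return 1
  fi
  # -z: only check that the TCP port accepts a connection; -w 3: 3 s timeout.
  if kubectl exec -n "$ns" "$src" -- nc -z -w 3 "$ip" "$port"; then
    echo "ALLOWED: ${src} -> ${dst}:${port}"
  else
    echo "BLOCKED (or nothing listening): ${src} -> ${dst}:${port}"
    return 1
  fi
}
```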
We welcome contributions! Please read the CONTRIBUTING.md file for guidelines on how to contribute to K8sToolbox.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
K8sToolbox was inspired by various Kubernetes utility tools, including the Swiss Army Knife for DevOps. Special thanks to all contributors who helped improve this toolbox.
- Add more advanced diagnostics tools.
- Integration with Prometheus for enhanced monitoring capabilities.
K8sToolbox was inspired by the swiss-army-knife repository, which serves as a useful multi-purpose tool for DevOps. Our goal is to build upon that foundation and create a specialized, Kubernetes-focused toolkit that helps users effectively troubleshoot and manage their clusters.
Let us know if you have feature requests or suggestions to make K8sToolbox even better!