At the moment, Escalator supports only one mode for selecting which nodes to terminate: oldest first. This mode simply prioritises the oldest nodes, as determined by their creation timestamp in the Kubernetes API. It is simple and works well, but additional modes may be needed to support service-based workloads.
This issue proposes the following new node selection methods for termination:
- Selection of nodes based on how easily they can be drained. This would be determined with the drain simulation package provided by the cluster-autoscaler tool.
- Selection of nodes based on how utilised they are. This would prioritise nodes with fewer requested resources, so that nodes that are close to idle or have low usage are terminated first.
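A minimal sketch of the utilisation-based ordering, assuming utilisation is measured as the larger of the CPU and memory request ratios (requested / allocatable). The `nodeUsage` type, its field names and the choice of taking the maximum are illustrative assumptions, not part of Escalator. The drainability criterion is left out because it would depend on the cluster-autoscaler simulator.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeUsage is a hypothetical summary of a node: the sum of its pods'
// resource requests and the node's allocatable capacity.
type nodeUsage struct {
	Name                   string
	RequestedMilliCPU      int64
	AllocatableMilliCPU    int64
	RequestedMemoryBytes   int64
	AllocatableMemoryBytes int64
}

// ratio returns requested/allocatable. Unknown (zero) capacity is treated
// as fully utilised so such a node is never preferred for termination.
func ratio(requested, allocatable int64) float64 {
	if allocatable <= 0 {
		return 1
	}
	return float64(requested) / float64(allocatable)
}

// utilisation returns the larger of the CPU and memory request ratios, so a
// node counts as busy if either resource is heavily requested.
func utilisation(n nodeUsage) float64 {
	cpu := ratio(n.RequestedMilliCPU, n.AllocatableMilliCPU)
	mem := ratio(n.RequestedMemoryBytes, n.AllocatableMemoryBytes)
	if cpu > mem {
		return cpu
	}
	return mem
}

// leastUtilisedFirst returns a copy of nodes ordered from least to most utilised.
func leastUtilisedFirst(nodes []nodeUsage) []nodeUsage {
	sorted := make([]nodeUsage, len(nodes))
	copy(sorted, nodes)
	sort.SliceStable(sorted, func(i, j int) bool {
		return utilisation(sorted[i]) < utilisation(sorted[j])
	})
	return sorted
}

func main() {
	nodes := []nodeUsage{
		{"busy", 3500, 4000, 6 << 30, 16 << 30},
		{"idle", 100, 4000, 512 << 20, 16 << 30},
		{"mem-heavy", 500, 4000, 12 << 30, 16 << 30},
	}
	// Expected order: idle, mem-heavy, busy.
	for _, n := range leastUtilisedFirst(nodes) {
		fmt.Println(n.Name)
	}
}
```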
These node selection methods could be used together, with a weighted sum model determining the "ideal" (highest scoring) nodes to terminate first. The weighted sum model would score each node against a set of criteria: how old the node is, how easily it can be drained, and how utilised it is. The nodes with the highest overall scores would be prioritised for termination.
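Stated as a formula, assuming each criterion score $x_{ij}$ for node $i$ and criterion $j$ (age, drainability, low utilisation) is normalised to $[0, 1]$ and oriented so that a higher value means a better termination candidate, with non-negative weights $w_j$ that sum to 1:

$$
S_i = \sum_{j=1}^{m} w_j \, x_{ij}, \qquad \sum_{j=1}^{m} w_j = 1, \quad w_j \ge 0
$$

Nodes would then be terminated in descending order of $S_i$. For example, with hypothetical weights $w = (0.5, 0.2, 0.3)$, a node scoring $(1.0, 0.5, 0.2)$ gets $S = 0.5 \cdot 1.0 + 0.2 \cdot 0.5 + 0.3 \cdot 0.2 = 0.66$.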
Using the utilisation-based method by itself may lead to some nodes never being terminated because they are always heavily utilised. Pairing it with the current "oldest first" method in a weighted sum model means both utilisation and age are considered when deciding which nodes to terminate.
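The sketch below illustrates how pairing age with utilisation avoids that starvation. The `candidate` and `weights` types, the normalisation of age against the oldest node, and the example weights are all hypothetical choices made for illustration, not a proposed final design.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical per-node input: age in hours and utilisation
// in [0, 1] (for example the max of the CPU and memory request ratios).
type candidate struct {
	Name        string
	AgeHours    float64
	Utilisation float64
}

// weights are hypothetical tuning knobs; they are expected to sum to 1.
type weights struct {
	Age, LowUtilisation float64
}

// score applies a weighted sum: age is normalised against the oldest node,
// and utilisation is inverted so that emptier nodes score higher.
func score(c candidate, maxAge float64, w weights) float64 {
	ageScore := 0.0
	if maxAge > 0 {
		ageScore = c.AgeHours / maxAge
	}
	return w.Age*ageScore + w.LowUtilisation*(1-c.Utilisation)
}

// rank returns candidates ordered by descending score (terminate first).
func rank(cs []candidate, w weights) []candidate {
	maxAge := 0.0
	for _, c := range cs {
		if c.AgeHours > maxAge {
			maxAge = c.AgeHours
		}
	}
	ranked := make([]candidate, len(cs))
	copy(ranked, cs)
	sort.SliceStable(ranked, func(i, j int) bool {
		return score(ranked[i], maxAge, w) > score(ranked[j], maxAge, w)
	})
	return ranked
}

func main() {
	w := weights{Age: 0.6, LowUtilisation: 0.4}
	cs := []candidate{
		{Name: "old-busy", AgeHours: 240, Utilisation: 0.95},
		{Name: "new-idle", AgeHours: 2, Utilisation: 0.05},
		{Name: "mid-half", AgeHours: 72, Utilisation: 0.5},
	}
	// Scores: old-busy 0.62, new-idle 0.385, mid-half 0.38.
	for _, c := range rank(cs, w) {
		fmt.Println(c.Name)
	}
}
```

With these weights the expected order is old-busy, new-idle, mid-half. Setting `Age` to 0 reduces the model to a pure utilisation ranking, under which old-busy would be ranked last every time.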
Cluster autoscaler drain simulator: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/simulator
Weighted sum model: https://en.wikipedia.org/wiki/Weighted_sum_model
/cc @dadux @mwhittington21