
Explore concept of VNA - Vertical Node Autoscaler #354

Open
eytan-avisror opened this issue Apr 1, 2022 · 3 comments
eytan-avisror commented Apr 1, 2022

In some cases it may be appropriate to scale nodes vertically, e.g. from m5.xlarge to m5.2xlarge. For example, when we detect that better bin-packing is possible, or when the IG reaches its max size and there are still pending pods.

One option is to abstract the instance type away completely:

```yaml
apiVersion: instancemgr.keikoproj.io/v1alpha1
kind: InstanceGroup
metadata:
  name: my-instance-group
  namespace: instance-manager
spec:
  provisioner: eks
  strategy:
    type: rollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  eks:
    minSize: 3
    maxSize: 6
    configuration:

      # < instanceType not provided >

      instanceFamily: m5  # optional

      resources:
        requests:
          mem: 8Gi
          cpu: 2
        limits:
          mem: 64Gi
          cpu: 16
      ...
```

Initially spin up m5.large (if instanceFamily is provided; otherwise we can decide the best match), which provides the requested 2 vCPU / 8Gi of memory, and we can scale up as far as m5.4xlarge, which provides the 16 vCPU / 64Gi limit.
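The selection logic described above could be sketched as a first-fit search over a size table, with the requests picking the floor and the limits the ceiling. This is a hypothetical illustration: the table covers only a few m5 sizes, and `smallest_fit` is an invented helper, not part of instance-manager.

```python
import math  # not strictly needed here, but handy for rounding Gi values

# Illustrative subset of the m5 family (type, vCPU, memory in Gi).
M5_SIZES = [
    ("m5.large", 2, 8),
    ("m5.xlarge", 4, 16),
    ("m5.2xlarge", 8, 32),
    ("m5.4xlarge", 16, 64),
]

def smallest_fit(cpu, mem_gi, sizes=M5_SIZES):
    """Return the smallest instance type satisfying the cpu/mem request."""
    for name, vcpu, gi in sizes:
        if vcpu >= cpu and gi >= mem_gi:
            return name
    return None  # nothing in the family is big enough

# requests (2 cpu / 8Gi) pick the floor, limits (16 cpu / 64Gi) the ceiling
floor = smallest_fit(2, 8)      # -> "m5.large"
ceiling = smallest_fit(16, 64)  # -> "m5.4xlarge"
print(floor, ceiling)
```

The controller would then be free to move between `floor` and `ceiling` as utilization changes, without the user ever naming an instance type.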

Another option is to keep this new spec inside a VerticalScalingPolicy resource, so that the IG simply does not provide instanceType and a VSP can be provided as follows:

```yaml
apiVersion: instancemgr.keikoproj.io/v1alpha1
kind: VerticalScalingPolicy
metadata:
  name: default
  namespace: instance-manager
spec:

  instanceFamily: m5  # optional

  resources:
    requests:
      mem: 8Gi
      cpu: 2
    limits:
      mem: 64Gi
      cpu: 16

  scaleTargetRef:
    apiVersion: instancemgr.keikoproj.io/v1alpha1
    kind: InstanceGroup
    name: my-instance-group
```

We should also probably explore supporting something like HPA's behavior spec, based on node capacity:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100  # should be between 0 and 40
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
    selectPolicy: Max
```
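Assuming the same semantics as HPA's scaling policies (a Percent policy permits changing up to value% of the current count per period, a Pods policy a fixed count, and selectPolicy picks the most or least permissive), a hypothetical policy evaluator for nodes could look like this sketch. `allowed_scale_up` is an invented name for illustration only.

```python
import math

def allowed_scale_up(current, policies, select="Max"):
    """Compute how many nodes may be added in one period under
    HPA-style scaling policies (hypothetical adaptation to nodes)."""
    changes = []
    for p in policies:
        if p["type"] == "Percent":
            # value% of the current node count, rounded up
            changes.append(math.ceil(current * p["value"] / 100))
        elif p["type"] == "Pods":  # here: a fixed number of nodes
            changes.append(p["value"])
    return max(changes) if select == "Max" else min(changes)

# With 4 nodes: Percent 100 allows +4 and Pods 4 allows +4, so Max picks 4.
print(allowed_scale_up(4, [
    {"type": "Percent", "value": 100, "periodSeconds": 15},
    {"type": "Pods", "value": 4, "periodSeconds": 15},
]))
```

The stabilizationWindowSeconds field would then gate how often this computed change is actually applied, as it does for HPA.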

@backjo any thoughts on this, would you find this useful?

backjo commented Apr 1, 2022

I could see it being useful, though we just use multiple IGs right now with scale-from-zero enabled and that solves it for us. CA does a decent job of scaling between them. It is a bit tedious though.

eytan-avisror commented Apr 1, 2022

@backjo interesting, so you keep multiple IGs at min 0, and if you need to scale beyond the max of ASG-1, then ASG-2..N would scale up additional nodes for you? How does CA know which ASG to scale?
In that case, would it make more sense to scale vertically with a single IG instead and keep the same range of nodes, e.g. min 3 / max 10?

backjo commented Apr 12, 2022

More like: we have multiple IGs with different compute/memory requirements, and CA is configured with the least-waste expander.
