
Support Nvidia GPU Feature Discovery #1219

Open
p53 opened this issue May 1, 2024 · 9 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@p53

p53 commented May 1, 2024

Description

Original Title: Ignore node selector labels for provisioning

What problem are you trying to solve?

We have the NVIDIA GPU operator, which installs the NVIDIA runtime etc. on Karpenter nodes after they are provisioned; the operator runs feature discovery and applies the appropriate nvidia labels, and we need to place pods on these Karpenter nodes depending on those labels. The problem is that when I put nvidia labels in a pod's nodeSelector that are not in the NodePool (because they are only applied to nodes at runtime by the NVIDIA operator), Karpenter fails to provision nodes. A solution might be, e.g., placing an annotation on the pod such as karpenter.sh/ignore-label=somelabel so that Karpenter ignores that label during provisioning.
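A minimal sketch of what that could look like (the karpenter.sh/ignore-label annotation is the hypothetical mechanism proposed here, not an existing Karpenter API; the nvidia.com labels come from GFD output and the image is just an example):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-workload
  annotations:
    # Hypothetical annotation from this proposal: tell Karpenter to skip
    # this label when simulating scheduling for provisioning.
    karpenter.sh/ignore-label: nvidia.com/gpu.family
spec:
  nodeSelector:
    # Applied by GPU Feature Discovery only after the node is running,
    # so the NodePool cannot know it at provisioning time.
    nvidia.com/gpu.family: ampere
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.3.2-base-ubuntu22.04  # example image
    resources:
      limits:
        nvidia.com/gpu: "1"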

How important is this feature to you?

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@p53 p53 added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 1, 2024
@jonathan-innis
Member

> the operator runs feature discovery and applies the appropriate nvidia labels

What kind of feature discovery are you talking about here? Is it stuff related to the properties of the instance type that we are launching?

@Bryce-Soghigian
Member

https://github.com/NVIDIA/gpu-feature-discovery?tab=readme-ov-file#deploy-nvidia-gpu-feature-discovery-gfd

GFD adds labels after the nodes have already been created:

$ kubectl get nodes -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Node
  metadata:
    ...

    labels:
      nvidia.com/cuda.driver.major: "455"
      nvidia.com/cuda.driver.minor: "06"
      nvidia.com/cuda.driver.rev: ""
      nvidia.com/cuda.runtime.major: "11"
      nvidia.com/cuda.runtime.minor: "1"
      nvidia.com/gpu.compute.major: "8"
      nvidia.com/gpu.compute.minor: "0"
      nvidia.com/gfd.timestamp: "1594644571"
      nvidia.com/gpu.count: "1"
      nvidia.com/gpu.family: ampere
      nvidia.com/gpu.machine: NVIDIA DGX-2H
      nvidia.com/gpu.memory: "39538"
      nvidia.com/gpu.product: A100-SXM4-40GB
      ...
...

@Bryce-Soghigian
Member

Bryce-Soghigian commented May 3, 2024

Basically you are requesting that, for a workload requiring a node with those labels, we create a node with those labels, but the NodePool is not aware of these labels and we won't be aware of them. They aren't added until GFD goes and adds them, i.e. after the GPU nodes are provisioned?

How can Karpenter know these traits? This seems relevant to per-instance-type overrides: if you know particular instance types will have particular traits, then we could use a ConfigMap to say these instance types have these values for the overrides.

Do these values differ from node to node? It seems the CUDA runtime depends on the GPU drivers installed on the node, so we can't just cache them directly.
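A rough sketch of the per-instance-type override idea from above (the ConfigMap name and key layout are hypothetical; no such ConfigMap exists in Karpenter today):

apiVersion: v1
kind: ConfigMap
metadata:
  # Hypothetical name; Karpenter has no such ConfigMap today.
  name: karpenter-gpu-label-overrides
  namespace: kube-system
data:
  # Labels Karpenter would assume GFD applies to each instance type,
  # so its scheduling simulation can match pods against them.
  g5.xlarge: |
    nvidia.com/gpu.product: A10G
    nvidia.com/gpu.count: "1"
  p4d.24xlarge: |
    nvidia.com/gpu.product: A100-SXM4-40GB
    nvidia.com/gpu.count: "8"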

@p53
Author

p53 commented May 3, 2024

> Basically you are requesting that, for a workload requiring a node with those labels, we create a node with those labels, but the NodePool is not aware of these labels and we won't be aware of them. They aren't added until GFD goes and adds them, i.e. after the GPU nodes are provisioned?

Yup, that's right.

> How can Karpenter know these traits? This seems relevant to per-instance-type overrides: if you know particular instance types will have particular traits, then we could use a ConfigMap to say these instance types have these values for the overrides.

I don't know precisely how Karpenter works internally; it is probably possible to know these labels, or at least some of them, ahead of time and configure them statically. Best would be if we did not need to define them statically in config at all.

> Do these values differ from node to node? It seems the CUDA runtime depends on the GPU drivers installed on the node, so we can't just cache them directly.

We have, e.g., all AWS g5 instances in one NodePool, so for sure the labels will differ for each instance type, depending on the instance type's GPU; having each instance type in a separate NodePool would be quite impractical.
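For reference, a NodePool covering the whole g5 family might look like this (a sketch against the v1beta1 API that was current at the time; karpenter.k8s.aws/instance-family is the AWS provider's well-known label):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
      # One NodePool for every g5 size; GFD labels such as
      # nvidia.com/gpu.count still differ between sizes (g5.xlarge
      # has 1 GPU, g5.12xlarge has 4), so they cannot be listed
      # statically here.
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: ["g5"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default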

@p53
Author

p53 commented May 3, 2024

DRA (#1231) would probably solve the "knowing before" part, since third-party drivers would publish NodeResourceSlices when running on the cluster. I'm not sure about its flexibility, though: we are still assuming that something is there beforehand, and it is constrained to resources only.
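Very roughly, the idea is that a driver advertises per-node device properties through an object like the one below (an illustrative sketch only: DRA was alpha at the time and the ResourceSlice schema has changed across Kubernetes releases, so the exact fields should not be taken as any released API):

# Illustrative only; field names approximate the alpha DRA API
# and may not match any single released version exactly.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceSlice
nodeName: gpu-node-1
driverName: gpu.nvidia.com
namedResources:
  instances:
  - name: gpu-0
    attributes:
    - name: product
      string: A100-SXM4-40GB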

@p53
Author

p53 commented May 4, 2024

Also, Node Feature Discovery similarly adds labels to nodes, e.g. for CPU capabilities.
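For example, NFD publishes node labels like these (real NFD label keys; the values depend on the node's hardware):

labels:
  feature.node.kubernetes.io/cpu-cpuid.AVX2: "true"
  feature.node.kubernetes.io/cpu-cpuid.AVX512F: "true"
  feature.node.kubernetes.io/kernel-version.major: "6"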

@jonathan-innis
Member

jonathan-innis commented May 14, 2024

> Best would be if we did not need to define them statically in config at all.

I think the ideal state here is defining what the different configurations for the GPU feature discovery operator can be, and then seeing if we can surface first-class support for them in Karpenter directly.

Like you mentioned, having to statically configure all of these values is going to be a huge pain; ideally Karpenter could auto-discover them by matching its logic up with what Nvidia tells us should be on these instance types.


I'm wondering if it makes sense to retitle this issue to be more specific to the use-case. Something like: "Support Nvidia GPU Feature Discovery". @p53 What do you think?

@jonathan-innis
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 14, 2024
@p53 p53 changed the title Ignore node selector labels for provisioning Support Nvidia GPU Feature Discovery May 14, 2024
@p53
Author

p53 commented May 14, 2024

@jonathan-innis renamed
