GPU machine type selection #3796
5 comments · 7 replies
-
This use case is the exact reason why we were very eager for the new pod template feature. On GKE, for instance, nodes have these labels depending on the GPU type. (One question to be answered in an RFC would be whether all cloud providers handle selection of GPU types via node labels, or whether other cloud providers use different resource names.) As a platform engineer, I not only want to give my platform users the ability to select GPU types, I also want to be able to prevent accidental usage of an expensive A100 node, for instance, by a task that requests any GPU (i.e. doesn't specify the GPU type). This is why we set taints on our GPU node pools with the GPU type (and actually also the GPU count). This is not done by default by GKE.

If selection of GPU types becomes a feature in Flyte itself, as a platform engineer I probably wouldn't want to have the mapping from GPU type to taints, tolerations, etc. in Python. Instead, to give our platform users the ability to switch between e.g. T4, V100, and A100 on GKE, I could imagine the following configuration on the platform side (helm values):

```yaml
gpu_types:
  - name: A100  # This can be any string
    nodeselector:
      cloud.google.com/gke-accelerator: nvidia-a100-80gb
    toleration:
      # Toleration for some custom taint which in the case of GKE does not
      # exist by default but which I as a platform engineer would want to
      # create. This is optional.
    # resource_name: On GKE this field wouldn't have to be set since all GPUs
    # share the resource name which is already configured in the helm values
```

In case other cloud providers have different resource names for different GPU types (instead of node labels), platform engineers would configure:

```yaml
gpu_types:
  - name:
    resource_name: ...
```

TL;DR: I would put the responsibility of coming up with a mapping from arbitrary GPU type names to resource names/node selectors/tolerations etc. on the platform engineers. This would allow for great flexibility in how GPU types are controlled across different cloud providers, bare-metal clusters, etc.
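To make the taint side of this concrete, here is a sketch of the pairing between a custom node-pool taint and the toleration/node selector such a platform-side mapping would inject. The taint key name is illustrative (not from this thread); only the GKE accelerator label is a real convention.

```yaml
# Hypothetical custom taint the platform engineer applies to the A100 node
# pool (e.g. via `kubectl taint` or node-pool config; GKE does not set this
# by default). The key "example.com/gpu-type" is illustrative.
taints:
  - key: example.com/gpu-type
    value: nvidia-a100-80gb
    effect: NoSchedule
---
# Matching scheduling constraints the mapping would inject into a pod that
# requests the "A100" gpu type:
tolerations:
  - key: example.com/gpu-type
    operator: Equal
    value: nvidia-a100-80gb
    effect: NoSchedule
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-a100-80gb
```

With this pairing, tasks that request a generic GPU cannot land on the tainted A100 pool, while tasks that explicitly select `A100` get both the toleration and the selector.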
-
WIP comment
node affinity and tolerations For fractional
For mixed partitioning
So even for
-
09-14-2023 contributors' meeting notes: OK to move to RFC
-
Jeev has been working on this, and the Python UX has been updated slightly:

```python
from flytekit import task
# Accelerator constants as proposed in this thread; the final import path
# may differ:
# from flytekit.extras.accelerators import NvidiaTeslaT4, NvidiaTeslaA100

# Specify T4 if your cluster doesn't have a default / has multiple GPU types.
@task(accelerator=NvidiaTeslaT4)
def needs_t4(a: int):
    pass

# Same with an A100.
@task(accelerator=NvidiaTeslaA100)
def needs_a100(a: int):
    pass

# Specify that you want a whole A100
# (if you have some A100s partitioned and some not).
@task(accelerator=NvidiaTeslaA100.with_partition_size(None))
def needs_unpartitioned_a100(a: int):
    pass

# Specify a specific A100 partition size (if you have multiple).
@task(accelerator=NvidiaTeslaA100.with_partition_size(NvidiaTeslaA100.partition_sizes.PARTITION_1G_5GB))
def needs_partitioned_a100(a: int):
    pass
```
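For reference, on GKE a T4 request like `needs_t4` above would ultimately need to resolve to pod spec fields along these lines. The `cloud.google.com/gke-accelerator` label and the `nvidia.com/gpu` extended resource are standard GKE/Kubernetes conventions; exactly which fields Flyte sets is an implementation detail still in flight, so treat this as a sketch:

```yaml
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
    - name: task           # container name is illustrative
      resources:
        limits:
          nvidia.com/gpu: "1"   # GPUs are requested via the extended resource
```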
-
2023-11-09 contributors' meetup notes: implementation of this idea is already in progress.
-
Use Case:
As an ML engineer, I want to specify exactly what kind of GPU type I want (e.g. `A10G`, `A100`, `V100`, `T4`) so that I can target a machine that suits my model training / inference workload. E.g. certain data types like `bfloat16` are only supported on certain GPU types.

Example flytekit api:

Flyte should be able to provision the correct instance in the underlying cloud (e.g. AWS, GCP) to fulfill the request.
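A minimal, self-contained sketch (plain Python, not actual Flyte code) of how a platform-side table could resolve these GPU type names into scheduling constraints, in the spirit of the helm-values mapping proposed earlier in this thread. The label keys and values are assumptions modeled on GKE and AWS conventions:

```python
# Illustrative platform-side mapping from GPU type names to node selectors.
# In a real deployment this table would come from platform configuration
# (e.g. helm values), not be hard-coded.
GPU_TYPES = {
    "T4": {"cloud.google.com/gke-accelerator": "nvidia-tesla-t4"},
    "V100": {"cloud.google.com/gke-accelerator": "nvidia-tesla-v100"},
    "A100": {"cloud.google.com/gke-accelerator": "nvidia-tesla-a100"},
    "A10G": {"node.kubernetes.io/instance-type": "g5.xlarge"},  # e.g. on AWS
}

def node_selector_for(gpu_type: str) -> dict:
    """Return the node selector for a requested GPU type, or raise if unknown.

    Raising on unknown types mirrors the goal above: a task cannot
    accidentally land on an arbitrary (expensive) GPU node.
    """
    try:
        return dict(GPU_TYPES[gpu_type])
    except KeyError:
        raise ValueError(
            f"unknown GPU type {gpu_type!r}; add it to the platform config"
        )
```

Keeping the name-to-constraint mapping on the platform side means users only ever see the short names (`T4`, `A100`, ...), while operators stay free to retarget them at different labels, taints, or resource names per cluster.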