-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: KEP-4381: DRA: support vendor-independent attributes #4614
Conversation
While it isn't really feasible for GPUs, other kinds of hardware might be described via some standardized attributes. When removing the explicit driver name from requests and filters it becomes possible for users to ask for an instance where some standardized attribute is set without knowing in advance which driver is going to provide it. The downside is that CEL expressions now must get applied to all instances, whether they are from the "expected" driver or some other one. It becomes the responsibility of the CEL expression to filter out unknown instances.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pohly The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -697,43 +699,51 @@ generatedFrom: | |||
apiGroup: dra.example.com | |||
uid: foobar-uid | |||
|
|||
vendorParameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was at the wrong level before. I'm just fixing this example here.
# Each entry here is a request for one resource. Entries could have | ||
# their own vendor parameters (not shown here). | ||
subRequests: | ||
- namedResources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, "requests" inside "driverRequests" made some sense because all "requests" were related to a single driver. Now the only remaining reason is that we need a place to put vendor parameters that are meant for more than one individual resource request.
@klueska: do you think that we can perhaps flatten this after all? Then a claim would have only "requests" where each request has vendor parameters and the selector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm having a hard time paging this back in. What are you proposing exactly? Can you show me a before and after?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before is what this KEP has:
requests:
- vendorParameters: <parameters for all sub-requests>
subRequests:
- vendorParameters: <parameters for device A>
namedResources:
<device A>
- vendorParameters: <parameters for device B>
namedResources:
<device B>
If we were to drop one level, that would become:
requests:
- vendorParameters: <parameters for device A>
namedResources:
<device A>
- vendorParameters: <parameters for device B>
namedResources:
<device B>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With one level removed, "common" parameters would need to be repeated for each device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or set once and then applied to all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can drop one level of indirection by moving the list of sub-requests into the model. When combined with renamed "named resources" model to "device" model (see Slack), the result looks fairly natural to me.
Here I am also using a one-of-many for the parameters because instead of "vendor" setup parameters we might at some point also have "in-tree" setup parameters.
In addition, I am including the driver name for the vendor parameters. That enables validation with an admission webhook and providing setup parameters for different drivers, which can happen now when the CEL expression allows it
requests:
- setup:
vendor:
driverName: cards.dra.example.com
parameters: <parameters for devices A and B>
devices:
- selector: <selector for device A>
setup:
vendor:
driverName: cards.dra.example.com
parameters: <parameters for device A>
- selector: selector for device B>
setup:
vendor:
driverName: cards.dra.example.com
parameters: <parameters for device B>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, we don't even need the top-level "requests".
See the update that I just pushed.
# their own vendor parameters (not shown here). | ||
subRequests: | ||
- namedResources: | ||
driverName: cards.dra.example.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key conceptual change: the driver name is no longer required at the level above "named resources". Instead, instances of "the driver I want to use" are selected via the CEL expression.
This driverName
is just there to simplify the names of attributes in the CEL expression. I'm not sure whether this is important enough to justify the extra complexity.
f55fdbb
to
977e4d6
Compare
- name: type | ||
string: GPU # All named resources with this type have the following attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this implying that type
becomes a special attribute that the scheduler / other abstractions need to know exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It's defined by the vendor and exists inside the (in this example) "cards.dra.example.com" namespace.
attributes.quantity["memory"].isGreaterThan(quantity("32Gi")) | ||
``` | ||
namedResources: | ||
- attributeSuffix: cards.dra.example.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow this. Does this mean that when i define my resource slice that I have to name my attributes with a particular suffix to match here? Or is this the driver name appended to the end of the attributes I define in my resourceslice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attribute names get expanded in two places:
- In ResourceSlice, the driver name gets appended.
- In a selector, either the user already uses the fully-qualified name or the "attributeSuffix" gets appended.
I am open for dropping the "attributeSuffix" field. It helps with keeping CEL expressions shorter, but it also makes the API harder to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also drop the automatism for ResourceSlice and make it a validation requirement that attribute names are fully-qualified. The ResourceSlice object then becomes larger, but more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely don't want to be that verbose when generating a resource slice. It also feels weird (at least to me) to encode the driver name in the attribute string at all (regardless of the level at which its happening). Would an alternative be to have an optional field for driver
at the same level as name/<attribute-type> in cases where you want to match on exactly that driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely don't want to be that verbose when generating a resource slice.
So we keep what I have for ResourceSlice now, with short names that inherit the driver name as suffix.
It also feels weird (at least to me) to encode the driver name in the attribute string at all
The problem is that we want to enable selectors that work across different vendors. The attributes then could come from a k8s.io
scope (namespace?) or from the scopes of two different vendors, if that is what the user wants: "select a card of vendor A with these attributes or a card of vendor B with these attributes".
Would an alternative be to have an optional field for driver at the same level as name/ in cases where you want to match on exactly that driver?
Field where? The CEL expression is a single string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get what you meant. We can rename "attributeSuffix" to "driverName" and change the meaning to "first filter by that name, then check the CEL expression". Should be more efficient, too.
Update pushed.
# At the moment, only setup parameters defined by a vendor | ||
# are supported. In-tree definition of common setup parameters | ||
# might get added in the future. | ||
setup: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/kubernetes-sigs/wg-device-management/blob/5c4dd792fb42f881ec19c986291131f6c97ce249/k8srm-prototype/pkg/api/claim_types.go#L200 used "config(s)". Let's switch to that.
198cafb
to
03c177f
Compare
NamedResources *NamedResourcesRequest | ||
// The list may be empty, in which case the claim can be allocated without | ||
// using any instances. | ||
NamedResources *[]NamedResourcesRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh!
The gogo protobuf code generator fails for this. It generates len(NamedResources)
which does not compile because NamedResources
is a pointer. Not sure yet how to resolve this.
I can wrap []NamedResourcesRequest
in a struct, but the the YAML and JSON end up with one additional indirection.
This is not a show-stopper for "vendor-independent attributes", but it throws a monkey wrench into the attempt to flatten the YAML at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this look?
requests:
- device:
selector: <CEL expression for named resources>
config:
vendor:
driverName: cards.dra.example.com
parameters: <per-request parameters>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or
request:
- device:
selector: <CEL expression for named resources>
config:
vendor:
driverName: cards.dra.example.com
parameters: <per-request parameters>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both is assuming that "named resources" gets renamed to "device". For now, it would have "namedResource".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a convention to use plural form for lists?
I'm wondering about request
(imperative) vs requests
(noun, plural).
@pohly: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This proposal takes the existing KEP as base and includes: - vendor-independent classes and attributes (kubernetes/enhancements#4614) - optional allocation (kubernetes/enhancements#4619) - inline parameters (kubernetes/enhancements#4613) - management access (kubernetes/enhancements#4611) - renaming "named resources" to "devices" wherever it makes sense and is user-facing (Slack discussion) - MatchAttributes (from k8srm-prototype) - OneOf (from k8srm-prototype) `pkg/api` currently builds, but the rest doesn't. None of the YAML examples have been updated yet.
This proposal takes the existing KEP as base and includes: - vendor-independent classes and attributes (kubernetes/enhancements#4614) - optional allocation (kubernetes/enhancements#4619) - inline parameters (kubernetes/enhancements#4613) - management access (kubernetes/enhancements#4611) - renaming "named resources" to "devices" wherever it makes sense and is user-facing (Slack discussion) - MatchAttributes (from k8srm-prototype) - OneOf (from k8srm-prototype) `pkg/api` currently builds, but the rest doesn't. None of the YAML examples have been updated yet.
This proposal takes the existing KEP as base and includes: - vendor-independent classes and attributes (kubernetes/enhancements#4614) - optional allocation (kubernetes/enhancements#4619) - inline parameters (kubernetes/enhancements#4613) - management access (kubernetes/enhancements#4611) - renaming "named resources" to "devices" wherever it makes sense and is user-facing (Slack discussion) - MatchAttributes (from k8srm-prototype) - OneOf (from k8srm-prototype) `pkg/api` currently builds, but the rest doesn't. None of the YAML examples have been updated yet.
This proposal takes the existing KEP as base and includes: - vendor-independent classes and attributes (kubernetes/enhancements#4614) - optional allocation (kubernetes/enhancements#4619) - inline parameters (kubernetes/enhancements#4613) - management access (kubernetes/enhancements#4611) - renaming "named resources" to "devices" wherever it makes sense and is user-facing (Slack discussion) - MatchAttributes (from k8srm-prototype) - OneOf (from k8srm-prototype) `pkg/api` currently builds, but the rest doesn't. None of the YAML examples have been updated yet.
This proposal takes the existing KEP as base and includes: - vendor-independent classes and attributes (kubernetes/enhancements#4614) - optional allocation (kubernetes/enhancements#4619) - inline parameters (kubernetes/enhancements#4613) - management access (kubernetes/enhancements#4611) - renaming "named resources" to "devices" wherever it makes sense and is user-facing (Slack discussion) - MatchAttributes (from k8srm-prototype) - OneOf (from k8srm-prototype) `pkg/api` currently builds, but the rest doesn't. None of the YAML examples have been updated yet.
This proposal takes the existing KEP as base and includes: - vendor-independent classes and attributes (kubernetes/enhancements#4614) - optional allocation (kubernetes/enhancements#4619) - inline parameters (kubernetes/enhancements#4613) - management access (kubernetes/enhancements#4611) - renaming "named resources" to "devices" wherever it makes sense and is user-facing (Slack discussion) - MatchAttributes (from k8srm-prototype) - OneOf (from k8srm-prototype) `pkg/api` currently builds, but the rest doesn't. None of the YAML examples have been updated yet.
/close Will become part of a future KEP updated after prototyping it in https://github.com/kubernetes-sigs/wg-device-management. |
@pohly: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
One-line PR description: vendor-independent attributes
Issue link: DRA: structured parameters #4381
Other comments:
While it isn't really feasible for GPUs, other kinds of hardware might be described via some standardized attributes. When removing the explicit driver name from requests and filters it becomes possible for users to ask for an instance where some standardized attribute is set without knowing in advance which driver is going to provide it.
The downside is that CEL expressions now must get applied to all instances, whether they are from the "expected" driver or some other one. It becomes the responsibility of the CEL expression to filter out unknown instances.