-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable Kubernetes NMstate by default for selected platforms #747
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I don't think we can justify this simply on the basis that installing an optional operator is "inconvenient and cumbersome" In https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md I think we acknowledge that this configuration falls under the domain of the MCO, although we seem to be making a distinction about "only affecting post-kubelet". It also discusses only enabled the option C integration on bare-metal. I admit I'm confused. If we do see this configuration in the domain of the MCO, and if we would like this capability to be enabled by default on all clusters, then I'd like to see the problem framed as an extension of the previous enhancement - e.g. considering the option of kubernetes-nmstate as an MCO operand. |
Agreed we can perhaps be more specific about the user impact here, IIUC one of the reasons is so that secondary network configuration may be provided via NNCP manifests during initial deployment, vs requiring an optional add-on API and post-deploy configuration - @yboaron @celebdor do we have any more examples of the requirements driving this?
I believe option C was implemented, as a compromise because MCO is supposed to own all persistent config, but it lacks the features required to enable NIC configuration (particularly Machine specific config, the ability to apply some NIC config without rebooting the box, some kind of declarative API (e.g not just writing NM keyfiles), ability to rollback a bad config, etc. The issue then, given that this kind of configuration is very common for CNV/baremetal is that the implemented API isn't available by default in those environments.
Agreed, perhaps framing the problem as "deployed via CVO" is missing the bigger picture, which is that we've got a large API gap for on-premise users - we need a standard way to describe NIC config (for both secondary and controlplane networks, although currently the nmstate/NNCP integration only enables the former) |
569542e
to
910e655
Compare
910e655
to
4f5171c
Compare
Just to verify - how many instances will it run with? Any affect on single node openshift? |
4f5171c
to
7f0ceb5
Compare
Does nmstate have any support for e.g. wireguard? That could be useful even in cloud platforms. |
It does not support it now. We could open an RFE on nmstate to see if it is something they would consider adding. |
Just recording some questions that I think will help frame the enhancement for use in future discussions.
This would be important to emphasize if we expand this to more than "selected platfoms"
I am assuming the API would only be installed on selected platforms as a result of this enhancement.
This enhancement is asking for the capability to be versioned with OCP (in the release payload), but not enabled on all platforms. We need to evaluate which second level meta-domain operator makes the most sense to enable this, but I am against adding a net new second level operator.
To enable it by default, we should understand its impact to the control plane:
To enable it by default, we should understand its impact to each worker:
For single node
For externalized control planes
What is the knob to turn on/off the capability? This is related to the SLO chosen to deliver this new meta-operator and what configuration it reads to know to turn it on. For example, if this is for selected platforms, the component would read the |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thank you for raising these questions, a Q&A section was added to the enhancement. |
just a comment - OCS could make use of nmstate for MTU, bonding, a bunch of things that would significantly improve its performance if this was part of base system. @obnoxxx |
|
||
Now, the support status of NMState is moving from tech-preview to fully supported, the main consideration is how to conditionally enable this only on platforms where it's likely to be needed. | ||
|
||
If we can enable Kubernetes NMstate as part of the release payload, we can improve the user experience on platforms where it is required - for example allow for NodeNetworkConfigurationPolicy to be provided via manifests at install-time via openshift-installer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #753 we are addressing a similar need with the performance-addon-operator. In that case, we're exploring the option of running the operator's render
command outside of the installer between the create manifests
and create cluster
phases. Could a similar approach be used with kubernetes-nmstate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think that could work, if the needed containers are in the release payload then even if we require an "advanced" interface at the installer phase it's an improvement on the current process where everything has to happen day-2 via OLM. @yboaron would you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that sounds like a good direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't be adding the PAO containers to the release payload. Users who want to use workload partitioning will download the PAO separately (perhaps running it via podman).
|
||
### Goals | ||
|
||
- Provide NMState APIs by default on desired platforms, [NetworkManager configuration should be updated](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c-design) to support this capability on these platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading through this, it's not entirely clear what the "desired platforms" are. Is it just baremetal IPI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial target is just the baremetal platform, to support both CNV and general baremetal cases, but ref discussion above with @bengland2 there are other potential use-cases e.g OCS which wouldn't necessarily be specific to baremetal?
So I'd like to see this enabled by default for baremetal, with a way to opt-in for other platforms.
The other option that you mention above is to add the components to the release payload, but don't enable by default for any platform, then require some "advanced" customization step e.g manifest to enable (on any platform).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For OpenShift Virtualization it would be important too to have a way to opt-in on other platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd certainly want this on baremetal, None, vSphere, and potentially ovirt. Really any where we're likely to encounter static network configuration which may need to be managed day2.
|
||
### Goals | ||
|
||
- Provide NMState APIs by default on desired platforms, [NetworkManager configuration should be updated](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c-design) to support this capability on these platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We state the 'desired platforms' all over this document but don't list what the desired platforms are.
Who has this list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, lets get a list into the enhancement. Here's my suggestion #747 (comment)
|
||
**Q:** why NMSTATE capability should run everywhere (on-prem, cloud, edge) in all delivery models (self-managed, hosted) ? | ||
|
||
**A:** If it was supported/possible, it would have been better to deploy nmstate capability only in required platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we elaborate more on 'why its not possible'? I find this to be an answer without context.
|
||
If we can enable Kubernetes NMstate as part of the release payload, we can improve the user experience on platforms where it is required - for example allow for NodeNetworkConfigurationPolicy to be provided via manifests at install-time via openshift-installer. | ||
|
||
Additionally, having the Kubernetes NMstate as part of the release payload will allow uniformity between the various platforms (CNV, IPI-Baremetal). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these two operators run the risk of conflicting with one another because of this approach?
Eric, can we prioritize baremetal for NMState? That's the biggest OCS
need that I'm aware of, and involves some large customer accounts.
cc'ing Eran Tamir, OCS PM.
…On Thu, Jun 3, 2021 at 10:49 AM Eric Rich ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In enhancements/network/kubernetes-nmstate-operator-to-cvo.md
<#747 (comment)>
:
> +
+**A:** The desire is to enable a common enabled-by-default API for network configuration (initially only for selected platforms where that is commonly needed).
+
+If that approach is not acceptable then delivering via the release payload with some way to opt-in would be reasonable.
+
+
+**Q:** What is the knob to turn on/off the capability? This is related to the SLO chosen to deliver this new meta-operator and what configuration it reads to know to turn it on. For example, if this is for selected platforms, the component would read the Infrastructure CR to determine if it should enable it all, but is there another knob that would allow opt-in/out?
+
+**A:** The current proposal is to couple this to the platform detected in the Infrastructure CR, so it can be enabled by default on selected platforms (particularly baremetal).
+
+There could also be some other interface at the SLO level to override that behavior and/or to enable opt-in for platforms where it’s not enabled by default.
+
+
+### Goals
+
+- Provide NMState APIs by default on desired platforms, [NetworkManager configuration should be updated](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c-design) to support this capability on these platforms.
We state the 'desired platforms' all over this document but don't list
what the desired platforms are.
Who has this list?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#747 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANDG4YSOBJHMLYIUGZS3FTTQ6I55ANCNFSM43K762GA>
.
|
|
||
Now, the support status of NMState is moving from tech-preview to fully supported, the main consideration is how to conditionally enable this only on platforms where it's likely to be needed. | ||
|
||
If we can enable Kubernetes NMstate as part of the release payload, we can improve the user experience on platforms where it is required - for example allow for NodeNetworkConfigurationPolicy to be provided via manifests at install-time via openshift-installer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't be adding the PAO containers to the release payload. Users who want to use workload partitioning will download the PAO separately (perhaps running it via podman).
|
||
Additionally, having the Kubernetes NMstate as part of the release payload will allow uniformity between the various platforms (CNV, IPI-Baremetal). | ||
|
||
Currently, CNV uses [Hyperconverged Cluster Operator - HCO](https://github.com/kubevirt/hyperconverged-cluster-operator) to deploy [custom operator - CNAO](https://github.com/kubevirt/cluster-network-addons-operator#nmstate), the CNAO installs Kubernetes NMstate, among other network related items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this approach also use OLM to install the operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. HCO and CNAO are both deployed through OLM. kubernetes-nmstate however is deployed directly by CNAO, without OLM. The reason is that we want a granular control over the lifecycle without a dependency on OLM as the U/S project targets vanilla Kubernetes environments too. It is worth mentioning that CNAO in this case deploys kubernetes-nmstate directly. The nmstate operator was introduced only recently to allow migration of kubernetes-nmstate under OpenShift - hence the dedicated operator decoupled from the rest of CNAO's networking.
|
||
**A:** For the tech-preview phase we have a lightweight controller that only runs a deployment (replica=1) and NMSTATE CRD definition , | ||
creating NMSTATE CR instance triggers the full API controller deployment. | ||
With this approach, we shouldn't have the full NMSTATE API everywhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would, though, because the CRD is installed and can be triggered. Is that appropriate for all platforms and deployment architectures?
### Goals | ||
|
||
- Provide NMState APIs by default on desired platforms, [NetworkManager configuration should be updated](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c-design) to support this capability on these platforms. | ||
- Allow using NMState APIs at install-time, where NIC config is a common requirement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a dumb question, but will install time configuration be persistent via the MCO? Compared to the ephemeral option-C above that is. Otherwise wouldn't it be lost after the first reboot that happens on all RHCOS nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Install-time configuration would be persisted via the kubernetes-nmstate CR. For day 1 we would pull the config out of that and apply it directly to the image, then for day 2 the operator would take over and allow more dynamic changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I misunderstood the exact boundaries of responsibilities. The MCO is just writing a config setup for the NMstate operator to use? Or is the MCO not involved at all in this nice flow
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Network interface configuration is a frequent requirement for some platforms, particularly baremetal.
In recent releases work was completed (see [1]) to enable Kubernetes NMstate (optionally via OLM) so that declarative configuration of secondary network interfaces is possible.
To improve the user-experience it is desirable to enable the NMState API by default on selected platforms where such configuration is common.
Having Kubernetes NMstate deployed by CVO will allow for example secondary NIC configuration via NodeNetworkConfigurationPolicy manifests at install time
[1] https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md