
CORS-3483: Update CAPI and CAPZ versions to set Machine DisableExtensionOperations #8627

Merged · 8 commits · Jun 22, 2024

Conversation

@sadasu (Contributor) commented Jun 18, 2024

Updating CAPI to version v1.7.0 and CAPZ to v1.15.1-0.20240617212811-a52056dfb88c.

The CAPZ version bump brings in kubernetes-sigs/cluster-api-provider-azure#4792 and sets DisableExtensionOperations to true.

Co-authored-by: Aditya Narayanaswamy [email protected]
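
For illustration only, a minimal Go sketch of what the installer-side change amounts to, assuming the field brought in by kubernetes-sigs/cluster-api-provider-azure#4792 is a spec-level *bool named DisableExtensionOperations on the CAPZ AzureMachine; the helper name below is hypothetical, not the installer's actual code:

    // Hypothetical sketch: mark a generated CAPZ AzureMachine so that CAPZ
    // skips VM extension operations for it. Field name assumed from the CAPZ PR.
    package azure

    import (
        "k8s.io/utils/ptr"
        capz "sigs.k8s.io/cluster-api-provider-azure/api/v1beta1"
    )

    func disableExtensionOperations(machine *capz.AzureMachine) {
        machine.Spec.DisableExtensionOperations = ptr.To(true)
    }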

@sadasu (Contributor, Author) commented Jun 18, 2024

/cc @jhixson74 , @patrickdillon, @rna-afk

I have rebased @rna-afk's PR #8488 and added #8594 here.

@r4f4 (Contributor) left a comment:

This is somehow changing CAPA and breaking the build:

 pkg/asset/machines/aws/awsmachines.go:119:21: awsMachine.Spec.ElasticIPPool undefined (type "sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2".AWSMachineSpec has no field or method ElasticIPPool)
pkg/asset/machines/aws/awsmachines.go:119:43: undefined: capa.ElasticIPPool
pkg/asset/machines/aws/awsmachines.go:121:47: undefined: capa.PublicIpv4PoolFallbackOrderAmazonPool
# github.com/openshift/installer/pkg/asset/machines/aws
# [github.com/openshift/installer/pkg/asset/machines/aws]
vet: pkg/asset/machines/aws/awsmachines.go:119:21: awsMachine.Spec.ElasticIPPool undefined (type v1beta2.AWSMachineSpec has no field or method ElasticIPPool) 

@r4f4 (Contributor) commented Jun 18, 2024

Also capz is failing to build:

 go build -gcflags "" -ldflags "-s -w" -o ../../bin/linux_amd64/cluster-api-provider-azure "$path";
# k8s.io/apiserver/pkg/server/routes
vendor/k8s.io/apiserver/pkg/server/routes/openapi.go:69:89: cannot use oa.V3Config (variable of type *common.OpenAPIV3Config) as *common.Config value in argument to builder3.BuildOpenAPISpecFromRoutes 

@sadasu changed the title from "Updating CAPI and CAPZ versions" to "WIP: Updating CAPI and CAPZ versions" on Jun 18, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 18, 2024
 sigs.k8s.io/yaml v1.4.0 // indirect
 )

-replace sigs.k8s.io/cluster-api => sigs.k8s.io/cluster-api v1.6.3
+replace sigs.k8s.io/cluster-api => sigs.k8s.io/cluster-api v1.7.3
A Contributor commented:

To be safe, this should never be higher than the version of capi we're building (1.7.0 as of today).

A Contributor commented:

Interesting. And I see @sadasu is fixing by bumping the CAPI controller in 4b4dd04 👍

@sadasu (Contributor, Author) commented Jun 18, 2024:

I am updating the CAPI version in the top-level go.mod and in cluster-api/cluster-api/go.mod, but not for the other CAPI providers.

The version of CAPZ we are pulling in uses CAPI 1.7.3. I am not sure of the consequences of bumping only the CAPZ version and not the CAPI version.
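
Roughly, the pinning scheme described above looks like this in go.mod terms (an illustrative fragment; the comment about other provider modules is an assumption about the layout, not the repository's actual paths):

    // Top-level go.mod and cluster-api/cluster-api/go.mod: pin CAPI to the
    // version of the controller we build.
    replace sigs.k8s.io/cluster-api => sigs.k8s.io/cluster-api v1.7.3

    // The other provider modules keep their existing CAPI pins and are only
    // bumped if CI shows they need the newer version.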

A Contributor commented:

If CAPZ uses a new feature in 1.7.3, then running it against an older capi could cause runtime failures. I think what you're doing now is the correct approach. Let's just make sure to trigger jobs for all other providers and make sure they work with capi 1.7.3.

@r4f4 (Contributor) commented Jun 18, 2024

/test ?

openshift-ci bot commented Jun 18, 2024

@r4f4: The following commands are available to trigger required jobs:

  • /test agent-integration-tests
  • /test altinfra-images
  • /test altinfra-periodics-images
  • /test aro-unit
  • /test e2e-agent-compact-ipv4
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-edge-zones-manifest-validation
  • /test e2e-aws-ovn-upi
  • /test e2e-azure-ovn
  • /test e2e-azure-ovn-upi
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-upi
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack-ovn
  • /test e2e-vsphere-ovn
  • /test e2e-vsphere-ovn-upi
  • /test gofmt
  • /test golint
  • /test govet
  • /test images
  • /test okd-images
  • /test okd-unit
  • /test okd-verify-codegen
  • /test openstack-manifests
  • /test shellcheck
  • /test terraform-images
  • /test terraform-verify-vendor
  • /test tf-lint
  • /test unit
  • /test verify-codegen
  • /test verify-vendor
  • /test yaml-lint

The following commands are available to trigger optional jobs:

  • /test altinfra-e2e-aws-custom-security-groups
  • /test altinfra-e2e-aws-ovn
  • /test altinfra-e2e-aws-ovn-fips
  • /test altinfra-e2e-aws-ovn-imdsv2
  • /test altinfra-e2e-aws-ovn-localzones
  • /test altinfra-e2e-aws-ovn-proxy
  • /test altinfra-e2e-aws-ovn-public-ipv4-pool
  • /test altinfra-e2e-aws-ovn-shared-vpc
  • /test altinfra-e2e-aws-ovn-shared-vpc-local-zones
  • /test altinfra-e2e-aws-ovn-shared-vpc-wavelength-zones
  • /test altinfra-e2e-aws-ovn-single-node
  • /test altinfra-e2e-aws-ovn-wavelengthzones
  • /test altinfra-e2e-azure-capi-ovn
  • /test altinfra-e2e-gcp-capi-ovn
  • /test altinfra-e2e-gcp-ovn-byo-network-capi
  • /test altinfra-e2e-gcp-ovn-secureboot-capi
  • /test altinfra-e2e-gcp-ovn-xpn-capi
  • /test altinfra-e2e-ibmcloud-capi-ovn
  • /test altinfra-e2e-nutanix-capi-ovn
  • /test altinfra-e2e-openstack-capi-ccpmso
  • /test altinfra-e2e-openstack-capi-ccpmso-zone
  • /test altinfra-e2e-openstack-capi-dualstack
  • /test altinfra-e2e-openstack-capi-dualstack-upi
  • /test altinfra-e2e-openstack-capi-dualstack-v6primary
  • /test altinfra-e2e-openstack-capi-externallb
  • /test altinfra-e2e-openstack-capi-nfv-intel
  • /test altinfra-e2e-openstack-capi-ovn
  • /test altinfra-e2e-openstack-capi-proxy
  • /test altinfra-e2e-powervs-capi-ovn
  • /test altinfra-e2e-vsphere-capi-multi-vcenter-ovn
  • /test altinfra-e2e-vsphere-capi-ovn
  • /test altinfra-e2e-vsphere-capi-static-ovn
  • /test altinfra-e2e-vsphere-capi-zones
  • /test azure-ovn-marketplace-images
  • /test e2e-agent-compact-ipv4-appliance-diskimage
  • /test e2e-agent-compact-ipv4-none-platform
  • /test e2e-agent-ha-dualstack
  • /test e2e-agent-sno-ipv4-pxe
  • /test e2e-agent-sno-ipv6
  • /test e2e-aws-overlay-mtu-ovn-1200
  • /test e2e-aws-ovn-edge-zones
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-imdsv2
  • /test e2e-aws-ovn-proxy
  • /test e2e-aws-ovn-public-subnets
  • /test e2e-aws-ovn-shared-vpc-custom-security-groups
  • /test e2e-aws-ovn-shared-vpc-edge-zones
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-upi-proxy
  • /test e2e-azure-ovn-resourcegroup
  • /test e2e-azure-ovn-shared-vpc
  • /test e2e-azurestack
  • /test e2e-azurestack-upi
  • /test e2e-crc
  • /test e2e-external-aws
  • /test e2e-external-aws-ccm
  • /test e2e-gcp-ovn-byo-vpc
  • /test e2e-gcp-ovn-xpn
  • /test e2e-gcp-secureboot
  • /test e2e-gcp-upgrade
  • /test e2e-gcp-upi-xpn
  • /test e2e-ibmcloud-ovn
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi-ovn
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-swapped-hosts
  • /test e2e-metal-ipi-ovn-virtualmedia
  • /test e2e-metal-single-node-live-iso
  • /test e2e-nutanix-ovn
  • /test e2e-openstack-ccpmso
  • /test e2e-openstack-ccpmso-zone
  • /test e2e-openstack-dualstack
  • /test e2e-openstack-dualstack-upi
  • /test e2e-openstack-externallb
  • /test e2e-openstack-nfv-intel
  • /test e2e-openstack-proxy
  • /test e2e-vsphere-ovn-upi-zones
  • /test e2e-vsphere-ovn-zones
  • /test e2e-vsphere-ovn-zones-techpreview
  • /test e2e-vsphere-static-ovn
  • /test okd-e2e-agent-compact-ipv4
  • /test okd-e2e-agent-ha-dualstack
  • /test okd-e2e-agent-sno-ipv6
  • /test okd-e2e-aws-ovn
  • /test okd-e2e-aws-ovn-upgrade
  • /test okd-e2e-gcp
  • /test okd-e2e-gcp-ovn-upgrade
  • /test okd-e2e-vsphere
  • /test okd-scos-images
  • /test tf-fmt

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-installer-master-altinfra-images
  • pull-ci-openshift-installer-master-altinfra-periodics-images
  • pull-ci-openshift-installer-master-aro-unit
  • pull-ci-openshift-installer-master-azure-ovn-marketplace-images
  • pull-ci-openshift-installer-master-e2e-aws-ovn
  • pull-ci-openshift-installer-master-e2e-azure-ovn
  • pull-ci-openshift-installer-master-e2e-azure-ovn-shared-vpc
  • pull-ci-openshift-installer-master-e2e-azurestack
  • pull-ci-openshift-installer-master-e2e-gcp-ovn
  • pull-ci-openshift-installer-master-e2e-gcp-ovn-byo-vpc
  • pull-ci-openshift-installer-master-e2e-gcp-ovn-xpn
  • pull-ci-openshift-installer-master-e2e-gcp-secureboot
  • pull-ci-openshift-installer-master-gofmt
  • pull-ci-openshift-installer-master-golint
  • pull-ci-openshift-installer-master-govet
  • pull-ci-openshift-installer-master-images
  • pull-ci-openshift-installer-master-okd-unit
  • pull-ci-openshift-installer-master-okd-verify-codegen
  • pull-ci-openshift-installer-master-shellcheck
  • pull-ci-openshift-installer-master-tf-fmt
  • pull-ci-openshift-installer-master-tf-lint
  • pull-ci-openshift-installer-master-unit
  • pull-ci-openshift-installer-master-verify-codegen
  • pull-ci-openshift-installer-master-verify-vendor
  • pull-ci-openshift-installer-master-yaml-lint

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@r4f4 (Contributor) commented Jun 18, 2024

/test altinfra-e2e-aws-ovn altinfra-e2e-azure-capi-ovn altinfra-e2e-gcp-capi-ovn altinfra-e2e-ibmcloud-capi-ovn altinfra-e2e-nutanix-capi-ovn altinfra-e2e-openstack-capi-ovn altinfra-e2e-powervs-capi-ovn altinfra-e2e-vsphere-capi-ovn

@r4f4 (Contributor) commented Jun 18, 2024

ibmcloud-capi-ovn is expected to fail; I forgot about it.

@sadasu changed the title from "WIP: Updating CAPI and CAPZ versions" to "CORS-3483: Updating CAPI and CAPZ versions" on Jun 18, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 18, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 18, 2024
@openshift-ci-robot (Contributor) commented Jun 18, 2024

@sadasu: This pull request references CORS-3483 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Updating CAPI to version v1.7.0 and CAPZ to v1.15.1-0.20240617212811-a52056dfb88c

The CAPZ version bump brings in kubernetes-sigs/cluster-api-provider-azure#4792

Co-authored-by: Aditya Narayanaswamy [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sadasu (Contributor, Author) commented Jun 18, 2024

/jira refresh

@openshift-ci-robot (Contributor) commented Jun 18, 2024

@sadasu: This pull request references CORS-3483 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jhixson74 (Member):

/cc

@openshift-ci openshift-ci bot requested a review from jhixson74 June 18, 2024 18:51
@r4f4 (Contributor) commented Jun 18, 2024

e2e-aws-ovn, e2e-nutanix-capi-ovn, e2e-openstack-capi-ovn, e2e-vsphere-capi-ovn: cluster installed, so no problems with the capi bump.
Edit: e2e-gcp-capi-ovn: installed as well.
Edit2: e2e-powervs-capi-ovn: installed as well.

@r4f4 (Contributor) commented Jun 18, 2024

e2e-azure-capi-ovn: I see the following repeating error line in the logs:

time="2024-06-18T18:52:42Z" level=debug msg="E0618 18:52:42.254498    8173 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""

It'd be good to double check if it's a concern.

@r4f4 (Contributor) commented Jun 18, 2024

It's an LGTM if the changes are not negatively impacting Azure.

@patrickdillon (Contributor) commented Jun 18, 2024

Looking at the altinfra-e2e-azure-capi-ovn job, the infrastructure is not becoming ready.

I would have expected this to fail while waiting for machines, although looking at the job history, some previous jobs were actually able to complete installs. So it seems like something in this PR is off for it to fail at this stage.

@sadasu (Contributor, Author) commented Jun 18, 2024

Looking at the altinfra-e2e-azure-capi-ovn job, the infrastructure is not becoming ready.

I would have expected this to fail while waiting for machines, although looking at the job history, some previous jobs were actually able to complete installs. So it seems like something in this PR is off for it to fail at this stage.

Looking at the failure seen in this PR in isolation (not comparing it with previous failures), I see this:

level=info msg=Waiting up to 15m0s (until 6:33PM UTC) for network infrastructure to become ready...
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: infrastructure was not ready within 15m0s: client rate limiter Wait returned an error: context deadline exceeded
level=info msg=Shutting down local Cluster API control plane...

So this error is generated here: kubernetes/client-go@147848c.
Probable cause: during network infrastructure provisioning, API calls to Azure are being rate limited, and that causes the 15-minute timer to run out.
@jhixson74 (Member):

e2e-azure-capi-ovn: I see the following repeating error line in the logs:

time="2024-06-18T18:52:42Z" level=debug msg="E0618 18:52:42.254498    8173 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""

It'd be good to double check if it's a concern.

Yup. This is definitely an issue. I spent some time debugging it but am not familiar enough with cluster-api to get to the bottom of it at this time. It looks like things aren't caching correctly or in time. I decided to try and pinpoint when the changes took place that broke this and determined that to be between v1.6.6 and v1.7.0. Unless there is a need to go to v1.7.0 right now, it would be nice to get this in at v1.6.6 and debug this later when we have more time ;-)

@r4f4 (Contributor) commented Jun 19, 2024

e2e-azure-capi-ovn: I see the following repeating error line in the logs:

time="2024-06-18T18:52:42Z" level=debug msg="E0618 18:52:42.254498    8173 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""

It'd be good to double check if it's a concern.

Yup. This is definitely an issue. I spent some time debugging it but am not familiar enough with cluster-api to get to the bottom of it at this time. It looks like things aren't caching correctly or in time. I decided to try and pinpoint when the changes took place that broke this and determined that to be between v1.6.6 and v1.7.0. Unless there is a need to go to v1.7.0 right now, it would be nice to get this in at v1.6.6 and debug this later when we have more time ;-)

Looking at this Azure run from this PR, I don't see that error line even though it's using CAPI 1.7.0. If it's a CAPI problem, I would expect the issue to be between 1.7.0 and 1.7.3. It must be some CAPI + CAPZ combination.

From what I can tell, the error happens when running the azure controllers, specifically the AzureMachinePool:

time="2024-06-13T15:11:39Z" level=debug msg="I0613 15:11:39.333862    9018 controller.go:186] \"Starting Controller\" controller=\"azuremachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePool\""
time="2024-06-13T15:11:39Z" level=debug msg="I0613 15:11:39.335979    9018 reflector.go:351] Caches populated for *v1.Secret from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105"
time="2024-06-13T15:11:39Z" level=debug msg="I0613 15:11:39.338057    9018 reflector.go:351] Caches populated for *v1beta1.Cluster from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105"
time="2024-06-13T15:11:39Z" level=debug msg="I0613 15:11:39.338558    9018 reflector.go:351] Caches populated for *v1beta1.MachinePool from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105"
time="2024-06-13T15:11:39Z" level=debug msg="E0613 15:11:39.338605    9018 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""

That controller only runs if the MachinePool feature gate is enabled, and that gate's default was changed to true in CAPI v1.7.0.

The worrisome part about this log is:

time="2024-06-13T15:13:39Z" level=debug msg="E0613 15:13:39.336109    9018 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""
time="2024-06-13T15:13:39Z" level=debug msg="E0613 15:13:39.437976    9018 controller.go:203] \"Could not wait for Cache to sync\" err=\"failed to wait for azuremachinepool caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.KubeadmConfig\" controller=\"azuremachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438049    9018 internal.go:516] \"Stopping and waiting for non leader election runnables\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438067    9018 internal.go:520] \"Stopping and waiting for leader election runnables\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438096    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438105    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azurecluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438112    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438121    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"ASOSecret\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438133    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438159    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremanagedmachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438170    9018 controller.go:242] \"All workers finished\" controller=\"azuremachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438174    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachinetemplate\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachineTemplate\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438184    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremanagedcontrolplane\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedControlPlane\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438195    9018 controller.go:242] \"All workers finished\" controller=\"azuremanagedcontrolplane\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedControlPlane\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438197    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachinepoolmachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePoolMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438208    9018 controller.go:242] \"All workers finished\" controller=\"azuremachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438215    9018 controller.go:242] \"All workers finished\" controller=\"azuremachinepoolmachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePoolMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438216    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremanagedcluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438224    9018 controller.go:242] \"All workers finished\" controller=\"azuremachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438226    9018 controller.go:242] \"All workers finished\" controller=\"azurecluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438230    9018 controller.go:242] \"All workers finished\" controller=\"azuremanagedcluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438232    9018 controller.go:242] \"All workers finished\" controller=\"azuremachinetemplate\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachineTemplate\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438245    9018 controller.go:242] \"All workers finished\" controller=\"azuremanagedmachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438248    9018 controller.go:242] \"All workers finished\" controller=\"ASOSecret\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438285    9018 internal.go:528] \"Stopping and waiting for caches\""
time="2024-06-13T15:13:39Z" level=debug msg="W0613 15:13:39.438375    9018 reflector.go:462] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: watch of *v1api20220701.NatGateway ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding"
time="2024-06-13T15:13:39Z" level=debug msg="W0613 15:13:39.438501    9018 reflector.go:462] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: watch of *v1beta1.Machine ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding"
time="2024-06-13T15:13:39Z" level=debug msg="W0613 15:13:39.438527    9018 reflector.go:462] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: watch of *v1api20201101.VirtualNetworksSubnet ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding"
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438636    9018 internal.go:532] \"Stopping and waiting for webhooks\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438682    9018 server.go:249] \"Shutting down webhook server with timeout of 1 minute\" logger=\"controller-runtime.webhook\""
time="2024-06-13T15:13:40Z" level=debug msg="I0613 15:13:40.493073    9018 internal.go:535] \"Stopping and waiting for HTTP servers\""
time="2024-06-13T15:13:40Z" level=debug msg="I0613 15:13:40.493121    9018 server.go:43] \"shutting down server\" kind=\"health probe\" addr=\"127.0.0.1:35841\""
time="2024-06-13T15:13:40Z" level=debug msg="I0613 15:13:40.493174    9018 internal.go:539] \"Wait completed, proceeding to shutdown the manager\""
time="2024-06-13T15:13:40Z" level=debug msg="E0613 15:13:40.493208    9018 main.go:353] \"problem running manager\" err=\"failed to wait for azuremachinepool caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.KubeadmConfig\" logger=\"setup\""
time="2024-06-13T15:26:49Z" level=debug msg="Collecting applied cluster api manifests..."

So the Azure controllers are shutting down (via a context cancel?) and nothing progresses for 13 minutes, until the installer shuts envtest down at the 15-minute network-infrastructure timeout.
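
To make the "no matches for kind" symptom concrete, here is a hedged standalone sketch (not part of the installer) that asks the control plane whether bootstrap.cluster.x-k8s.io/v1beta1 is served at all; if it is not, any controller watching KubeadmConfig sits in exactly this "waiting for cache to sync" state until controller-runtime's sync timeout fires, which matches the two-minute gap in the log above:

    package main

    import (
        "fmt"
        "log"

        "k8s.io/client-go/discovery"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Hypothetical pre-flight check; the kubeconfig path is a placeholder.
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            log.Fatal(err)
        }
        dc, err := discovery.NewDiscoveryClientForConfig(cfg)
        if err != nil {
            log.Fatal(err)
        }
        // If the KubeadmConfig CRD was never applied, this lookup fails and a
        // controller watching that kind can never sync its informer cache.
        res, err := dc.ServerResourcesForGroupVersion("bootstrap.cluster.x-k8s.io/v1beta1")
        if err != nil {
            log.Fatalf("bootstrap.cluster.x-k8s.io/v1beta1 is not served: %v", err)
        }
        for _, r := range res.APIResources {
            fmt.Println(r.Kind)
        }
    }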

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 21, 2024
@patrickdillon (Contributor):

/test altinfra-e2e-azure-capi-ovn

@patrickdillon (Contributor):

I just tested this locally and it works!

DEBUG Time elapsed per stage:                      
DEBUG        Infrastructure Pre-provisioning: 1s   
DEBUG    Network-infrastructure Provisioning: 4m11s 
DEBUG Post-network, pre-machine Provisioning: 19m14s 
DEBUG        Bootstrap Ignition Provisioning: 1s   
DEBUG                   Machine Provisioning: 7m16s 
DEBUG       Infrastructure Post-provisioning: 23s  
DEBUG                     Bootstrap Complete: 15m16s 
DEBUG                                    API: 20s  
DEBUG                      Bootstrap Destroy: 56s  
DEBUG            Cluster Operators Available: 12m11s 
DEBUG               Cluster Operators Stable: 47s  
INFO Time elapsed: 1h0m30s        

Let's try to get this in as soon as possible, because it does not look fun to rebase this...

/approve

@patrickdillon (Contributor):

/retest-required

openshift-ci bot commented Jun 21, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 21, 2024
@patrickdillon (Contributor):

/test altinfra-e2e-gcp-capi-ovn altinfra-e2e-ibmcloud-capi-ovn altinfra-e2e-nutanix-capi-ovn altinfra-e2e-openstack-capi-ovn altinfra-e2e-powervs-capi-ovn altinfra-e2e-vsphere-capi-ovn

@patrickdillon (Contributor):

/test e2e-aws-ovn e2e-vsphere-ovn

@patrickdillon (Contributor):

INFO[2024-06-21T15:15:35Z] Running step e2e-azure-capi-ovn-ipi-install-install.
INFO[2024-06-21T16:21:53Z] Step e2e-azure-capi-ovn-ipi-install-install succeeded after 1h6m18s.

The Azure install has succeeded; just waiting for the e2e tests.

@r4f4 (Contributor) left a comment:

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 21, 2024
@openshift-ci-robot (Contributor):

/retest-required

Remaining retests: 0 against base HEAD 29fd804 and 2 for PR HEAD beab2da in total

@sadasu (Contributor, Author) commented Jun 21, 2024

/retest-required

@jhixson74 (Member):

I just tested this locally and it works!

Let's try to get this in as soon as possible, because it does not look fun to rebase this...

/approve

Confirmed here as well.

@sadasu (Contributor, Author) commented Jun 21, 2024

/retest-required

@patrickdillon (Contributor):

/skip

@openshift-ci-robot (Contributor):

/retest-required

Remaining retests: 0 against base HEAD bf24c3e and 1 for PR HEAD beab2da in total

openshift-ci bot commented Jun 22, 2024

@sadasu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-edge-zones d423690 link false /test e2e-aws-ovn-edge-zones
ci/prow/e2e-aws-ovn-shared-vpc-edge-zones d423690 link false /test e2e-aws-ovn-shared-vpc-edge-zones
ci/prow/e2e-aws-ovn-shared-vpc-custom-security-groups d423690 link false /test e2e-aws-ovn-shared-vpc-custom-security-groups
ci/prow/e2e-aws-ovn-fips d423690 link false /test e2e-aws-ovn-fips
ci/prow/e2e-aws-ovn-single-node d423690 link false /test e2e-aws-ovn-single-node
ci/prow/altinfra-e2e-aws-ovn 50d82a6 link false /test altinfra-e2e-aws-ovn
ci/prow/e2e-aws-ovn-edge-zones-manifest-validation d423690 link true /test e2e-aws-ovn-edge-zones-manifest-validation
ci/prow/e2e-aws-ovn-imdsv2 d423690 link false /test e2e-aws-ovn-imdsv2
ci/prow/e2e-external-aws-ccm d423690 link false /test e2e-external-aws-ccm
ci/prow/altinfra-e2e-vsphere-capi-ovn beab2da link false /test altinfra-e2e-vsphere-capi-ovn
ci/prow/e2e-gcp-ovn-byo-vpc beab2da link false /test e2e-gcp-ovn-byo-vpc
ci/prow/e2e-gcp-ovn-xpn beab2da link false /test e2e-gcp-ovn-xpn
ci/prow/azure-ovn-marketplace-images beab2da link false /test azure-ovn-marketplace-images
ci/prow/e2e-gcp-secureboot beab2da link false /test e2e-gcp-secureboot
ci/prow/altinfra-e2e-ibmcloud-capi-ovn beab2da link false /test altinfra-e2e-ibmcloud-capi-ovn
ci/prow/altinfra-e2e-powervs-capi-ovn beab2da link false /test altinfra-e2e-powervs-capi-ovn
ci/prow/e2e-azure-ovn-shared-vpc beab2da link false /test e2e-azure-ovn-shared-vpc
ci/prow/e2e-azurestack beab2da link false /test e2e-azurestack

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 27d9113 into openshift:master Jun 22, 2024
33 checks passed
@openshift-bot (Contributor):

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-installer-altinfra-container-v4.17.0-202406220812.p0.g27d9113.assembly.stream.el9 for distgit ose-installer-altinfra.
All builds following this will include this PR.
