
CORS-3483: Update CAPI and CAPZ versions to set Machine DisableExtensionOperations #8627

Merged · 8 commits · Jun 22, 2024

Conversation

@sadasu (Contributor) commented Jun 18, 2024

Updating CAPI to version v1.7.0 and CAPZ to v1.15.1-0.20240617212811-a52056dfb88c.

The CAPZ version bump brings in kubernetes-sigs/cluster-api-provider-azure#4792 and sets DisableExtensionOperations to true.

Co-authored-by: Aditya Narayanaswamy [email protected]
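
For illustration only, a minimal Go sketch of what the installer-side change amounts to, assuming the field brought in by kubernetes-sigs/cluster-api-provider-azure#4792 is a spec-level *bool named DisableExtensionOperations on the CAPZ AzureMachine; the helper name below is hypothetical, not the installer's actual code:

    // Hypothetical sketch: mark a generated CAPZ AzureMachine so that CAPZ
    // skips VM extension operations for it. Field name assumed from the CAPZ PR.
    package azure

    import (
        "k8s.io/utils/ptr"
        capz "sigs.k8s.io/cluster-api-provider-azure/api/v1beta1"
    )

    func disableExtensionOperations(machine *capz.AzureMachine) {
        machine.Spec.DisableExtensionOperations = ptr.To(true)
    }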

@sadasu (Contributor, Author) commented Jun 18, 2024

/cc @jhixson74 , @patrickdillon, @rna-afk

I have rebased @rna-afk's PR #8488 and added #8594 here.

@r4f4 (Contributor) left a comment:

This is somehow changing CAPA and breaking the build:

 pkg/asset/machines/aws/awsmachines.go:119:21: awsMachine.Spec.ElasticIPPool undefined (type "sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2".AWSMachineSpec has no field or method ElasticIPPool)
pkg/asset/machines/aws/awsmachines.go:119:43: undefined: capa.ElasticIPPool
pkg/asset/machines/aws/awsmachines.go:121:47: undefined: capa.PublicIpv4PoolFallbackOrderAmazonPool
# github.com/openshift/installer/pkg/asset/machines/aws
# [github.com/openshift/installer/pkg/asset/machines/aws]
vet: pkg/asset/machines/aws/awsmachines.go:119:21: awsMachine.Spec.ElasticIPPool undefined (type v1beta2.AWSMachineSpec has no field or method ElasticIPPool) 

@r4f4 (Contributor) commented Jun 18, 2024

Also capz is failing to build:

 go build -gcflags "" -ldflags "-s -w" -o ../../bin/linux_amd64/cluster-api-provider-azure "$path";
# k8s.io/apiserver/pkg/server/routes
vendor/k8s.io/apiserver/pkg/server/routes/openapi.go:69:89: cannot use oa.V3Config (variable of type *common.OpenAPIV3Config) as *common.Config value in argument to builder3.BuildOpenAPISpecFromRoutes 

@sadasu changed the title from "Updating CAPI and CAPZ versions" to "WIP: Updating CAPI and CAPZ versions" on Jun 18, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 18, 2024
 sigs.k8s.io/yaml v1.4.0 // indirect
 )

-replace sigs.k8s.io/cluster-api => sigs.k8s.io/cluster-api v1.6.3
+replace sigs.k8s.io/cluster-api => sigs.k8s.io/cluster-api v1.7.3
A Contributor commented:

To be safe, this should never be higher than the version of capi we're building (1.7.0 as of today).

A Contributor commented:

Interesting. And I see @sadasu is fixing by bumping the CAPI controller in 4b4dd04 👍

@sadasu (Contributor, Author) commented Jun 18, 2024:

I am updating the CAPI version in the top-level go.mod and in cluster-api/cluster-api/go.mod, but not for the other CAPI providers.

The version of CAPZ we are pulling in uses CAPI 1.7.3. I am not sure of the consequences of bumping only the CAPZ version and not the CAPI version.
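
Roughly, the pinning scheme described above looks like this in go.mod terms (an illustrative fragment; the comment about other provider modules is an assumption about the layout, not the repository's actual paths):

    // Top-level go.mod and cluster-api/cluster-api/go.mod: pin CAPI to the
    // version of the controller we build.
    replace sigs.k8s.io/cluster-api => sigs.k8s.io/cluster-api v1.7.3

    // The other provider modules keep their existing CAPI pins and are only
    // bumped if CI shows they need the newer version.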

A Contributor commented:

If CAPZ uses a new feature in 1.7.3, then running it against an older capi could cause runtime failures. I think what you're doing now is the correct approach. Let's just make sure to trigger jobs for all other providers and make sure they work with capi 1.7.3.

@r4f4 (Contributor) commented Jun 18, 2024

/test ?

openshift-ci bot commented Jun 18, 2024

@r4f4: The following commands are available to trigger required jobs:

  • /test agent-integration-tests
  • /test altinfra-images
  • /test altinfra-periodics-images
  • /test aro-unit
  • /test e2e-agent-compact-ipv4
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-edge-zones-manifest-validation
  • /test e2e-aws-ovn-upi
  • /test e2e-azure-ovn
  • /test e2e-azure-ovn-upi
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-upi
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack-ovn
  • /test e2e-vsphere-ovn
  • /test e2e-vsphere-ovn-upi
  • /test gofmt
  • /test golint
  • /test govet
  • /test images
  • /test okd-images
  • /test okd-unit
  • /test okd-verify-codegen
  • /test openstack-manifests
  • /test shellcheck
  • /test terraform-images
  • /test terraform-verify-vendor
  • /test tf-lint
  • /test unit
  • /test verify-codegen
  • /test verify-vendor
  • /test yaml-lint

The following commands are available to trigger optional jobs:

  • /test altinfra-e2e-aws-custom-security-groups
  • /test altinfra-e2e-aws-ovn
  • /test altinfra-e2e-aws-ovn-fips
  • /test altinfra-e2e-aws-ovn-imdsv2
  • /test altinfra-e2e-aws-ovn-localzones
  • /test altinfra-e2e-aws-ovn-proxy
  • /test altinfra-e2e-aws-ovn-public-ipv4-pool
  • /test altinfra-e2e-aws-ovn-shared-vpc
  • /test altinfra-e2e-aws-ovn-shared-vpc-local-zones
  • /test altinfra-e2e-aws-ovn-shared-vpc-wavelength-zones
  • /test altinfra-e2e-aws-ovn-single-node
  • /test altinfra-e2e-aws-ovn-wavelengthzones
  • /test altinfra-e2e-azure-capi-ovn
  • /test altinfra-e2e-gcp-capi-ovn
  • /test altinfra-e2e-gcp-ovn-byo-network-capi
  • /test altinfra-e2e-gcp-ovn-secureboot-capi
  • /test altinfra-e2e-gcp-ovn-xpn-capi
  • /test altinfra-e2e-ibmcloud-capi-ovn
  • /test altinfra-e2e-nutanix-capi-ovn
  • /test altinfra-e2e-openstack-capi-ccpmso
  • /test altinfra-e2e-openstack-capi-ccpmso-zone
  • /test altinfra-e2e-openstack-capi-dualstack
  • /test altinfra-e2e-openstack-capi-dualstack-upi
  • /test altinfra-e2e-openstack-capi-dualstack-v6primary
  • /test altinfra-e2e-openstack-capi-externallb
  • /test altinfra-e2e-openstack-capi-nfv-intel
  • /test altinfra-e2e-openstack-capi-ovn
  • /test altinfra-e2e-openstack-capi-proxy
  • /test altinfra-e2e-powervs-capi-ovn
  • /test altinfra-e2e-vsphere-capi-multi-vcenter-ovn
  • /test altinfra-e2e-vsphere-capi-ovn
  • /test altinfra-e2e-vsphere-capi-static-ovn
  • /test altinfra-e2e-vsphere-capi-zones
  • /test azure-ovn-marketplace-images
  • /test e2e-agent-compact-ipv4-appliance-diskimage
  • /test e2e-agent-compact-ipv4-none-platform
  • /test e2e-agent-ha-dualstack
  • /test e2e-agent-sno-ipv4-pxe
  • /test e2e-agent-sno-ipv6
  • /test e2e-aws-overlay-mtu-ovn-1200
  • /test e2e-aws-ovn-edge-zones
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-imdsv2
  • /test e2e-aws-ovn-proxy
  • /test e2e-aws-ovn-public-subnets
  • /test e2e-aws-ovn-shared-vpc-custom-security-groups
  • /test e2e-aws-ovn-shared-vpc-edge-zones
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-upi-proxy
  • /test e2e-azure-ovn-resourcegroup
  • /test e2e-azure-ovn-shared-vpc
  • /test e2e-azurestack
  • /test e2e-azurestack-upi
  • /test e2e-crc
  • /test e2e-external-aws
  • /test e2e-external-aws-ccm
  • /test e2e-gcp-ovn-byo-vpc
  • /test e2e-gcp-ovn-xpn
  • /test e2e-gcp-secureboot
  • /test e2e-gcp-upgrade
  • /test e2e-gcp-upi-xpn
  • /test e2e-ibmcloud-ovn
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi-ovn
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-swapped-hosts
  • /test e2e-metal-ipi-ovn-virtualmedia
  • /test e2e-metal-single-node-live-iso
  • /test e2e-nutanix-ovn
  • /test e2e-openstack-ccpmso
  • /test e2e-openstack-ccpmso-zone
  • /test e2e-openstack-dualstack
  • /test e2e-openstack-dualstack-upi
  • /test e2e-openstack-externallb
  • /test e2e-openstack-nfv-intel
  • /test e2e-openstack-proxy
  • /test e2e-vsphere-ovn-upi-zones
  • /test e2e-vsphere-ovn-zones
  • /test e2e-vsphere-ovn-zones-techpreview
  • /test e2e-vsphere-static-ovn
  • /test okd-e2e-agent-compact-ipv4
  • /test okd-e2e-agent-ha-dualstack
  • /test okd-e2e-agent-sno-ipv6
  • /test okd-e2e-aws-ovn
  • /test okd-e2e-aws-ovn-upgrade
  • /test okd-e2e-gcp
  • /test okd-e2e-gcp-ovn-upgrade
  • /test okd-e2e-vsphere
  • /test okd-scos-images
  • /test tf-fmt

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-installer-master-altinfra-images
  • pull-ci-openshift-installer-master-altinfra-periodics-images
  • pull-ci-openshift-installer-master-aro-unit
  • pull-ci-openshift-installer-master-azure-ovn-marketplace-images
  • pull-ci-openshift-installer-master-e2e-aws-ovn
  • pull-ci-openshift-installer-master-e2e-azure-ovn
  • pull-ci-openshift-installer-master-e2e-azure-ovn-shared-vpc
  • pull-ci-openshift-installer-master-e2e-azurestack
  • pull-ci-openshift-installer-master-e2e-gcp-ovn
  • pull-ci-openshift-installer-master-e2e-gcp-ovn-byo-vpc
  • pull-ci-openshift-installer-master-e2e-gcp-ovn-xpn
  • pull-ci-openshift-installer-master-e2e-gcp-secureboot
  • pull-ci-openshift-installer-master-gofmt
  • pull-ci-openshift-installer-master-golint
  • pull-ci-openshift-installer-master-govet
  • pull-ci-openshift-installer-master-images
  • pull-ci-openshift-installer-master-okd-unit
  • pull-ci-openshift-installer-master-okd-verify-codegen
  • pull-ci-openshift-installer-master-shellcheck
  • pull-ci-openshift-installer-master-tf-fmt
  • pull-ci-openshift-installer-master-tf-lint
  • pull-ci-openshift-installer-master-unit
  • pull-ci-openshift-installer-master-verify-codegen
  • pull-ci-openshift-installer-master-verify-vendor
  • pull-ci-openshift-installer-master-yaml-lint

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@r4f4 (Contributor) commented Jun 18, 2024

/test altinfra-e2e-aws-ovn altinfra-e2e-azure-capi-ovn altinfra-e2e-gcp-capi-ovn altinfra-e2e-ibmcloud-capi-ovn altinfra-e2e-nutanix-capi-ovn altinfra-e2e-openstack-capi-ovn altinfra-e2e-powervs-capi-ovn altinfra-e2e-vsphere-capi-ovn

@r4f4 (Contributor) commented Jun 18, 2024

ibmcloud-capi-ovn is expected to fail; I forgot about it.

@sadasu changed the title from "WIP: Updating CAPI and CAPZ versions" to "CORS-3483: Updating CAPI and CAPZ versions" on Jun 18, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 18, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 18, 2024
@openshift-ci-robot (Contributor) commented Jun 18, 2024

@sadasu: This pull request references CORS-3483 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Updating CAPI to version v1.7.0 and CAPZ to v1.15.1-0.20240617212811-a52056dfb88c

The CAPZ version bump brings in kubernetes-sigs/cluster-api-provider-azure#4792

Co-authored-by: Aditya Narayanaswamy [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sadasu (Contributor, Author) commented Jun 18, 2024

/jira refresh

@openshift-ci-robot (Contributor) commented Jun 18, 2024

@sadasu: This pull request references CORS-3483 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jhixson74 (Member):

/cc

@openshift-ci openshift-ci bot requested a review from jhixson74 June 18, 2024 18:51
@r4f4 (Contributor) commented Jun 18, 2024

e2e-aws-ovn, e2e-nutanix-capi-ovn, e2e-openstack-capi-ovn, e2e-vsphere-capi-ovn: cluster installed, so no problems with the capi bump.
Edit: e2e-gcp-capi-ovn: installed as well.
Edit2: e2e-powervs-capi-ovn: installed as well.

@r4f4 (Contributor) commented Jun 18, 2024

e2e-azure-capi-ovn: I see the following repeating error line in the logs:

time="2024-06-18T18:52:42Z" level=debug msg="E0618 18:52:42.254498    8173 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""

It'd be good to double check if it's a concern.

@r4f4 (Contributor) commented Jun 18, 2024

It's an LGTM if the changes are not negatively impacting Azure.

@patrickdillon (Contributor) commented Jun 18, 2024

Looking at the altinfra-e2e-azure-capi-ovn job, the infrastructure is not becoming ready.

I would have expected this to fail while waiting for machines, although looking at the job history, some previous jobs were actually able to complete installs. So it seems like something in this PR is off for it to fail at this stage.

@sadasu (Contributor, Author) commented Jun 18, 2024

Looking at the altinfra-e2e-azure-capi-ovn job, the infrastructure is not becoming ready.

I would have expected this to fail while waiting for machines, although looking at the job history, some previous jobs were actually able to complete installs. So it seems like something in this PR is off for it to fail at this stage.

Looking at the failure seen in this PR in isolation (not comparing it with previous failures), I see this:

level=info msg=Waiting up to 15m0s (until 6:33PM UTC) for network infrastructure to become ready...
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: infrastructure was not ready within 15m0s: client rate limiter Wait returned an error: context deadline exceeded
level=info msg=Shutting down local Cluster API control plane...

So this error is generated here: kubernetes/client-go@147848c.
Probable cause: during network infrastructure provisioning, API calls to Azure are being rate limited, and that causes the 15-minute timer to run out.
@jhixson74 (Member):

e2e-azure-capi-ovn: I see the following repeating error line in the logs:

time="2024-06-18T18:52:42Z" level=debug msg="E0618 18:52:42.254498    8173 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""

It'd be good to double check if it's a concern.

Yup. This is definitely an issue. I spent some time debugging it but am not familiar enough with cluster-api to get to the bottom of it at this time. It looks like things aren't caching correctly or in time. I decided to try and pinpoint when the changes took place that broke this and determined that to be between v1.6.6 and v1.7.0. Unless there is a need to go to v1.7.0 right now, it would be nice to get this in at v1.6.6 and debug this later when we have more time ;-)

@r4f4 (Contributor) commented Jun 19, 2024

e2e-azure-capi-ovn: I see the following repeating error line in the logs:

time="2024-06-18T18:52:42Z" level=debug msg="E0618 18:52:42.254498    8173 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""

It'd be good to double check if it's a concern.

Yup. This is definitely an issue. I spent some time debugging it but am not familiar enough with cluster-api to get to the bottom of it at this time. It looks like things aren't caching correctly or in time. I decided to try and pinpoint when the changes took place that broke this and determined that to be between v1.6.6 and v1.7.0. Unless there is a need to go to v1.7.0 right now, it would be nice to get this in at v1.6.6 and debug this later when we have more time ;-)

Looking at this Azure run from this PR, I don't see that error line even though it's using CAPI 1.7.0. If it's a CAPI problem, I would expect the issue to be between 1.7.0 and 1.7.3. It must be some CAPI + CAPZ combination.

From what I can tell, the error happens when running the azure controllers, specifically the AzureMachinePool:

time="2024-06-13T15:11:39Z" level=debug msg="I0613 15:11:39.333862    9018 controller.go:186] \"Starting Controller\" controller=\"azuremachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePool\""
time="2024-06-13T15:11:39Z" level=debug msg="I0613 15:11:39.335979    9018 reflector.go:351] Caches populated for *v1.Secret from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105"
time="2024-06-13T15:11:39Z" level=debug msg="I0613 15:11:39.338057    9018 reflector.go:351] Caches populated for *v1beta1.Cluster from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105"
time="2024-06-13T15:11:39Z" level=debug msg="I0613 15:11:39.338558    9018 reflector.go:351] Caches populated for *v1beta1.MachinePool from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105"
time="2024-06-13T15:11:39Z" level=debug msg="E0613 15:11:39.338605    9018 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""

That controller only runs if the MachinePool feature gate is enabled, and that gate's default was changed to true in CAPI v1.7.0.

The worrisome part about this log is:

time="2024-06-13T15:13:39Z" level=debug msg="E0613 15:13:39.336109    9018 kind.go:63] \"if kind is a CRD, it should be installed before calling Start\" err=\"no matches for kind \\\"KubeadmConfig\\\" in version \\\"bootstrap.cluster.x-k8s.io/v1beta1\\\"\" logger=\"controller-runtime.source.EventHandler\" kind=\"KubeadmConfig.bootstrap.cluster.x-k8s.io\""
time="2024-06-13T15:13:39Z" level=debug msg="E0613 15:13:39.437976    9018 controller.go:203] \"Could not wait for Cache to sync\" err=\"failed to wait for azuremachinepool caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.KubeadmConfig\" controller=\"azuremachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438049    9018 internal.go:516] \"Stopping and waiting for non leader election runnables\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438067    9018 internal.go:520] \"Stopping and waiting for leader election runnables\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438096    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438105    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azurecluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438112    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438121    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"ASOSecret\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438133    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438159    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremanagedmachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438170    9018 controller.go:242] \"All workers finished\" controller=\"azuremachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438174    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachinetemplate\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachineTemplate\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438184    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremanagedcontrolplane\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedControlPlane\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438195    9018 controller.go:242] \"All workers finished\" controller=\"azuremanagedcontrolplane\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedControlPlane\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438197    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremachinepoolmachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePoolMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438208    9018 controller.go:242] \"All workers finished\" controller=\"azuremachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438215    9018 controller.go:242] \"All workers finished\" controller=\"azuremachinepoolmachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachinePoolMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438216    9018 controller.go:240] \"Shutdown signal received, waiting for all workers to finish\" controller=\"azuremanagedcluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438224    9018 controller.go:242] \"All workers finished\" controller=\"azuremachine\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachine\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438226    9018 controller.go:242] \"All workers finished\" controller=\"azurecluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438230    9018 controller.go:242] \"All workers finished\" controller=\"azuremanagedcluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438232    9018 controller.go:242] \"All workers finished\" controller=\"azuremachinetemplate\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureMachineTemplate\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438245    9018 controller.go:242] \"All workers finished\" controller=\"azuremanagedmachinepool\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureManagedMachinePool\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438248    9018 controller.go:242] \"All workers finished\" controller=\"ASOSecret\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AzureCluster\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438285    9018 internal.go:528] \"Stopping and waiting for caches\""
time="2024-06-13T15:13:39Z" level=debug msg="W0613 15:13:39.438375    9018 reflector.go:462] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: watch of *v1api20220701.NatGateway ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding"
time="2024-06-13T15:13:39Z" level=debug msg="W0613 15:13:39.438501    9018 reflector.go:462] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: watch of *v1beta1.Machine ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding"
time="2024-06-13T15:13:39Z" level=debug msg="W0613 15:13:39.438527    9018 reflector.go:462] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: watch of *v1api20201101.VirtualNetworksSubnet ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding"
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438636    9018 internal.go:532] \"Stopping and waiting for webhooks\""
time="2024-06-13T15:13:39Z" level=debug msg="I0613 15:13:39.438682    9018 server.go:249] \"Shutting down webhook server with timeout of 1 minute\" logger=\"controller-runtime.webhook\""
time="2024-06-13T15:13:40Z" level=debug msg="I0613 15:13:40.493073    9018 internal.go:535] \"Stopping and waiting for HTTP servers\""
time="2024-06-13T15:13:40Z" level=debug msg="I0613 15:13:40.493121    9018 server.go:43] \"shutting down server\" kind=\"health probe\" addr=\"127.0.0.1:35841\""
time="2024-06-13T15:13:40Z" level=debug msg="I0613 15:13:40.493174    9018 internal.go:539] \"Wait completed, proceeding to shutdown the manager\""
time="2024-06-13T15:13:40Z" level=debug msg="E0613 15:13:40.493208    9018 main.go:353] \"problem running manager\" err=\"failed to wait for azuremachinepool caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.KubeadmConfig\" logger=\"setup\""
time="2024-06-13T15:26:49Z" level=debug msg="Collecting applied cluster api manifests..."

So the Azure controllers are shutting down (via a context cancel?) and nothing progresses for 13 minutes, until the installer shuts envtest down at the 15-minute network-infrastructure timeout.
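
To make the "no matches for kind" symptom concrete, here is a hedged standalone sketch (not part of the installer) that asks the control plane whether bootstrap.cluster.x-k8s.io/v1beta1 is served at all; if it is not, any controller watching KubeadmConfig sits in exactly this "waiting for cache to sync" state until controller-runtime's sync timeout fires, which matches the two-minute gap in the log above:

    package main

    import (
        "fmt"
        "log"

        "k8s.io/client-go/discovery"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Hypothetical pre-flight check; the kubeconfig path is a placeholder.
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            log.Fatal(err)
        }
        dc, err := discovery.NewDiscoveryClientForConfig(cfg)
        if err != nil {
            log.Fatal(err)
        }
        // If the KubeadmConfig CRD was never applied, this lookup fails and a
        // controller watching that kind can never sync its informer cache.
        res, err := dc.ServerResourcesForGroupVersion("bootstrap.cluster.x-k8s.io/v1beta1")
        if err != nil {
            log.Fatalf("bootstrap.cluster.x-k8s.io/v1beta1 is not served: %v", err)
        }
        for _, r := range res.APIResources {
            fmt.Println(r.Kind)
        }
    }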

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 21, 2024
@patrickdillon (Contributor):

/test altinfra-e2e-azure-capi-ovn

@patrickdillon (Contributor):

I just tested this locally and it works!

DEBUG Time elapsed per stage:                      
DEBUG        Infrastructure Pre-provisioning: 1s   
DEBUG    Network-infrastructure Provisioning: 4m11s 
DEBUG Post-network, pre-machine Provisioning: 19m14s 
DEBUG        Bootstrap Ignition Provisioning: 1s   
DEBUG                   Machine Provisioning: 7m16s 
DEBUG       Infrastructure Post-provisioning: 23s  
DEBUG                     Bootstrap Complete: 15m16s 
DEBUG                                    API: 20s  
DEBUG                      Bootstrap Destroy: 56s  
DEBUG            Cluster Operators Available: 12m11s 
DEBUG               Cluster Operators Stable: 47s  
INFO Time elapsed: 1h0m30s        

Let's try to get this in as soon as possible, because it does not look fun to rebase this...

/approve

@patrickdillon (Contributor):

/retest-required

openshift-ci bot commented Jun 21, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 21, 2024
@patrickdillon (Contributor):

/test altinfra-e2e-gcp-capi-ovn altinfra-e2e-ibmcloud-capi-ovn altinfra-e2e-nutanix-capi-ovn altinfra-e2e-openstack-capi-ovn altinfra-e2e-powervs-capi-ovn altinfra-e2e-vsphere-capi-ovn

@patrickdillon (Contributor):

/test e2e-aws-ovn e2e-vsphere-ovn

@patrickdillon (Contributor):

INFO[2024-06-21T15:15:35Z] Running step e2e-azure-capi-ovn-ipi-install-install.
INFO[2024-06-21T16:21:53Z] Step e2e-azure-capi-ovn-ipi-install-install succeeded after 1h6m18s.

The Azure install has succeeded; just waiting for the e2e tests.

@r4f4 (Contributor) left a comment:

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 21, 2024
@openshift-ci-robot (Contributor):

/retest-required

Remaining retests: 0 against base HEAD 29fd804 and 2 for PR HEAD beab2da in total

@sadasu (Contributor, Author) commented Jun 21, 2024

/retest-required

@jhixson74 (Member):

I just tested this locally and it works!

Let's try to get this in as soon as possible, because it does not look fun to rebase this...

/approve

Confirmed here as well.

@sadasu (Contributor, Author) commented Jun 21, 2024

/retest-required

@patrickdillon (Contributor):

/skip

@openshift-ci-robot (Contributor):

/retest-required

Remaining retests: 0 against base HEAD bf24c3e and 1 for PR HEAD beab2da in total

openshift-ci bot commented Jun 22, 2024

@sadasu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-edge-zones d423690 link false /test e2e-aws-ovn-edge-zones
ci/prow/e2e-aws-ovn-shared-vpc-edge-zones d423690 link false /test e2e-aws-ovn-shared-vpc-edge-zones
ci/prow/e2e-aws-ovn-shared-vpc-custom-security-groups d423690 link false /test e2e-aws-ovn-shared-vpc-custom-security-groups
ci/prow/e2e-aws-ovn-fips d423690 link false /test e2e-aws-ovn-fips
ci/prow/e2e-aws-ovn-single-node d423690 link false /test e2e-aws-ovn-single-node
ci/prow/altinfra-e2e-aws-ovn 50d82a6 link false /test altinfra-e2e-aws-ovn
ci/prow/e2e-aws-ovn-edge-zones-manifest-validation d423690 link true /test e2e-aws-ovn-edge-zones-manifest-validation
ci/prow/e2e-aws-ovn-imdsv2 d423690 link false /test e2e-aws-ovn-imdsv2
ci/prow/e2e-external-aws-ccm d423690 link false /test e2e-external-aws-ccm
ci/prow/altinfra-e2e-vsphere-capi-ovn beab2da link false /test altinfra-e2e-vsphere-capi-ovn
ci/prow/e2e-gcp-ovn-byo-vpc beab2da link false /test e2e-gcp-ovn-byo-vpc
ci/prow/e2e-gcp-ovn-xpn beab2da link false /test e2e-gcp-ovn-xpn
ci/prow/azure-ovn-marketplace-images beab2da link false /test azure-ovn-marketplace-images
ci/prow/e2e-gcp-secureboot beab2da link false /test e2e-gcp-secureboot
ci/prow/altinfra-e2e-ibmcloud-capi-ovn beab2da link false /test altinfra-e2e-ibmcloud-capi-ovn
ci/prow/altinfra-e2e-powervs-capi-ovn beab2da link false /test altinfra-e2e-powervs-capi-ovn
ci/prow/e2e-azure-ovn-shared-vpc beab2da link false /test e2e-azure-ovn-shared-vpc
ci/prow/e2e-azurestack beab2da link false /test e2e-azurestack

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 27d9113 into openshift:master Jun 22, 2024
33 checks passed
@openshift-bot (Contributor):

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-installer-altinfra-container-v4.17.0-202406220812.p0.g27d9113.assembly.stream.el9 for distgit ose-installer-altinfra.
All builds following this will include this PR.
