Tracking issue for improving resources status in CAPO #2290

Open
EmilienM opened this issue Nov 27, 2024 · 0 comments
EmilienM commented Nov 27, 2024

This is a tracking issue for the CAPO effort to improve resource status reporting.

High-level changes required by the new CAPI contract

Most of these changes will be required in the v1beta2 API contract (tentative Apr 2025).

OpenStackCluster

The following changes are planned for the contract for the OpenStackCluster resource:

  • Disambiguate the usage of the term "ready" by renaming the fields used for the initial provisioning workflow
    • Rename status.ready to status.initialization.provisioned.
  • Remove failureReason and failureMessage.

Notes:

  • OpenStackCluster's status.initialization.provisioned will surface into Cluster's status.initialization.infrastructureProvisioned field.
  • OpenStackCluster's status.initialization.provisioned must signal the completion of the initial provisioning of the cluster infrastructure. The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it.
  • OpenStackCluster's status.conditions[Ready] will surface into Cluster's status.conditions[InfrastructureReady] condition.
  • OpenStackCluster's status.conditions[Ready] must surface issues during the entire lifecycle of the OpenStackCluster (both during initial OpenStackCluster provisioning and after the initial provisioning is completed).
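The difference between the two signals can be sketched in Go. This is only an illustration under stated assumptions: `ClusterStatus`, `Initialization`, and `observe` are hypothetical, simplified stand-ins, not CAPO's actual API types. The sketch shows `provisioned` acting as a one-way latch while the Ready status keeps following the current state of the infrastructure.

```go
package main

import "fmt"

// Initialization is a hypothetical mirror of the planned
// status.initialization block.
type Initialization struct {
	Provisioned bool
}

// ClusterStatus is a simplified, hypothetical stand-in for the
// OpenStackCluster status. ReadyStatus stands in for
// status.conditions[Ready] and holds "True", "False" or "Unknown".
type ClusterStatus struct {
	Initialization Initialization
	ReadyStatus    string
}

// observe records the outcome of one reconcile pass.
func (s *ClusterStatus) observe(infraHealthy bool) {
	if infraHealthy {
		s.ReadyStatus = "True"
		// One-way latch: set when initial provisioning completes.
		s.Initialization.Provisioned = true
		return
	}
	// Later problems are reported through Ready only;
	// Provisioned is deliberately left untouched.
	s.ReadyStatus = "False"
}

func main() {
	var s ClusterStatus
	s.observe(false) // still provisioning
	fmt.Println(s.Initialization.Provisioned, s.ReadyStatus)
	s.observe(true) // initial provisioning completed
	fmt.Println(s.Initialization.Provisioned, s.ReadyStatus)
	s.observe(false) // an issue appears later in the lifecycle
	fmt.Println(s.Initialization.Provisioned, s.ReadyStatus)
}
```

In this sketch the third pass leaves `Provisioned` set to true while Ready drops to "False", which is the behaviour the notes above describe.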

OpenStackMachine

The following changes are planned for the contract for the OpenStackMachine resource:

  • Disambiguate the usage of the term "ready" by renaming the fields used for the initial provisioning workflow
    • Rename status.ready to status.initialization.provisioned.
  • Remove failureReason and failureMessage.

Notes:

  • OpenStackMachine's status.initialization.provisioned will surface into Machine's status.initialization.infrastructureProvisioned field.
  • OpenStackMachine's status.initialization.provisioned must signal the completion of the initial provisioning of the machine infrastructure. The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it.
  • OpenStackMachine's status.conditions[Ready] will surface into Machine's status.conditions[InfrastructureReady] condition.
  • OpenStackMachine's status.conditions[Ready] must surface issues during the entire lifecycle of the Machine (both during initial OpenStackMachine provisioning and after the initial provisioning is completed).
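As a rough illustration of what "surface into" means for conditions, the sketch below copies an infrastructure object's Ready condition into an InfrastructureReady condition for its owner. The mirroring is done by Cluster API itself, not by CAPO, and the code here is an assumption-laden sketch: `Condition` is a local stand-in for metav1.Condition, `find` and `mirrorReady` are hypothetical helpers, and the `PortCreateFailed` reason is invented for the example.

```go
package main

import "fmt"

// Condition is a local, simplified stand-in for metav1.Condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// find returns the condition of the given type, if present.
func find(conds []Condition, condType string) (Condition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return Condition{}, false
}

// mirrorReady copies the infrastructure object's Ready condition into
// an InfrastructureReady condition for the owning object. It reports
// false when the infrastructure object has no Ready condition yet.
func mirrorReady(infra []Condition) (Condition, bool) {
	ready, ok := find(infra, "Ready")
	if !ok {
		return Condition{}, false
	}
	ready.Type = "InfrastructureReady"
	return ready, true
}

func main() {
	// "PortCreateFailed" is a hypothetical reason used for illustration.
	infra := []Condition{{
		Type:    "Ready",
		Status:  "False",
		Reason:  "PortCreateFailed",
		Message: "quota exceeded",
	}}
	if c, ok := mirrorReady(infra); ok {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

Because status, reason and message are carried over unchanged, whatever CAPO writes into Ready is what users would see on the owning object.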

Notes on Conditions

Some remarks about Kubernetes API conventions in regard to conditions:

  • Polarity: Condition type names should make sense for humans; neither positive nor negative polarity can be recommended
    as a general rule
  • Use of the Reason field is required (currently, Cluster API adds a reason only when a condition is false)
  • Controllers should apply their conditions to a resource the first time they visit the resource, even if the status is Unknown
    (currently, Cluster API controllers add conditions at different stages of the reconcile loop). Please note that:
    • If more than one controller adds conditions to the same resources, conditions managed by the different controllers will be
      applied at different times.
    • Kubernetes API conventions account for exceptions to this rule; for known conditions, the absence of a condition status should
      be interpreted the same as Unknown, and typically indicates that reconciliation has not yet finished.
  • We'll be using the metav1.Condition type from the Kubernetes API.
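A minimal sketch of two of the conventions above: conditions are added on the first visit with status Unknown and a non-empty Reason, and a missing condition is read as Unknown. `Condition` is a local stand-in for metav1.Condition so that the example has no external dependencies; `ensureConditions`, `statusOf`, and the `NotYetReconciled` reason are hypothetical names, not part of CAPO or Cluster API.

```go
package main

import "fmt"

// Condition is a local, simplified stand-in for metav1.Condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// reasonNotYetReconciled is a hypothetical reason for conditions that
// have been registered but not evaluated yet.
const reasonNotYetReconciled = "NotYetReconciled"

// lookup returns the condition of the given type, if present.
func lookup(conds []Condition, condType string) (Condition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return Condition{}, false
}

// ensureConditions adds every known condition type that is still
// missing, with status Unknown and a non-empty Reason. Conditions that
// already exist are left as they are.
func ensureConditions(conds []Condition, known ...string) []Condition {
	for _, t := range known {
		if _, ok := lookup(conds, t); !ok {
			conds = append(conds, Condition{
				Type:   t,
				Status: "Unknown",
				Reason: reasonNotYetReconciled,
			})
		}
	}
	return conds
}

// statusOf treats the absence of a condition the same as Unknown.
func statusOf(conds []Condition, condType string) string {
	if c, ok := lookup(conds, condType); ok {
		return c.Status
	}
	return "Unknown"
}

func main() {
	conds := ensureConditions(nil, "Ready")
	fmt.Println(statusOf(conds, "Ready"), conds[0].Reason)
	fmt.Println(statusOf(nil, "Ready"))
}
```

Calling `ensureConditions` at the top of every reconcile is safe in this sketch because existing conditions are never overwritten.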

Terminal Failures

Getting rid of terminal failures gives us an opportunity to improve how reliably CAPO handles OpenStack infrastructure failures, such as API rate limits or temporary unavailability, which unfortunately happen often in large-scale production clouds.
We'll need to investigate what these failures can be and how we treat them. For each failure, either:

  • CAPO continues to reconcile the resource and updates conditions with a temporary state
  • CAPO stops reconciling the resource and updates conditions with a human-readable error message