Improve Conditions and Terminal errors #2379

EmilienM · 2025-01-20T23:48:52Z

/kind feature

Context and Background

As part of the initiative to improve status reporting in Cluster API (CAPI) resources, significant changes will be introduced to how resource statuses are handled in the Cluster API Provider for OpenStack (CAPO).

One major change is phasing out the FailureReason and FailureMessage fields in favor of Kubernetes Conditions, which will encapsulate both terminal failures and lifecycle status. Although the notion of a terminal failure is specific to CAPI, it can be communicated effectively through well-defined conditions whose explicit type and reason values identify fatal issues. This shift aligns CAPO with Kubernetes conventions and ensures that error states are conveyed consistently and clearly.
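As a rough illustration of the move away from FailureReason and FailureMessage, the Go sketch below folds the legacy pair into a single condition. The Condition struct mirrors the core fields of metav1.Condition so the example stays self-contained; the condition type name TerminalFailure, the sample reason and message, and the helper fromLegacyFailure are hypothetical and not taken from CAPO.

```go
package main

import "fmt"

// Condition mirrors the core fields of a Kubernetes condition
// (metav1.Condition); it is defined locally so the sketch is self-contained.
type Condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string // machine-readable, CamelCase
	Message string // human-readable detail
}

// ConditionTypeTerminalFailure is a hypothetical condition type used to flag
// a terminal failure; the real type name is not defined in this issue.
const ConditionTypeTerminalFailure = "TerminalFailure"

// fromLegacyFailure converts the deprecated FailureReason/FailureMessage
// pair into a condition. An empty reason means there is no terminal failure.
func fromLegacyFailure(failureReason, failureMessage string) (Condition, bool) {
	if failureReason == "" {
		return Condition{}, false
	}
	return Condition{
		Type:    ConditionTypeTerminalFailure,
		Status:  "True",
		Reason:  failureReason,
		Message: failureMessage,
	}, true
}

func main() {
	// Illustrative values only.
	c, ok := fromLegacyFailure("CreateError", "flavor m1.huge does not exist")
	fmt.Println(ok, c.Type, c.Reason) // true TerminalFailure CreateError
}
```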

Key Updates and Behavior Changes

Handling Terminal Failures with Conditions

Terminal failures will now be represented as conditions, providing clear, human-readable messages and actionable reasons.
CAPO controllers will check these conditions and stop reconciling any object that is in a terminal state.
If a failure turns out to be transient, users can manually clear the fatal condition so that reconciliation restarts.
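A minimal sketch of the intended controller behavior, assuming a hypothetical TerminalFailure condition type: reconcile skips any object that carries that condition with status True, and removing the condition (as a user would after a transient failure) lets reconciliation resume. The Object type and the helper names are illustrative and are not CAPO's API.

```go
package main

import "fmt"

// Condition holds the core fields of a Kubernetes condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// TerminalFailure is a hypothetical condition type for fatal errors.
const TerminalFailure = "TerminalFailure"

// Object is a stand-in for a CAPO resource such as an OpenStackMachine.
type Object struct {
	Name       string
	Conditions []Condition
}

// isTerminal reports whether obj carries a TerminalFailure condition with
// status True.
func isTerminal(obj *Object) bool {
	for _, c := range obj.Conditions {
		if c.Type == TerminalFailure && c.Status == "True" {
			return true
		}
	}
	return false
}

// clearCondition removes every condition of the given type, as a user
// would when the failure turns out to be transient.
func clearCondition(obj *Object, condType string) {
	kept := obj.Conditions[:0]
	for _, c := range obj.Conditions {
		if c.Type != condType {
			kept = append(kept, c)
		}
	}
	obj.Conditions = kept
}

// reconcile returns false without doing any work for terminal objects.
func reconcile(obj *Object) bool {
	if isTerminal(obj) {
		return false
	}
	// ... create or update OpenStack resources here ...
	return true
}

func main() {
	m := &Object{Name: "worker-0", Conditions: []Condition{{
		Type: TerminalFailure, Status: "True",
		Reason: "ServerCreateFailed", Message: "server entered ERROR state",
	}}}
	fmt.Println(reconcile(m)) // false: reconciliation is skipped
	clearCondition(m, TerminalFailure)
	fmt.Println(reconcile(m)) // true: reconciliation resumes
}
```

In a real cluster, clearing the condition would be a patch to the object's status rather than an in-memory edit.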

Lifecycle Management via Conditions

Non-Recoverable Conditions: Objects with fatal conditions (e.g., unrecoverable infrastructure issues) will no longer be reconciled.
Temporary Conditions: Objects with transient issues will continue reconciliation until either resolved or escalated to a fatal state.

Immutable vs. Mutable Resource Behavior

Immutable Resources (OpenStackMachine, OpenStackServer): Once fully ready, these resources become immutable. Manual changes to the underlying OpenStack resources (e.g., replacing a server port) will not be reflected in their CAPO status.
Mutable Resources (OpenStackCluster): The conditions on these resources may change after creation to reflect updates, or failures that follow a modification (e.g., an error when adding a security group while the Neutron API is unresponsive).

Known Issues and Areas for Improvement

Several existing issues highlight gaps in handling terminal failures or reflect inconsistent status behavior. This enhancement will address the following key issues:

Issue #2146: Terminal failures are either not identified or incorrectly reported.
Issue #2185: Missing conditions in critical resource workflows.
Issue #2264: Inconsistent handling of fatal errors in OpenStackMachine.
Issue #2265: Status fields are not aligned with the proposed lifecycle management.

Summary

By aligning CAPO with CAPI’s improved status reporting and transitioning to a condition-driven model, this enhancement will:

Provide clearer, more actionable resource statuses.
Reduce ambiguity in handling terminal failures.
Improve lifecycle management for immutable and mutable resources.
Address existing gaps and inconsistencies in error reporting.


github-project-automation bot added this to CAPO Roadmap Jan 20, 2025

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 20, 2025

github-project-automation bot moved this to Inbox in CAPO Roadmap Jan 20, 2025

EmilienM changed the title ~~OpenStackCluster: improve Conditions~~ OpenStackCluster: improve Conditions and Terminal errors Jan 21, 2025

EmilienM changed the title ~~OpenStackCluster: improve Conditions and Terminal errors~~ Improve Conditions and Terminal errors Jan 21, 2025
