Install Descheduler, fix startup readywait #4363

Open: wants to merge 1 commit into base: master

@andrewd-zededa andrewd-zededa commented Oct 16, 2024

This PR makes a few changes to the cluster-init.sh install/boot path of HV=kubevirt EVE, as a base for upcoming cluster work.

Descheduler will be used for eve-app rebalancing during cluster node reboots/upgrades in an upcoming PR. After a node has suffered an outage and recovered, the descheduler evicts pods whose current node does not match their preferred-affinity node. The native Kubernetes scheduler then runs again and places each evicted pod back on the node it requested.
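The PR description does not show the descheduler policy itself. As a rough sketch of what such a policy might look like, the helper below writes a descheduler/v1alpha2 policy that enables the upstream RemovePodsViolatingNodeAffinity plugin for preferred node affinity. The function name write_descheduler_policy, the profile name, and the choice of plugin are assumptions for illustration, not taken from this PR.

```shell
# Sketch only: write_descheduler_policy and the profile name are hypothetical.
# The policy asks the descheduler to evict pods running on a node that does
# not match their preferred node affinity, so the scheduler can re-place them.
write_descheduler_policy() {
    out="$1"
    cat > "$out" <<'EOF'
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: eve-app-rebalance
    pluginConfig:
      - name: "RemovePodsViolatingNodeAffinity"
        args:
          nodeAffinityType:
            - "preferredDuringSchedulingIgnoredDuringExecution"
    plugins:
      deschedule:
        enabled:
          - "RemovePodsViolatingNodeAffinity"
EOF
}
```

Support for the preferred affinity type depends on the descheduler release in use, so the installed version should be checked before relying on it.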

Longhorn daemonsets take some time to become ready (~5-10 minutes on some systems) after the initial install request with 'kubectl apply'. It is important to wait at install time, blocking all_components_initialized until all Longhorn daemonsets are ready, as a foundation for an upcoming PR that snapshots the single-node k3s sqlite db in /var/lib. That db snapshot is used to convert a cluster node back to a single-node system.
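A minimal sketch of such a readiness wait, assuming Longhorn is installed into its default longhorn-system namespace. The function names daemonsets_ready and wait_for_daemonsets and the timeout values are hypothetical and are not the code in this PR.

```shell
# Sketch only: function names are hypothetical.

# Succeeds only if the namespace has at least one daemonset and every
# daemonset reports numberReady equal to desiredNumberScheduled.
daemonsets_ready() {
    ns="$1"
    # One line per daemonset: "<name> <desired> <ready>"
    status=$(kubectl -n "$ns" get daemonsets \
        -o jsonpath='{range .items[*]}{.metadata.name} {.status.desiredNumberScheduled} {.status.numberReady}{"\n"}{end}') || return 1
    # No daemonsets yet means the install has not landed.
    [ -n "$status" ] || return 1
    while read -r name desired ready; do
        [ -n "$name" ] || continue
        [ "$desired" = "$ready" ] || return 1
    done <<EOF
$status
EOF
    return 0
}

# Poll daemonsets_ready until it succeeds or timeout_s seconds have passed.
wait_for_daemonsets() {
    ns="$1"
    timeout_s="$2"
    interval_s="${3:-10}"
    waited=0
    until daemonsets_ready "$ns"; do
        if [ "$waited" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
}
```

A caller could gate the initialized marker on it, for example `wait_for_daemonsets longhorn-system 900 || exit 1`, where the 900-second budget is an assumption sized to the ~5-10 minute startup mentioned above.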

Fix: close a small timing window that led to a failure to import external-boot-image:

  • Wait for containerd before importing.
  • Tighter error checking on import.
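The two bullets above can be sketched roughly as follows. This is an illustration, not the code in this PR: the function names wait_for_containerd and import_image are hypothetical, and the use of the plain ctr client with the k8s.io namespace is an assumption about how the image is imported.

```shell
# Sketch only: function names are hypothetical.

# Poll until containerd answers on the given socket, or give up after
# timeout_s seconds. "ctr version" fails while the daemon is unreachable.
wait_for_containerd() {
    sock="$1"
    timeout_s="$2"
    interval_s="${3:-2}"
    waited=0
    until ctr --address "$sock" version >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout_s" ]; then
            echo "containerd not ready after ${waited}s" >&2
            return 1
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
}

# Import an image tarball, failing loudly if the file is missing or
# if ctr reports an error, instead of ignoring the result.
import_image() {
    sock="$1"
    tarball="$2"
    if [ ! -f "$tarball" ]; then
        echo "image tarball not found: $tarball" >&2
        return 1
    fi
    if ! ctr --address "$sock" --namespace k8s.io images import "$tarball"; then
        echo "import failed: $tarball" >&2
        return 1
    fi
}
```

Chaining the two, as in `wait_for_containerd "$sock" 120 && import_image "$sock" "$tarball"`, means the import is never attempted before the daemon responds, and a failed import surfaces as a non-zero status the caller can retry on. The socket path, tarball path, and 120-second budget are placeholders.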

@andrewd-zededa
Contributor Author

Rebased on master, addressed all review comments.

@andrewd-zededa
Contributor Author

@deitch updated PR description to add context.

Contributor

@deitch deitch left a comment

Just some questions and possible suggestions.

@andrewd-zededa andrewd-zededa force-pushed the andrewd-external-boot-image-import branch 7 times, most recently from dcf4165 to d5098b7 Compare October 23, 2024 14:53
Descheduler will be used for eve-app rebalancing during
cluster node reboots/upgrades in an upcoming PR.
Wait for longhorn daemonsets to be ready, before upcoming PR
to snapshot single-node /var/lib kube db.
Resolve an intermittent failure to import external-boot-image:
	Wait for containerd before importing.
	Tighter error checking on import.

Signed-off-by: Andrew Durbin <[email protected]>
@andrewd-zededa andrewd-zededa force-pushed the andrewd-external-boot-image-import branch from d5098b7 to 87f6a27 Compare October 23, 2024 15:06