Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tolerances and Affinity addition to Helm Chart #1712

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

JonTheNiceGuy
Copy link

SUMMARY

This change introduces the ability to define tolerances and affinity to the AWX Operator.

The cluster on which I'm working is highly defined with taints to ensure workload placement, and the current helm chart does not allow it to be deployed. While adding support for tolerances, the other common placement option - affinity - was added at the same time.

ISSUE TYPE
  • New or Enhanced Feature
ADDITIONAL INFORMATION

When the current helm chart is deployed, on a cluster where all nodes have a taint (e.g. xl:NoSchedule=true or clusterservices:NoSchedule=true) the operator will go into a permanent "pending" state and in the events it will show:

  Type     Reason             Age                     From                Message
  ----     ------             ----                    ----                -------
  Warning  FailedScheduling   64s (x336 over 28h)     default-scheduler   0/6 nodes are available: 3 node(s) had untolerated taint {xl: true}, 3 node(s) had untolerated taint {clusterservices: true}. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..

Adding toleration to the helm chart and implementing it will allow it to be scheduled.

I was unable to test this change with molecule as I don't have molecule installed or configured and the documentation for the operator points to an out-of-date link for molecule install guides, and the guides on ansible.com for molecule show an error about a driver not being configured. I did test the resulting helm chart on my cluster to show the tolerations were applied. I am happy to work with developers and contributors if any additional testing is required.

@@ -375,6 +375,8 @@ helm-chart-generate: kustomize helm kubectl-slice yq charts
for file in $${cluster_scoped_files}; do\
$(YQ) -i '.metadata.name += "-{{ .Release.Name }}"' $${file};\
done
# Add tolerations and affinity to the deployment
$(SED_I) -E -e 's/^( containers:)/{{- with .Values.tolerations }}\n tolerations:\n{{- toYaml . | nindent 8 }}\n{{- end }}\n{{- with .Values.affinity }}\n affinity:\n{{- toYaml . | nindent 8 }}\n{{- end }}\n\1/' charts/$(CHART_NAME)/raw-files/deployment-awx-operator-controller-manager.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this file in my checkout of the awx-operator..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is created by this command in the Makefile. The source of that file is config/manager/manager.yml but it split apart by kustomize.

@rooftopcellist
Copy link
Member

@JonTheNiceGuy if you need to add tolerations and pod_affinity to the awx application deployments, is there a reason why you wouldn't just do so on the AWX.spec?

Docs here: https://github.com/ansible/awx-operator/tree/devel/.helm/starter#awx


But it seems like you are trying to add them to the awx-operator manager pod itself. The typically way to accomplish this would be with kustomize. But if you want to do it with helm, I think you would want to follow the pattern used here:

Rather than the sed approach in this PR. I am not as familiar with helm myself. So I may just be misunderstanding. Could you explain your approach a bit more if so?

@JonTheNiceGuy
Copy link
Author

Thanks for taking the time to look at my PR @rooftopcellist. I'll try to answer as best I can (I'm also not 100% on helm, but I'm doing my best!)

if you need to add tolerations and pod_affinity to the awx application deployments, is there a reason why you wouldn't just do so on the AWX.spec?

Because the AWX.spec needs to be operated on by something, which is (as far as I can make out from my tests) is actually another pod called controller-manager. If you don't also have tolerations on that pod, and all your nodes are tainted, it can't ever start, which means the AWX spec isn't actioned into your cluster.

The typically way to accomplish this would be with kustomize.

I would agree, except I needed to inject the helm-specific loop {{- with .Values.tolerations }}\ntolerations:\n{{- toYaml . | nindent 8 }}\n{{- end }} that I couldn't determine would work with Kustomize. Given that $(SED_I) is already defined in the Makefile, I reasoned that it's probably in use somewhere else, so why not use it. That said, if a kustomize guru can fix my assumption, then I'm more than happy to swap out my hack with a better one 😄

But if you want to do it with helm, I think you would want to follow the pattern used here: https://github.com/ansible/awx-operator/pull/1690/files

See my previous comment about the AWX.spec.

You also mentioned the file in question not appearing in the check-out; this is because the Makefile actually uses kustomize to split the manifest files into the parts needed for the helm chart.

I hope this helps. Please do let me know if you need any more clarification on my approach 😄

* Uses sed rather than yq because I couldn't make yq put the stanzas in the right place
* Add entries to the Values file
@JonTheNiceGuy JonTheNiceGuy force-pushed the add_tolerations_and_affinity_to_the_operator branch from ec8c20f to b7aa6a7 Compare March 25, 2024 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants