-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Opt-in allowing cascading deletes of namespaces #119
Comments
The easiest way to implement this would be to configure an Accurate installation to allow cascading deletes. It would be better if it could be expressed in some declarative way per namespace/sub-namespace, but I fear the complexities in an implementation of this. Looking at the docs for HNC in this area, it's not clear to me how this works - even after reading it multiple times. 😄 My ideal solution would be to allow cascading deletes of namespaces in general, which is admin area anyway. And block end-users from deleting |
I don't think it is a good idea to cascade deletions by default. Because it is breaking changes and very dangerous. |
@yamatcha I agree allowing cascade deletes by default could represent a risk for some users. But I would not consider it a breaking change: Anything that worked before will still work. That said, you rule in this project, and adding this enhancement as an opt-in is fine with me. 😸 Our most common problem with the present validation is that removing a tenant from our clusters requires manual intervention. We promote GitOps as much as possible, and our tenants are onboarded with a pull request to our tenants-repo. When PR is merged, our GitOps agent will create an Accurate root namespace for the tenant and bootstrap the tenant's GitOps agent with default user-facing RBAC in their newly created root namespace. This will allow the tenant to self-provision anything granted by RBAC in their root namespace, which also includes creating sub-namespaces. And if the tenant decides to create one or more sub-namespaces, our GitOps agent suddenly loses the ability to delete the tenant's root namespace. This is not right IMO. If our desired state (in Git) specifies that the tenant root namespace should cease to exist, our GitOps agent should be able to delete it! Right now, we have to clean this up manually, which is something we would like to avoid. I suggest starting with a controller config allowing the namespace DELETE validating webhook to be disabled. It could make sense to optionally disable the subnamespaces DELETE validating webhook also, but let's leave that for an eventual follow-up PR. Allowing us, as responsible for our clusters, the ability to manage namespaces is the most urgent issue. Note: I want to allow any user with RBAC present to manage namespaces freely, and not only "the cluster administrator". This is because we have least-privileged agents configured with elevated privileges, but not WDYT? Is there a chance I can get something as described eventually merged? |
What
As a cluster-admin I want to onboard and eventually terminate tenants in our cluster by creating and eventually deleting a root namespace for the tenant. The onboarding process works like a charm, but I am not too happy about the termination process - which we had an opportunity to try out this week. Let me explain:
We use a GitOps process where tenants are onboarded via a pull request to our "tenants" project. A tenant is in this project defined by some simple resources:
tenant-template
Accurate template namespace.So far, so good. But this week we received a request for a tenant termination. Using a modern GitOps tool like Flux, with pruning enabled, we thought that it was as simple as reverting the onboarding PR in Git. So we did that, after getting the PR approved by the tenant responsible. What we forgot to do, was to check if there were sub-namespaces defined under the tenant root namespace. After merging the tenant termination PR, Flux immediately reported an error: It got (correctly) blocked by the Accurate namespace webhook:
But this error was reported only once, which kind of surprised me - as Flux is usually constantly reconciling until the actual state equals the desired state. And as I suspected, the tenant namespace was still present - including the child namespaces that we did not think about. However the Flux controller resource had no knowledge of the resources it used to control anymore, which left the tenant root namespace (including children) as orphans in our cluster. This is something we are trying hard to avoid.
After cleaning up manually, I reached out to the Flux maintainers on Slack, and you can read all the details in the CNCF Slack thread - if you are interested. TL;DR: Flux maintainers think this is an issue with Accurate, and I tend to agree now.
To fix this for future tenant terminations in our clusters, I suggest adding an opt-in allowing us to configure the Accurate namespace webhook allowing cascade namespace
DELETE
requests.How
I have some ideas after looking at the code, but please let me know what you think first! I will update this description once we agree on a solution. I'll be happy to submit a PR fixing this.
Checklist
The text was updated successfully, but these errors were encountered: