GEP-2627 DNS Policy #2712

Open

wants to merge 3 commits into base: main


@maleck13 (Contributor)

/kind gep

What this PR does / why we need it:
Adds a draft GEP outlining the what and the why of supporting DNS configuration as part of Gateway API.

Fixes #2627

@k8s-ci-robot (Contributor)

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 11, 2024
@k8s-ci-robot (Contributor)

Hi @maleck13. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 11, 2024
@maleck13 changed the title from "initial wip Draft for GEP-2627" to "initial Draft for GEP-2627" Jan 12, 2024
@maleck13 changed the title from "initial Draft for GEP-2627" to "Initial draft for GEP-2627" Jan 12, 2024
@candita (Contributor) left a comment

Some grammar fixes are needed, but it generally looks good. It could use some clarification of the goals and use cases.

geps/gep-2627/metadata.yaml (outdated, resolved)
geps/gep-2627/index.md (outdated, resolved)
geps/gep-2627/index.md (outdated, resolved)
geps/gep-2627/index.md (outdated, resolved)
geps/gep-2627/index.md (outdated, resolved)
geps/gep-2627/index.md (outdated, resolved)
geps/gep-2627/index.md (outdated, resolved)

As a cluster administrator, I would like to have the DNS names automatically populated into my specified DNS zones as a set of records based on the assigned addresses of my gateways, and have the status of the DNS records reported back to me, so that I do not have to undertake external automation or management of this essential task and can leverage existing Kubernetes-based monitoring tools to know the status of the integration.

As a cluster administrator, I would like the DNS records to be set up automatically based on my gateways' assigned addresses, and if the IP or hostname changes, I would like the DNS records to update automatically to ensure traffic continues to reach my gateway.
Contributor

This would have to be configurable, not automatic. Maybe not all hostnames will be in the same IP subnet. I'm not clear on use cases where the IP address or hostname changes; can you give an example?

@maleck13 (Contributor, Author) commented Jan 19, 2024

> This would have to be configurable, not automatic. Maybe not all hostnames will be in the same IP subnet.

By this, do you mean a Gateway that is assigned multiple addresses in its spec, for example one public and one private? If so, we could allow the user to specify a range of addresses that should be populated and ignore the others, or the opposite: include all addresses by default but allow an exclude list.

> I'm not clear on use cases where the IPaddr or hostname changes - can you give an example?

This is really to indicate that if the assigned address in the Gateway changes, or new listeners are added, the DNS records will be updated appropriately.

Contributor

Suggested change
Before:
As a cluster administrator, I would like the DNS records to be set up automatically based on my gateways' assigned addresses, and if the IP or hostname changes, I would like the DNS records to update automatically to ensure traffic continues to reach my gateway.
After:
As a cluster administrator, I would like the DNS records to be updated automatically if the `spec` of assigned gateways changes, whether those changes are for IP address or hostname.

Is this a bit clearer?
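To make the "records follow the Gateway's addresses" use case concrete, here is a minimal Go sketch of what a DNS controller might compute. It assumes the controller reads the Gateway's `status.addresses` entries (type `IPAddress` or `Hostname`) for one listener hostname; the `record` type, `desiredRecords`, and `reconcile` are hypothetical names, not an existing Gateway API or external-dns API. Re-running it on every Gateway change and applying the diff is what keeps DNS in sync when addresses change.

```go
package main

import (
	"fmt"
	"net/netip"
)

// gatewayAddress mirrors the two fields of an entry in a Gateway's
// status.addresses: a type ("IPAddress" or "Hostname") and a value.
type gatewayAddress struct {
	Type  string
	Value string
}

// record is a hypothetical desired DNS record, not an existing API type.
type record struct {
	Name, Type, Target string
}

// desiredRecords maps one listener hostname plus the Gateway's addresses to
// records: IPv4 -> A, IPv6 -> AAAA, Hostname -> CNAME. A CNAME may not share a
// name with other records, so if any Hostname address exists, only that CNAME
// is returned.
func desiredRecords(host string, addrs []gatewayAddress) []record {
	var out []record
	for _, a := range addrs {
		switch a.Type {
		case "Hostname":
			return []record{{Name: host, Type: "CNAME", Target: a.Value}}
		case "IPAddress":
			ip, err := netip.ParseAddr(a.Value)
			if err != nil {
				continue // skip malformed values rather than publish them
			}
			ip = ip.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
			rtype := "AAAA"
			if ip.Is4() {
				rtype = "A"
			}
			out = append(out, record{Name: host, Type: rtype, Target: ip.String()})
		}
	}
	return out
}

// reconcile compares the currently published records with the desired set and
// returns what to create and what to delete.
func reconcile(current, desired []record) (create, remove []record) {
	want := make(map[record]bool, len(desired))
	for _, r := range desired {
		want[r] = true
	}
	have := make(map[record]bool, len(current))
	for _, r := range current {
		have[r] = true
		if !want[r] {
			remove = append(remove, r)
		}
	}
	for _, r := range desired {
		if !have[r] {
			create = append(create, r)
		}
	}
	return create, remove
}

func main() {
	old := desiredRecords("app.example.com", []gatewayAddress{{"IPAddress", "203.0.113.10"}})
	// The Gateway is reassigned a new IPv4 address and gains an IPv6 one.
	updated := desiredRecords("app.example.com", []gatewayAddress{
		{"IPAddress", "203.0.113.20"}, {"IPAddress", "2001:db8::1"},
	})
	create, remove := reconcile(old, updated)
	fmt.Println("create:", create)
	fmt.Println("remove:", remove)
}
```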

geps/gep-2627/index.md (outdated, resolved)
geps/gep-2627/index.md (outdated, resolved)
@maleck13 changed the title from "Initial draft for GEP-2627" to "GEP-2627 DNS Policy" Mar 8, 2024
maleck13 and others added 3 commits March 8, 2024 12:42
minor tweaks

Update geps/gep-2627/index.md

Co-authored-by: Candace Holman <[email protected]>

Update geps/gep-2627/index.md

Co-authored-by: Candace Holman <[email protected]>

Update geps/gep-2627/index.md

Co-authored-by: Candace Holman <[email protected]>
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: maleck13
Once this PR has been reviewed and has the lgtm label, please assign shaneutt for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@costinm commented Apr 16, 2024

This needs a very serious security review. As a DNS admin (either public or private), I expect very strict controls on who can create DNS entries affecting the entire domain or VPC. It is certainly not something to enable in all clusters or namespaces, nor should we trust that users have the security expertise to use it safely.

I think the external-dns (alpha) CRD is the right approach for users who want to use KRM to manage DNS; the other options, which allow any namespace owner to create arbitrary DNS (A and TXT) entries, are a huge security risk.

I would be very strongly against this feature. DNS management is very different from Gateway management: the user controlling DNS decides whether a specific name (including the ability to get ACME certs for it, etc.) can be delegated to a particular Gateway in a particular cluster, which in turn allows specific routes (like .well-known/ for getting certs) to be sent to specific namespaces.

The argument that "users just want an easy way to shoot themselves in the foot and break their own security, because they know what they're doing" does not hold in this case, since the implications extend far beyond a single cluster.

@costinm commented Apr 16, 2024

I should mention that Istio (and any mesh based on interception or on L4-based security mechanisms) is completely dependent on DNS security: both the DNS resolver (e.g. bob.com resolving to evil.com's IP) and control over the creation of entries in private or public DNS.

At the very least, this option should not be allowed if any GAMMA implementation is used in the same VPC (not only in the same cluster).

@maleck13 (Contributor, Author)

@costinm Thanks for your response.

> This needs a very serious security review. As a DNS admin - either public or private - I expect very strict controls on who can create DNS entries affecting the entire domain or VPC. It is certainly not something to enable in all clusters or namespaces - or trust that users have the security expertise to use this safely.

> I think external-dns (alpha) CRD is the right approach for users who want to use KRM to manage DNS - the other options allowing random namespace owners to create arbitrary DNS (A and TXT) entries are a huge security risk.

I'm not sure if this was clear, but the proposal and discussion focused on a new API, which as such would be governed by regular RBAC controls, just as creating a Gateway or an external-dns resource is. So there should be no issue controlling who is allowed to create such an object, and in turn who controls the DNS for a given Gateway's listeners.

@costinm commented May 14, 2024

RBAC is per cluster. Are you saying one zone per cluster? RBAC also doesn't look inside the resource, so how do we limit which names a namespace owner can set?

A separate CRD for DNS with RBAC is the minimum, but is it in scope for Gateway API (rather than for core networking and DNS in Kubernetes)?

@costinm commented May 14, 2024

By the way, the CRD defined by the external-dns project seems like a good start and supports most DNS servers. Is there any reason users can't just use that?

@youngnick (Contributor)

@costinm, this is an initial GEP that lays out the scope and terms needed to start discussing whether, and if so how, we can represent that Gateways should have external DNS configured in a standard, portable way. I agree that there are significant security considerations, and that those security considerations should be included in the GEP.

If you have specific feedback about extra use cases or other concerns that you would like addressed, the correct way to indicate them is via comments on the file, rather than broad generalizations discouraging further work on this.

Once this PR merges and we have a basis for discussion, I'd love for you to shepherd some more changes to it to ensure that your concerns are captured.

@costinm commented May 23, 2024 via email

@costinm commented May 23, 2024 via email

@youngnick (Contributor)

This work aims to provide a standardized way to request that DNS records be created for Gateway instances; as you said earlier, this is basically what external-dns does today. The idea is to see if we can find some common ground and provide a relatively standardized way to communicate this config, one that can cover common use cases across multiple providers or implementations.

Some implementations (external-dns, for one) are already looking at the Gateway object to provision DNS config; the intent here is to ensure that the relevant information has a structured, extensible, portable format in which it can be stored.

The process for GEPs is documented at https://gateway-api.sigs.k8s.io/geps/overview/, and as this is the first stage (moving from discussion into Provisional), the bar is intended to be pretty low. It's intended that we use this chance to agree:

  • that the problem is one we should consider
  • what terms we will use to describe the problem
  • what use cases we are considering

There can be multiple PRs while a GEP is in Provisional, but the intent is to provide a place that is more durable than GitHub comments for raising concerns, answering questions, and so on.

The best type of feedback here is feedback on the specific file, yes, but I think questions about whether the problem is in scope for Gateway API should be addressed here, so that we can all be on the same page.

I believe the problem of ensuring that there is a reliable way for Gateway owners to request provisioning of DNS records to cover access to the "outside" of the Gateway is in scope for Gateway API. If we agree on that, then the discussion becomes about:

  • what config is needed for the given use cases
  • where that config should live

It could be that minimal extra configuration is needed, but given that external-dns and other projects already have Policy objects to cover this, it seems likely that we will need additional structured configuration of some kind. Additionally, once there are at least three implementations of the same feature, that is the level at which we consider doing something that is not provider-specific, to leave room for other implementations to do the same task in a compatible, portable way.

@costinm commented May 24, 2024 via email

@youngnick (Contributor)

@costinm, my responses below.

> As I mentioned in my previous comment - I believe this is out of scope for the Gateway WG - and perfectly in scope for the 'external DNS WG' - which is focused on representing multiple K8S resources in DNS - as well as the 'Core API WG', which define the core DNS. It looks like the 'external DNS' WG also believes (implicitly) it is in their scope - given that they already implemented support for representing gateway resources among many others.

I disagree. People using this API need to have an answer for how they ask some DNS management system to consider their Gateway in scope for management. Right now, in external-dns, the support only looks at Route resources, which misses the whole model we have for having hostnames match between Gateway and Route. I can see you logged kubernetes-sigs/external-dns#4402 to request adding support for Gateways to the API, which I agree should definitely be supported.

The reason here is that we (Gateway API) have not documented anywhere how consumers of the DNS information already present in the API should use that information. We don't have anything that says "Implementations wanting to do automatic configuration must consider both Gateway and Routes in generating information". It's implied by the spec (since you can't get an address without looking at the Gateway), but we haven't actually said that anywhere.
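As an illustration of why both Gateway and Routes matter, here is a minimal Go sketch of how a DNS controller might compute the hostname to publish for a Route attached to a Listener. It is a simplification of the hostname-matching rules in the Gateway API Listener and HTTPRoute documentation (for example, `*.example.com` matches `foo.test.example.com` but not `example.com`), and it skips validation such as rejecting IP addresses. The `intersect` helper is hypothetical and is not part of Gateway API or external-dns.

```go
package main

import (
	"fmt"
	"strings"
)

// intersect returns the hostname a DNS controller could publish for a Route
// hostname attached to a Listener hostname. An empty string means "no hostname
// specified". ok is false when there is no concrete hostname to publish,
// either because the two don't intersect or because neither specifies one.
func intersect(listener, route string) (host string, ok bool) {
	switch {
	case listener == "":
		return route, route != ""
	case route == "":
		return listener, true
	case listener == route:
		return route, true
	}
	// For a wildcard such as "*.example.com", [1:] is the suffix ".example.com",
	// so the apex "example.com" never matches.
	lWild := strings.HasPrefix(listener, "*.")
	rWild := strings.HasPrefix(route, "*.")
	switch {
	case lWild && strings.HasSuffix(route, listener[1:]):
		return route, true // the Route is the more specific side
	case rWild && strings.HasSuffix(listener, route[1:]):
		return listener, true // the Listener is the more specific side
	}
	return "", false
}

func main() {
	listener := "*.example.com"
	for _, r := range []string{"app.example.com", "example.com", "*.api.example.com", "shop.example.net"} {
		if h, ok := intersect(listener, r); ok {
			fmt.Println("publish", h)
		} else {
			fmt.Println("skip", r)
		}
	}
}
```

Looking at the Route alone would publish `example.com` and `shop.example.net` even though the Listener would never serve them, which is the gap described above.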

@maleck13 and the other folks who are working on Kuadrant have also already built a solution for managing the same config, in a different way to the external-dns team. So, some discussion here is required, even if the outcome is that we end up picking one of those implementations as the canonical one. (I think that's unlikely.)

Nobody is suggesting that Gateway API own end-to-end DNS provisioning here. But, as a traffic routing specification, we interact with DNS, and unless we define how that happens, and explicitly rule things out of scope, we will end up owning it by default. I don't want that, and it seems that you don't either. But that requires us to define what is in scope.

> The Gateway WG does have a larger security problem around ownership for hostnames - it does solve some of the 'confused deputy' issues and has some basic delegation/grant model, however the 'persona' who owns the DNS zone and domain ( and indirectly all associated security - ability to get certs for subdomains, etc) is different from the persona configuring a Gateway in a namespace.

Yes, and this is one of the many reasons why we haven't addressed this at all up until now.

Again, this provisional GEP is not suggesting that we solve this problem. In fact, I believe this should be the responsibility of the implementation that actuates the creation of DNS records. If a Gateway is created somewhere by someone who shouldn't be creating hostnames there, dealing with that is up to the thing that handles DNS creation, not the Gateway API implementation.
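To illustrate where such a check could live, here is a minimal Go sketch of the controller-side decision: a hypothetical per-namespace allow-list of DNS zones, plus a simplified status condition written back when a hostname request is refused. None of these names (`zonePolicy`, `condition`, the `NotAllowed` reason) come from Gateway API, external-dns, or the GEP; they are assumptions chosen only to show that the decision belongs to the DNS actuator and is surfaced through `status`.

```go
package main

import (
	"fmt"
	"strings"
)

// zonePolicy is a hypothetical controller-side configuration mapping each
// namespace to the DNS zones it may request records in.
type zonePolicy map[string][]string

// condition is a simplified stand-in for a status condition that the
// controller would write back so the requester can see the outcome.
type condition struct {
	Type, Status, Reason, Message string
}

// allowed reports whether host (optionally a "*." wildcard) is equal to, or a
// subdomain of, one of the zones permitted for namespace ns. Matching on
// "."+zone keeps the check label-aligned, so "evilteam-a.example.com" is not
// treated as part of "team-a.example.com".
func (p zonePolicy) allowed(ns, host string) bool {
	host = strings.TrimPrefix(host, "*.")
	for _, zone := range p[ns] {
		if host == zone || strings.HasSuffix(host, "."+zone) {
			return true
		}
	}
	return false
}

// evaluate builds the status condition for a single hostname request.
func (p zonePolicy) evaluate(ns, host string) condition {
	if p.allowed(ns, host) {
		return condition{Type: "Accepted", Status: "True", Reason: "Accepted"}
	}
	return condition{
		Type:    "Accepted",
		Status:  "False",
		Reason:  "NotAllowed",
		Message: fmt.Sprintf("namespace %q may not request %q", ns, host),
	}
}

func main() {
	p := zonePolicy{"team-a": {"team-a.example.com"}}
	fmt.Println(p.evaluate("team-a", "app.team-a.example.com").Status)
	fmt.Println(p.evaluate("team-a", "login.example.com").Message)
}
```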

Again, this GEP is about defining how other controllers, which may not implement routing configuration at all, can manage DNS config sourced from Gateway API resources. Doing this does not mean that north-south Gateway implementations need to support it. Sure, they can if they want to, but the idea here is to ensure that everyone doing it (which, again, multiple implementations already are) does it in the same way.

> DNS may span multiple clusters and non-k8s environments - and operates on a top-down model, and needs a consistent security and delegation model across all record types and uses. How to configure DNS (securely) is clearly a very important problem - but best handled in a WG focused on DNS, not in 'leaf' APIs that interact with a single record and small subset - and without a security/IAM model matching DNS needs.

This is not yet a proposal, to be clear. It's an attempt to start talking about what a proposal will involve. The fact that certain questions have been asked and others haven't is exactly what this process is supposed to achieve.

Also, part of the job of a Gateway is to map addresses to routing config. Part of the routing config is the hostname, so it's reasonable to assume that people may want to mark hostnames on a Gateway as "please configure these automatically".

I agree that it shouldn't be the job of Gateway API to decide how the requests to map an address to a hostname should be actuated. But it's entirely within our scope to define something that says "use these mappings and not these ones." You're asking us to boil the ocean of solving all DNS management problems, of course that's out of scope. But marking particular mappings is well inside the scope of Gateway API.

> The proposal doesn't mention any 'prior art' or existing LB implementation that programs DNS - Istio does handle DNS interception but only for egress use cases. The permissions needed for an in-cluster load balancer to program a customer DNS zone - and the complexity on integrating with the many DNS APIs ( see the list of plugins in external DNS and ACME programs ) makes it unlikely that even 2-3 gateway implementations will be able to do that - while external-dns has a working implementation that can be used with any gateway ( and vendors can provide specialized ones as well - most of the code is related to the 100 DNS APIs).

Yes, because it is at the stage of gathering questions, not proposing solutions.

> In general: any GEP that is not based on existing art and features that a significant number of gateways supports ( I would go for 'majority'...) will lead to fragmentation and user pain with the proliferation of optional features - as well as poor APIs that are not based on well established and broadly implemented designs (and I would say in Istio we have a bit of experience with that...)

I don't understand what saying this is intended to achieve. Yes, the point of the GEP process is to ensure that the features we include can be well supported by multiple implementations. Specifications that we define don't have to be implemented by every implementation, and I think External DNS is a great example of functionality that will probably be mostly implemented by a different set of controllers than the ones that provide core routing functions.

These functions may be optional in the spec, but if we don't define them, each implementation will define their own, probably using annotations, and we will be back where we were with Ingress all over again.

I'm about to go and add some suggested changes to capture some of what we've been discussing here. If you wish to discuss this further, I think that we should move it to the community meeting or a separate dedicated meeting to get some higher-bandwidth communication.

@youngnick (Contributor) left a comment

I added some suggestions based on my discussion with @costinm in the main comments of the PR. Hopefully these capture the flavor of our discussion.

Comment on lines +10 to +18
## Goals
* Allow cluster operators to declaratively express which DNS service they want to use with a particular Gateway or Gateway Listener.
* Provide a mechanism to allow the DNS configuration to be delegated to a chosen controller.
* Provide a standard CRD-based API with expressive status reporting and remove the need for "loose" APIs such as annotations.
* Increase portability and supportability between Gateway API implementations and third party controllers offering DNS integration.

## Non-Goals

* Cover more complex DNS routing strategies that come into play for multi-cluster topologies such as round robin, failover, health checks, weighted and geo location with this first pass. Supporting these types of use cases for distributed gateways (e.g., in different regions or multiple gateways for resilience within a region) and offering a form of global load balancing leveraging DNS is a potential future goal.
Contributor

Based on the discussion with @costinm, I suggest the following changes to make our initial scope a bit more clear:

Suggested change
Before:
## Goals
* Allow cluster operators to declaratively express which DNS service they want to use with a particular Gateway or Gateway Listener.
* Provide a mechanism to allow the DNS configuration to be delegated to a chosen controller.
* Provide a standard CRD-based API with expressive status reporting and remove the need for "loose" APIs such as annotations.
* Increase portability and supportability between Gateway API implementations and third party controllers offering DNS integration.
## Non-Goals
* Cover more complex DNS routing strategies that come into play for multi-cluster topologies such as round robin, failover, health checks, weighted and geo location with this first pass. Supporting these types of use cases for distributed gateways (e.g., in different regions or multiple gateways for resilience within a region) and offering a form of global load balancing leveraging DNS is a potential future goal.
After:
## Goals
* Provide a way for Gateway API resource owners to mark their resources as relevant for external DNS provisioning
* Ensure that the above method has a way for multiple providers to be present in the cluster and be able to actuate external DNS provisioning requests
* Ensure that any method is based on structured fields and makes the most of `status` on whatever resources are relevant, whether they are existing Gateway API resources or new resources.
* Increase portability and supportability between Gateway API implementations and third party controllers offering DNS integration.
* Clarity on the scope of hostnames under management. (This should _not_ be able to be used to affect the standard in-cluster DNS configuration)
## Non-Goals
* Anything to do with configuring in-cluster DNS. This support is for configuration outside the cluster only.
* Ways to define if the Gateway API resources are allowed to request particular hostnames. These choices should be left to the implementations that actually actuate the requests for hostnames. However, `status` flows should be specified so as to make clear if a hostname provisioning request cannot be performed.
* Cover more complex DNS routing strategies that come into play for multi-cluster topologies such as round robin, failover, health checks, weighted and geo location with this first pass. Supporting these types of use cases for distributed gateways (e.g., in different regions or multiple gateways for resilience within a region) and offering a form of global load balancing leveraging DNS is a potential future goal.


## Use Cases

As a cluster administrator, I manage a set of domains and a set of gateways. I would like to declaratively define which DNS provider to use to configure connectivity for clients accessing these domains and my gateway so that I can see and configure which DNS provider is being used.
Contributor

Suggested change
Before:
As a cluster administrator, I manage a set of domains and a set of gateways. I would like to declaratively define which DNS provider to use to configure connectivity for clients accessing these domains and my gateway so that I can see and configure which DNS provider is being used.
After:
As a cluster administrator, I manage a set of domains and a set of gateways. I would like to declaratively define which gateways should be used for provisioning DNS records, and, if necessary, which DNS provider to use to configure connectivity for clients accessing these domains and my gateway so that I can see and configure which DNS provider is being used.


As a cluster administrator, I would like the status of the DNS records to be reported back to me, so that I can leverage existing Kubernetes-based monitoring tools to know the status of the integration.

As a cluster administrator, I would like the DNS records to be set up automatically based on my gateways' assigned addresses, and if the IP or hostname changes, I would like the DNS records to update automatically to ensure traffic continues to reach my gateway.

Contributor

Suggested change
As a DNS administrator, I should be able to ensure that only approved External DNS controllers can make changes to DNS zone configuration. (This should in general be taken care of by DNS system <-> External DNS controller interactions like user credentials and operation status responses, but it is important to remember that it needs to happen).

Suggestion to capture some of @costinm's feedback.


Successfully merging this pull request may close these issues.

GEP: DNS configuration as part of Gateway API
5 participants