Determine desired and correct "service" discovery mode behavior #5269

Open · 4 tasks
rainest opened this issue Dec 2, 2023 · 6 comments

Comments

@rainest
Contributor

rainest commented Dec 2, 2023

The hostname scheme used for the service gateway discovery mode appears incorrect. The Service DNS specification uses a single hostname that points to the Service IP, or Pod IPs in headless mode. KIC 3.0 uses multiple hostnames, prepending a Pod IP subdomain to the Service hostname.

While the Pod hostname spec does use IP subdomains, the Service spec does not.

The original goal of this feature was to support TLS, with the (mostly correct) expectation that users would not have certificates with Pod IP SANs available. We should clarify which hostnames we expect in practice and why: while we can reliably expect the short Service name (without the .cluster.local or whatever the cluster domain suffix is) to resolve, users with certificate issuance systems may well use arbitrary hostnames.

The currently working mode (pod) expects proxy instances to present a certificate matching the Pod hostname. Because each replica must mount the same certificate, it likely requires a wildcard SAN to match any IP subdomain. This is not ideal: Pod hostnames use a <IP>.<Namespace>.pod.cluster.local format, so a valid certificate for any proxy Pod would also be valid for any other Pod in the same namespace.

Presenting a certificate valid for the full Service hostname (no wildcard) is not overly broad, but it does not really fit the current discovery implementation. There are no Service-based hostnames that point to a specific endpoint only, so there is no way to construct a list of hostnames that covers every replica. We would likely need to build the list of IPs to contact (as in the ip strategy), loop over those, and dynamically build a client with local DNS overrides for each, so that we can remap the <service>.<namespace>.svc.cluster.local hostname to each endpoint IP in turn. This also satisfies the arbitrary hostname case.
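To make that concrete, here is a minimal sketch of the per-endpoint approach. It is illustrative only: the package, function name, port handling, and endpoint list are placeholders, not the actual KIC client code. Each client dials one endpoint IP directly, while TLS verification happens against the Service hostname (or a user-provided hostname) via ServerName:

package discovery // illustrative package name, not KIC's

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"time"
)

// adminClientsFor returns one HTTP client per endpoint IP. Every client dials
// its own endpoint directly, while the TLS handshake verifies the serving
// certificate against the shared Service (or user-provided) hostname rather
// than a Pod IP subdomain.
func adminClientsFor(serverName string, endpointIPs []string, port int) []*http.Client {
	clients := make([]*http.Client, 0, len(endpointIPs))
	for _, ip := range endpointIPs {
		addr := net.JoinHostPort(ip, fmt.Sprint(port))
		transport := &http.Transport{
			// Ignore whatever host appears in the request URL and always
			// dial this specific endpoint.
			DialContext: func(ctx context.Context, network, _ string) (net.Conn, error) {
				return (&net.Dialer{Timeout: 5 * time.Second}).DialContext(ctx, network, addr)
			},
			// Verify the certificate against the expected hostname.
			TLSClientConfig: &tls.Config{ServerName: serverName},
		}
		clients = append(clients, &http.Client{Transport: transport})
	}
	return clients
}

Each client would then issue requests to e.g. https://<service>.<namespace>.svc.cluster.local:8444/, with the dialer guaranteeing that the request lands on one specific replica. Since serverName can be whatever the user's certificates actually contain, this also covers the arbitrary-hostname case.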

Rough acceptance:

  • The service discovery mode uses the hostname format from https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#a-aaaa-records when connecting to Pods.
  • Users can provide their own hostname for admin service TLS validation.
  • The controller can still update every Pod in DB-less mode, even when given only a single hostname.
  • (Optional, recommended) We can properly test discovery in integration testing, overcoming the hurdle of not running in the cluster while wanting to use cluster-internal IPs. Suggested, though unexplored, options for bridging that gap include Inlets, kt-connect, and Telepresence.

Dunno really how best to phrase those, but whatever.

@rainest
Contributor Author

rainest commented Dec 2, 2023

FWIW, I initially wrote this as a response to #3934 before determining it needed a new issue. The original writeup had some examples that are omitted above.


The pod strategy does resolve fine on GKE with kube-dns:

$ kubectl version
Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.24.15-gke.1700

$ kubectl get po -owide
NAME                                   READY   STATUS    RESTARTS   AGE     IP           NODE                                                  NOMINATED NODE   READINESS GATES
ana-kong-6ccbdcb7cc-6dxhp              1/1     Running   0          5m59s   10.104.0.6   gke-traines-test-2023-12-default-pool-cb07533f-cxw4   <none>           <none>
ana-kong-controller-655794c769-g5rc7   1/1     Running   0          5m58s   10.104.0.7   gke-traines-test-2023-12-default-pool-cb07533f-cxw4   <none>           <none>

$ kubectl get po -oyaml | grep -A1 DNS_STRATEGY
      - name: CONTROLLER_GATEWAY_DISCOVERY_DNS_STRATEGY
        value: pod

2023-12-01T22:15:07Z	info	setup	Retrying kong admin api client call after error	{"v": 0, "retries": "2/60", "error": "making HTTP request: Get \"https://10-104-0-6.default.pod:8444/\": dial tcp 10.104.0.6:8444: connect: connection refused"}
2023-12-01T22:15:11Z	info	Successfully synced configuration to Kong	{"url": "https://10-104-0-6.default.pod:8444", "update_strategy": "InMemory", "v": 0}

Though not shown, adding actual configuration, scaling/restarting, and non-default namespaces all worked fine.

The service strategy does not work:

2023-12-01T22:31:06Z	info	setup	Retrying kong admin api client call after error	{"v": 0, "retries": "24/60", "error": "making HTTP request: Get \"https://10-104-2-7.ana-kong-admin.other.svc:8444/\": dial tcp: lookup 10-104-2-7.ana-kong-admin.other.svc on 10.108.0.10:53: no such host"}

This address format looks unusual, however. While the Pod DNS spec does indicate that it includes the IP prefix, the Service DNS spec does not; it instead says you should only see a single NAME.NAMESPACE.svc.cluster.local A record, which resolves to the ClusterIP if there is one, or to the individual Pod IPs for a headless Service. GKE DNS does return those:

root@cool:/# dig kana-kong-admin.default.svc.cluster.local +short
10.104.0.16
10.104.1.12
10.104.2.10

The format with IP subdomains is of unclear origin. I suspected it was something extra CoreDNS provides, but service does not work with CoreDNS (on KIND) either, so this address format does not appear to be something it offers out of the box. The format is still in use, but I'm no longer sure where it is supposed to work.
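For comparison, this is roughly what the Service DNS spec does guarantee: a single name that resolves to every endpoint IP when the Service is headless (and to the ClusterIP otherwise), which is what the dig output above shows. A hedged Go equivalent follows; the hostname is illustrative, and the controller could equally build this list from EndpointSlices instead of DNS:

package discovery // illustrative package name, not KIC's

import (
	"context"
	"net"
	"time"
)

// endpointIPs resolves the headless admin Service name and returns one IP per
// ready endpoint, mirroring the dig output above.
func endpointIPs(ctx context.Context, serviceHost string) ([]net.IP, error) {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	addrs, err := net.DefaultResolver.LookupIPAddr(ctx, serviceHost)
	if err != nil {
		return nil, err
	}
	ips := make([]net.IP, 0, len(addrs))
	for _, a := range addrs {
		ips = append(ips, a.IP)
	}
	return ips, nil
}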

@rainest
Contributor Author

rainest commented Dec 7, 2023

Review of the tunneling options

General notes

  • The integration test suite hardcodes whether gateway discovery is used. We may want to make this switchable, though doing so comes at the cost of yet another integration run if we use it.
  • KTF apparently hardcodes the use of an HTTP admin API. It should not do this.

#5295 has a mostly working PoC. Something is broken with Postgres, and it hacks around the problems above.

Inlets

The OSS options and their feature set aren't quite clear. It seems that the server and management app have MIT-licensed options, but the client has only paid options? It's also not clear whether it offers a simple network device and route manipulation option for transparent L3 routing, which is what we want. The docs suggest it requires a TCP proxy with a listener on the client host, so you would have to connect to that instead of to Pod IPs.

kt-connect

Not at all product-ized and likely to remain that way given Alibaba Cloud's market. It is also GPLv3. That shouldn't matter, since it would only be used to manipulate the environment and never be imported into Go packages, but it may still create an extra wrinkle.

Somewhat hampered by basically all issue discussion being in Chinese.

Telepresence

Somewhat product-ized. The original feature set (what we need) was available via a standalone application, but Ambassador has recently made the readily available binaries require an account regardless of whether you'd need one for paid features.

No-account installs

The binaries from the GitHub releases page do not have this limitation, but are not distributed via package managers (namely Homebrew) or Actions (which are hardcoded to the nag version).

The Actions fortunately aren't necessary (simply downloading the binary and running telepresence helm install and telepresence connect works fine), but local dev is somewhat limited. I thought go install would work, but:

go install github.com/telepresenceio/telepresence/v2/cmd/[email protected]
go: github.com/telepresenceio/telepresence/v2/cmd/[email protected] (in github.com/telepresenceio/telepresence/[email protected]):
	The go.mod file for the module providing named packages contains one or
	more replace directives. It must not contain directives that would cause
	it to be interpreted differently than if it were the main module.

and I can't see what else I should install instead, if anything is available. I suppose we could detect OS and pull binaries from the release page somehow via the Makefile.

KTF chicken-and-egg problems

Telepresence cannot install its egress point without a running cluster. The integration test harness generally wants to create its own cluster from scratch and start running tests immediately. We can use kind-action or similar to set one up in advance (and subsequent addons, such as metallb, are generally fine with this), but it may miss some settings.

A somewhat comedic option is to integrate Telepresence into KTF. There is, AFAIK, no reason we can't make an addon wrapper that functions as a pseudo-CLI replacement, which would allow it to run as part of KTF environment creation and solve the distribution problem (no Go install? no problem!) in one go.
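A rough, hypothetical sketch of what such a wrapper could look like, assuming we only shell out to a pre-downloaded telepresence binary. The type and method names here are invented for illustration; a real implementation would have to satisfy KTF's actual addon interface:

package telepresence // hypothetical KTF addon package

import (
	"context"
	"fmt"
	"os"
	"os/exec"
)

// Addon wraps the telepresence CLI so KTF can run it during environment
// creation. Binary points at a telepresence release binary fetched in advance.
type Addon struct {
	Binary string
}

// Deploy installs the traffic manager into the cluster and connects the host.
// KUBECONFIG is passed via the environment to avoid guessing at CLI flags.
func (a *Addon) Deploy(ctx context.Context, kubeconfig string) error {
	for _, args := range [][]string{
		{"helm", "install"},
		{"connect"},
	} {
		cmd := exec.CommandContext(ctx, a.Binary, args...)
		cmd.Env = append(os.Environ(), "KUBECONFIG="+kubeconfig)
		if out, err := cmd.CombinedOutput(); err != nil {
			return fmt.Errorf("telepresence %v: %w: %s", args, err, out)
		}
	}
	return nil
}

// Delete tears down the host-side connection by stopping the local daemons.
func (a *Addon) Delete(ctx context.Context) error {
	return exec.CommandContext(ctx, a.Binary, "quit").Run()
}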

Container magic

If we only care about KIND (ideally we don't rule out integration tests against other platforms), we don't strictly need tunnels, as Docker networks all exist on the same host and can be routed in the kernel. According to the KIND devs this should be viable, but only on Linux, since Docker on other platforms is not integrated into the host OS the same way:

bentheelder 16:55:32
If you're on linux you could do it manually by adding a route to the node for the pod range IIRC.
but we don't support host network, IMHO for pods in the actual host network the best answer is the whole cluster in the host, IE run kubeadm on the host directly

(On mac/windows, net=host wouldn't enable it anyhow nor would adding a route manually b/c the nodes / containers aren't reachable from the host, only packet forwarding to the containers for forwarded ports b/c how docker desktop does networking)

we're mostly focused on portable test workflows => port forwarding or other tunneling options, but if you're only targeting linux you have a lot of flexibility in getting traffic to the nodes (And therefore the pods), with standard docker networking the node IPs should be reachable from the linux host, and traffic hitting the node container interface should route to pods. (edited)
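For the Linux-only case, the manual version of that trick is a single route: send the cluster's Pod CIDR via the KIND node container's IP. A hedged sketch using the vishvananda/netlink package (the package name here is a placeholder, and the actual Pod CIDR and node IP must be read from the cluster; KIND's default Pod subnet is typically 10.244.0.0/16):

package kindroute // hypothetical helper package

import (
	"net"

	"github.com/vishvananda/netlink"
)

// addPodRoute is the equivalent of `ip route add <podCIDR> via <nodeIP>`,
// making cluster-internal Pod IPs reachable from the Linux host. It requires
// root (CAP_NET_ADMIN) and only works where the node containers share the
// host's kernel, i.e. Docker on Linux.
func addPodRoute(podCIDR, nodeIP string) error {
	_, dst, err := net.ParseCIDR(podCIDR)
	if err != nil {
		return err
	}
	return netlink.RouteAdd(&netlink.Route{
		Dst: dst,
		Gw:  net.ParseIP(nodeIP),
	})
}

The node container's IP can be read with docker inspect, and the Pod CIDR from the Node object's spec.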

@rainest
Contributor Author

rainest commented Dec 7, 2023

Another KIND user mentioned https://sshuttle.readthedocs.io/en/stable/usage.html#usage as an option. It looks like it satisfies the client-side criteria (just exposes IP ranges).

@pmalek
Member

pmalek commented Dec 7, 2023

go install github.com/telepresenceio/telepresence/v2/cmd/[email protected]
go: github.com/telepresenceio/telepresence/v2/cmd/[email protected] (in github.com/telepresenceio/telepresence/[email protected]):
	The go.mod file for the module providing named packages contains one or
	more replace directives. It must not contain directives that would cause
	it to be interpreted differently than if it were the main module.

and I can't see what else I should install instead, if anything is available. I suppose we could detect OS and pull binaries from the release page somehow via the Makefile.

You won't be able to use go install with a module that uses a replace. That's basically what the error says.


As for the service matter: those addresses originate from the linked article https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#a-aaaa-records-1

Any Pods exposed by a Service have the following DNS resolution available:
pod-ip-address.service-name.my-namespace.svc.cluster-domain.example

Which is known to not work on GKE's kube-dns. Some more findings on this one: #4065 (comment)

@programmer04
Member

Have you considered mirrord? It's MIT-licensed.

@pmalek
Member

pmalek commented Dec 18, 2023

Have you considered mirrord? It's MIT-licensed.

As far as I remember from when I last used it, this tool is meant to wrap the running binary by starting mirrord with the wrapped binary as a CLI parameter. When trying to run this on Mac I remember hitting problems like segfaults, but unfortunately I don't have the logs from those tests.

As a side effect, this approach would also make debugging practically impossible (at least, I'm not aware of a way to make it work).

BTW: I've just found an issue that I created a couple of months ago for integration tests using GD: #3631
