
Network topology awareness #669

Open
phillebaba opened this issue Dec 17, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@phillebaba
Member

phillebaba commented Dec 17, 2024

Describe the problem to be solved

When doing lookups in the DHT, the first peer found will be returned. The query does not take the network topology of the Kubernetes cluster into consideration. Most cloud providers have multiple zones per region, and network traffic within the same zone is usually faster and cheaper than cross-zone traffic. Spegel should prioritize peers in the same zone over other peers when possible.

Proposed solution to the problem

There has been some interesting research already done on the topic of partitioning.

https://research.protocol.ai/publications/enriching-kademlia-by-partitioning/monteiro2022.pdf

We need to decide on the best solution to partition the DHT to avoid added latency for lookups.

Relates to #551 and #530

@phillebaba phillebaba added the enhancement New feature or request label Dec 17, 2024
@phillebaba phillebaba moved this to Todo in Roadmap Dec 17, 2024
@craig-seeman

I'm curious what your thoughts are on an (admittedly rather naive) but simple approach to topology. I could be misinterpreting the linked paper, but it almost sounds like this sort of feature does not currently exist in the libraries and would have to be implemented? I have not looked too deeply into the code that translates the SHA key on the p2p side to how we interface with containerd, so this may not even work.

If this is the case, however, the approach I was wondering about would be to essentially leverage a majority of the existing code and simply (if some sort of Spegel topology setting is enabled) advertise an additional key per image, with a single Kubernetes label mixed into the p2p key. This would obviously double the keyspace, but given Kubernetes best practices this count shouldn't get too out of hand.

For example, take the image digest 983487d9c4b7451b0e7d282114470d3a0ad50dc5e554971a4d1cda04acde670b.

Spegel would advertise 983487d9c4b7451b0e7d282114470d3a0ad50dc5e554971a4d1cda04acde670b as it currently does, but would also advertise a zone-scoped key:
sha256("983487d9c4b7451b0e7d282114470d3a0ad50dc5e554971a4d1cda04acde670b" + "topology.kubernetes.io/zone: us-east-1c") = 589aa1323f0a4834bcbfe3a50a157c4fdded821c79873e2a21e08d3e36654f42

During an image pull, if the topology setting is enabled, the first lookup would target the zone-scoped key (in this case 589aa1323f0a4834bcbfe3a50a157c4fdded821c79873e2a21e08d3e36654f42). If no peer is found in the local zone, an additional setting could determine what happens next (try a cluster-wide lookup or pull from the registry directly), in which case the fallback query would be for 983487d9c4b7451b0e7d282114470d3a0ad50dc5e554971a4d1cda04acde670b.
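As a rough sketch, the flow could look something like the following. The helper names (zoneScopedKey, lookupPeers) and the fallback behaviour are purely illustrative assumptions, not Spegel's actual API:

    // Hypothetical sketch of the dual-advertisement idea: derive a
    // zone-scoped DHT key from an image digest and prefer peers found
    // under that key before falling back to the cluster-wide key.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // zoneScopedKey combines the image digest with the node's zone label so
    // that peers in the same zone advertise and resolve the same key.
    func zoneScopedKey(digest, zone string) string {
        sum := sha256.Sum256([]byte(digest + "topology.kubernetes.io/zone: " + zone))
        return hex.EncodeToString(sum[:])
    }

    // resolve tries the zone-scoped key first and only falls back to the
    // cluster-wide digest key (or a direct registry pull, depending on
    // configuration) when no peer in the local zone has the content.
    func resolve(digest, zone string, lookupPeers func(key string) []string) []string {
        if zone != "" {
            if peers := lookupPeers(zoneScopedKey(digest, zone)); len(peers) > 0 {
                return peers
            }
        }
        return lookupPeers(digest)
    }

    func main() {
        digest := "983487d9c4b7451b0e7d282114470d3a0ad50dc5e554971a4d1cda04acde670b"
        // Toy lookup function standing in for the real DHT query.
        lookup := func(key string) []string {
            fmt.Println("looking up key:", key)
            return nil
        }
        resolve(digest, "us-east-1c", lookup)
    }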

@phillebaba
Member Author

I do not think you are wrong in your analysis. The paper describes a situation that is a lot more complicated than anything we are really facing. Spegel is aimed at private clusters, so it does not have the same internet-scale challenges. We only have to deal with a fixed set of zones; for most this would be 3 availability zones. For others it may be more if they are running clusters across regions, but I find that this is a lot more uncommon.

While it would increase the number of advertised keys, I do think the benefit of prioritizing peers in the same zone outweighs the cost. Either way, we can easily run the benchmarks and verify that this is the case.

I have been looking around and have not found a good way to pass the topology key into the running pods. I have found a couple of issues in the Kubernetes repository but most people suggest using external projects to solve this. Are you aware of an easy way to do this?

I am open to making this an opt-in feature initially, to start evaluating the cost of the increased number of advertised keys. Down the road, if we come up with a smarter solution, we can always make changes.

@phillebaba
Member Author

Stumbled upon some more good research on this topic, which seems to be related to the original paper I linked.

https://asc.di.fct.unl.pt/~jleitao/thesis/MonteiroMSc.pdf

@craig-seeman

craig-seeman commented Jan 14, 2025

I have been looking around and have not found a good way to pass the topology key into the running pods. I have found a couple of issues in the Kubernetes repository but most people suggest using external projects to solve this. Are you aware of an easy way to do this?

One of the primary ways we deal with this in my experience is to use the downward API directly in the manifest, exposing the value to the pod as an environment variable in the container's env section:

        env:
          # Downward API: expose the pod's topology.kubernetes.io/zone label
          # to the container as the NODE_ZONE environment variable.
          - name: NODE_ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.labels['topology.kubernetes.io/zone']

The other method using the downward API would be to mount it as a volume that the container can read; each of the exposed fields then appears as a file within a directory inside the container.
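Once the zone is exposed to the pod by either method, reading it from within the process is straightforward. A minimal sketch, assuming the NODE_ZONE variable from the snippet above and a hypothetical downward API volume mounted at /etc/podinfo (both names are assumptions for illustration):

    // Hypothetical sketch: read the zone from the NODE_ZONE environment
    // variable, falling back to a downward API volume mounted at /etc/podinfo.
    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    func nodeZone() string {
        if zone := os.Getenv("NODE_ZONE"); zone != "" {
            return zone
        }
        // Downward API volume fallback: each exposed field is a file.
        if b, err := os.ReadFile("/etc/podinfo/zone"); err == nil {
            return strings.TrimSpace(string(b))
        }
        return ""
    }

    func main() {
        fmt.Println("zone:", nodeZone())
    }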

One thing to note: this approach of using the downward API can sometimes get tricky if you are reading custom labels tagged by another DaemonSet and that daemon starts after your pod.

https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
