Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: add dynamic DNS resolution for nodes #30822

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

0xVasconcelos
Copy link

Closes #23210

Context

When deploying Geth in Kubernetes with ReplicaSets, we encountered two DNS-related issues affecting node connectivity. First, during startup, Geth tries to resolve DNS names for static nodes too early in the config unmarshaling phase. If peer nodes aren't ready yet (which is common in Kubernetes rolling deployments), this causes an immediate failure:

INFO [11-26|10:03:42.816] Starting Geth on Ethereum mainnet...
INFO [11-26|10:03:42.817] Bumping default cache on mainnet         provided=1024 updated=4096
Fatal: config.toml, line 81: (p2p.Config.StaticNodes) lookup idontexist.geth.node: no such host

The second issue comes up when pods get rescheduled to different nodes - their IPs change but peers keep using the initially resolved IP, never updating the DNS mapping.
This PR adds proper DNS support to enodes by deferring resolution to connection time and adding TTL-based updates. It also handles DNS failures gracefully instead of failing fatally during startup, making it work better in container environments where IPs are dynamic and peers come and go during rollouts.

return !n.ip.IsValid() || time.Since(n.dnsResolved) > n.dnsTTL
}

func (n *Node) GetTTL() time.Duration {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is really needed

}

// DNSName returns the stored DNS name
func (n *Node) DNSName() string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This getter is also not really needed imo

hostname := u.Hostname()
ip := net.ParseIP(hostname)
if ip == nil {
ips, err := lookupIPFunc(hostname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong here, but if I understand correctly this will not create a correct V4WithDNS node if the initial dial is working. Shouldn't we in this case always create NewV4WithDNS even if the DNS lookup is working (so we can get the DNS refresh ability). Otherwise we will kinda have three classes of nodes: raw-ip, DNS resolved on startup, DNS not resolved on startup all with slightly different behavior

@fjl fjl changed the title p2p/enode: add dynamic DNS resolution for nodes p2p: add dynamic DNS resolution for nodes Nov 28, 2024
Copy link
Contributor

@fjl fjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of this, but the PR will need some updates.

One key thing that's violated here, is that enode.Node is supposed to be an immutable type, i.e. the fields of it can't be changed.

Storing the DNS name into the Node object is the correct approach, but we can't use Node to also track the updates. The TTL logic should be moved into p2p.dialTask. When dialing the node fails, we can resolve it, and then create another Node instance with the resolved address.

Note also that we have existing 'resolve' logic that checks the DHT for updates to the node endpoint. If the node is found in the DHT, we don't need to use the DNS at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StaticNodes / TrustedNodes useless in PoA Setup
3 participants