-
Notifications
You must be signed in to change notification settings - Fork 20.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p2p: add dynamic DNS resolution for nodes #30822
base: master
Are you sure you want to change the base?
Conversation
return !n.ip.IsValid() || time.Since(n.dnsResolved) > n.dnsTTL | ||
} | ||
|
||
func (n *Node) GetTTL() time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that is really needed
} | ||
|
||
// DNSName returns the stored DNS name | ||
func (n *Node) DNSName() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This getter is also not really needed imo
hostname := u.Hostname() | ||
ip := net.ParseIP(hostname) | ||
if ip == nil { | ||
ips, err := lookupIPFunc(hostname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong here, but if I understand correctly this will not create a correct V4WithDNS node if the initial dial is working. Shouldn't we in this case always create NewV4WithDNS even if the DNS lookup is working (so we can get the DNS refresh ability). Otherwise we will kinda have three classes of nodes: raw-ip, DNS resolved on startup, DNS not resolved on startup all with slightly different behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of this, but the PR will need some updates.
One key thing that's violated here, is that enode.Node
is supposed to be an immutable type, i.e. the fields of it can't be changed.
Storing the DNS name into the Node object is the correct approach, but we can't use Node to also track the updates. The TTL logic should be moved into p2p.dialTask
. When dialing the node fails, we can resolve it, and then create another Node instance with the resolved address.
Note also that we have existing 'resolve' logic that checks the DHT for updates to the node endpoint. If the node is found in the DHT, we don't need to use the DNS at all.
Closes #23210
Context
When deploying Geth in Kubernetes with ReplicaSets, we encountered two DNS-related issues affecting node connectivity. First, during startup, Geth tries to resolve DNS names for static nodes too early in the config unmarshaling phase. If peer nodes aren't ready yet (which is common in Kubernetes rolling deployments), this causes an immediate failure:
The second issue comes up when pods get rescheduled to different nodes - their IPs change but peers keep using the initially resolved IP, never updating the DNS mapping.
This PR adds proper DNS support to enodes by deferring resolution to connection time and adding TTL-based updates. It also handles DNS failures gracefully instead of failing fatally during startup, making it work better in container environments where IPs are dynamic and peers come and go during rollouts.