Stubby randomly(?) refuses to connect to DoT servers #297

Open
owah opened this issue Sep 9, 2021 · 4 comments

Comments

@owah

owah commented Sep 9, 2021

Hello,

I still have the issue described here: #277, but I noticed that even on a fresh reboot, stubby sometimes won't even connect to the DNS servers in my list. I'm using four servers supplied by NextDNS:

DNS=45.90.28.0
DNS=2a07:a8c0::
DNS=45.90.30.0
DNS=2a07:a8c1::

The servers are online and respond to manual queries fine, e.g.:
kdig -d +tls-ca +tls-host=dns.nextdns.io example.com @45.90.30.0
stubby.log

Any ideas?

@saradickinson
Contributor

Looking at this log specifically, it seems stubby cannot connect over IPv6 at all (is it available?) but can intermittently connect to the IPv4 addresses - which is the annoying behaviour. There certainly seem to be events where the server is shutting the connection, but I don't think that is the root of the problem. I think this is a transport problem, so if full logging isn't showing any more detail, the only way to debug is probably to grab some PCAPs...
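One rough way to separate "IPv6 is unreachable" from "the TLS handshake fails" before reaching for packet captures is to probe each upstream directly. The sketch below is not part of stubby or getdns. It assumes the four NextDNS addresses and the `dns.nextdns.io` TLS hostname from the original report and the standard DoT port 853, and the helper names `family_of` and `probe_dot` are made up for illustration.

```python
import ipaddress
import socket
import ssl

# Addresses and TLS hostname taken from the report above; 853 is the standard DoT port.
SERVERS = ["45.90.28.0", "2a07:a8c0::", "45.90.30.0", "2a07:a8c1::"]
TLS_HOST = "dns.nextdns.io"


def family_of(addr: str) -> str:
    """Return 'IPv4' or 'IPv6' for a literal IP address."""
    return "IPv6" if ipaddress.ip_address(addr).version == 6 else "IPv4"


def probe_dot(addr: str, tls_host: str = TLS_HOST, port: int = 853,
              timeout: float = 3.0) -> str:
    """Attempt a TCP connect plus TLS handshake; return 'ok ...' or an error summary."""
    context = ssl.create_default_context()  # verifies the certificate against tls_host
    try:
        with socket.create_connection((addr, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=tls_host) as tls:
                return f"ok ({tls.version()})"
    except OSError as exc:  # timeouts, unreachable/refused, and ssl.SSLError
        return f"failed: {type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for server in SERVERS:
        print(f"{family_of(server):4} {server:15} {probe_dot(server)}")
```

If both IPv6 rows fail with an unreachable or timeout error while the IPv4 rows report `ok`, that points at local IPv6 connectivity rather than at the servers. A probe like this only covers connection setup, so it cannot explain a SERVFAIL that arrives over a healthy TLS session.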

@owah
Author

owah commented Sep 14, 2021

For some reason IPv6 was disabled while capturing that log, my bad! But even with IPv6 enabled the behaviour is the same.
I have started four Wireshark sessions, each filtering on a different NextDNS IP. Then I used dig @localhost to cycle through the round-robin servers. Three of them returned proper A records, but one of them returned SERVFAIL.

In this case it was this server "45.90.28.0". When I compare the logs, they look almost identical. First the usual TLS handshake, then some encrypted data and at the end a FIN packet.

I then restarted all four Wireshark sessions and repeated the resolving experiment. Three times the DNS response was NOERROR with proper A records, and one time it was SERVFAIL, on the same server. Again the log looks proper, with TLS handshake and all.
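The experiment above can be repeated in bulk to see how often SERVFAIL shows up. The following is a minimal sketch using only the Python standard library, not a tool from the stubby project. It assumes stubby is listening on 127.0.0.1 port 53 over UDP, and the function names (`build_query`, `rcode_of`, `tally`) are hypothetical.

```python
import collections
import secrets
import socket
import struct

# A few common response codes; anything else is reported as RCODE<n>.
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: random ID, RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", secrets.randbelow(65536), 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.strip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def rcode_of(response: bytes) -> str:
    """Extract the response code from the low 4 bits of the header flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def tally(name: str, count: int, server: str = "127.0.0.1",
          port: int = 53) -> collections.Counter:
    """Send `count` A queries to the local resolver and count the response codes."""
    results = collections.Counter()
    for _ in range(count):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(5.0)
            try:
                sock.sendto(build_query(name), (server, port))
                data, _addr = sock.recvfrom(4096)
                results[rcode_of(data)] += 1
            except OSError:  # includes timeouts
                results["TIMEOUT/ERROR"] += 1
    return results


if __name__ == "__main__":
    print(tally("example.com", 20))
```

With four upstreams in round-robin and one of them misbehaving, roughly a quarter of the answers would be expected to come back as SERVFAIL, which matches the one-in-four pattern described above. The sketch does not check that the response ID matches the query, which is acceptable for a quick local count but not for anything more serious.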

I'd rather not share the pcap, because the SNI contains my unique NextDNS domain and the rDNS of my IP is also very telling :)

@owah
Author

owah commented Sep 29, 2021

There seems to be a thread on the NextDNS forum that talks about this bug. They seem to confirm that it is not related to server availability or the TLS connection, but to something else implementation-wise. https://help.nextdns.io/t/q6h1ccj/tls-connection-failures-stubby

@saradickinson
Contributor

@owah thanks for the debugging and the link to the NextDNS issue. I think it is worth looking to see if we could improve the getdns code that stubby uses to also back off when we get a stream of SERVFAILs from an upstream, to make resolution more robust.
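To make the idea concrete, here is one way a "back off after a stream of SERVFAILs" policy could look. This is a hypothetical sketch in Python, not the getdns implementation or its API. The `Upstream` class, the `pick` helper, and the threshold and delay values are all assumptions chosen for illustration.

```python
import time


class Upstream:
    """Tracks consecutive SERVFAILs for one upstream and backs off exponentially."""

    def __init__(self, address, threshold=3, base_delay=1.0, max_delay=300.0):
        self.address = address
        self.threshold = threshold    # SERVFAILs in a row before backing off
        self.base_delay = base_delay  # first backoff period, in seconds
        self.max_delay = max_delay    # cap on the backoff period
        self.failures = 0
        self.retry_at = 0.0

    def record(self, rcode, now=None):
        """Update state from one response code; any non-SERVFAIL answer resets it."""
        now = time.monotonic() if now is None else now
        if rcode != "SERVFAIL":
            self.failures = 0
            self.retry_at = 0.0
            return
        self.failures += 1
        if self.failures >= self.threshold:
            # Double the delay for each SERVFAIL beyond the threshold, up to the cap.
            exponent = self.failures - self.threshold
            delay = min(self.base_delay * 2 ** exponent, self.max_delay)
            self.retry_at = now + delay

    def available(self, now=None):
        """True when the upstream is not currently in a backoff period."""
        now = time.monotonic() if now is None else now
        return now >= self.retry_at


def pick(upstreams, now=None):
    """Return the first upstream that is not backing off, or None if all are."""
    for upstream in upstreams:
        if upstream.available(now):
            return upstream
    return None
```

A real implementation would need to distinguish an upstream that fails for every name from a single broken domain that legitimately produces SERVFAIL everywhere, for example by only counting failures that another upstream answers successfully.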

2 participants