Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to log individual upstream DNS failures #363

Open
supersebbo opened this issue Dec 10, 2024 · 3 comments
Open

How to log individual upstream DNS failures #363

supersebbo opened this issue Dec 10, 2024 · 3 comments

Comments

@supersebbo
Copy link

I am trying (with little success) to debug an issue with intermittent DNS resolution failures on my network. Symptom is that clients report "Temporary failure in name resolution" for periods of 30-60 seconds, then start working again.

To me this smells like there is a single resolver in my pool of 20 that is failing to respond, and this heals itself when stubby moves onto the next resolver in the round-robin pool.

However, when I have tested all the upstream resolvers in the pool, they are all working, so it would seem the issue with the upstream resolver is intermittent.

Is there a level of logging that Stubby can output that will help me track down this unreliable DNS resolver?

The only logging I get from Stubby today is when ALL upstream resolvers have failed (which it logs mercilessly).

@cookiemonsteruk
Copy link

cookiemonsteruk commented Dec 16, 2024

It does make you wonder if getdns/stubby is abandoned now, doesn't it? No answers to queries here and logging enhancements that were supposed to be delivered a long time ago (see product page for the statement) haven't materialised.
@saradickinson could you comment on it?
Edit: I am asking not about this particular logging of particular upstreams (sorry @supersebbo ) but logging as in the ability for stubby to a) log to file ; b) configurable verbose levels ; c) use a logging library that allows to define retention and rotation parameters i.e. log files up to XX MB in size, create new ones on reaching the size, keep maximum of YY files.

@saradickinson
Copy link
Contributor

@supersebbo What platform are you running Stubby on and I presume it is running as a service? What level of logging does it currently have? There are 2 ways to configure logging, but unfortunately both require a restart to stubby to change (and note the command line setting overrules the config file setting)

  1. Via the config file e.g.
    log_level: GETDNS_LOG_NOTICE
  2. Via a command line argument (check the service file):
    stubby -v 7 where the logging is:
	-l	Enable logging of all logs (same as -v 7)
	-v	Specify logging level (overrides -l option). Values are
			0: EMERG  - System is unusable
			1: ALERT  - Action must be taken immediately
			2: CRIT   - Critical conditions
			3: ERROR  - Error conditions
			4: WARN   - Warning conditions
			5: NOTICE - Normal, but significant, condition
			6: INFO   - Informational message
			7: DEBUG  - Debug-level message

If you have INFO level logs you should see failures of individual upstream connections there e.g. if I have 2 upstreams configured (a working one 185.49.141.37 and a non-existent one 185.49.141.3) then with INFO logs I see the following:

>sudo ./stubby -C stubby.yml -v 6
[13:48:24.561025] STUBBY: Stubby version: Stubby 0.4.3
[13:48:24.561513] STUBBY: DNSSEC Validation is OFF
[13:48:24.561519] STUBBY: Transport list is:
[13:48:24.561522] STUBBY:   - TLS
[13:48:24.561524] STUBBY: Privacy Usage Profile is Strict (Authentication required)
[13:48:24.561528] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
[13:48:28.954205] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:48:28.954223] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     1, Conn_shuts=      0, Backoffs     =     0
[13:48:33.318274] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:48:33.318293] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     2, Conn_shuts=      0, Backoffs     =     0
[13:48:33.318301] STUBBY: 185.49.141.3                             : Upstream   : !Backing off TLS on this upstream    - Will retry again in 2s at Wed Dec 18 13:48:35 2024
[13:48:36.151860] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:48:36.151879] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     1, Conn_shuts=      0, Backoffs     =     1
[13:48:37.381282] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:48:37.381299] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     2, Conn_shuts=      0, Backoffs     =     1
[13:48:37.381305] STUBBY: 185.49.141.3                             : Upstream   : !Backing off TLS on this upstream    - Will retry again in 4s at Wed Dec 18 13:48:41 2024
[13:48:45.794320] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:48:45.794338] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     1, Conn_shuts=      0, Backoffs     =     2
[13:48:47.279146] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:48:47.279165] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     2, Conn_shuts=      0, Backoffs     =     2
[13:48:47.279171] STUBBY: 185.49.141.3                             : Upstream   : !Backing off TLS on this upstream    - Will retry again in 8s at Wed Dec 18 13:48:55 2024
[13:48:56.133026] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:48:56.133042] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     1, Conn_shuts=      0, Backoffs     =     3

with DEBUG logs you see all successful connections too. This can be useful to show the number of timeouts or cases where the server shuts the connection (rather than Stubby) which can indicate issues:

>sudo ./stubby -C stubby.yml -v 7
[13:49:07.137106] STUBBY: Read config from file stubby.yml
[13:49:07.137567] STUBBY: Stubby version: Stubby 0.4.3
[13:49:07.137679] STUBBY: DNSSEC Validation is OFF
[13:49:07.137685] STUBBY: Transport list is:
[13:49:07.137688] STUBBY:   - TLS
[13:49:07.137691] STUBBY: Privacy Usage Profile is Strict (Authentication required)
[13:49:07.137693] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
[13:49:07.137696] STUBBY: Starting DAEMON....
[13:49:09.491970] STUBBY: 185.49.141.37                            : Conn opened: TLS - Strict Profile
[13:49:09.545217] STUBBY: 185.49.141.37                            : Verify passed : TLS
[13:49:11.463900] STUBBY: 185.49.141.3                             : Conn opened: TLS - Strict Profile
[13:49:11.489925] STUBBY: 185.49.141.3                             : Conn closed: TLS - *Failure*
[13:49:11.489946] STUBBY: 185.49.141.3                             : Conn closed: TLS - Resps=     0, Timeouts  =     0, Curr_auth =   None, Keepalive(ms)=     0
[13:49:11.489950] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:49:11.489953] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     1, Conn_shuts=      0, Backoffs     =     0
[13:49:13.087277] STUBBY: 185.49.141.3                             : Conn opened: TLS - Strict Profile
[13:49:13.113342] STUBBY: 185.49.141.3                             : Conn closed: TLS - *Failure*
[13:49:13.113373] STUBBY: 185.49.141.3                             : Conn closed: TLS - Resps=     0, Timeouts  =     0, Curr_auth =   None, Keepalive(ms)=     0
[13:49:13.113377] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:49:13.113380] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     2, Conn_shuts=      0, Backoffs     =     0
[13:49:13.113387] STUBBY: 185.49.141.3                             : Upstream   : !Backing off TLS on this upstream    - Will retry again in 2s at Wed Dec 18 13:49:15 2024
[13:49:17.212832] STUBBY: 185.49.141.3                             : Conn opened: TLS - Strict Profile
[13:49:17.237645] STUBBY: 185.49.141.3                             : Conn closed: TLS - *Failure*
[13:49:17.237667] STUBBY: 185.49.141.3                             : Conn closed: TLS - Resps=     0, Timeouts  =     0, Curr_auth =   None, Keepalive(ms)=     0
[13:49:17.237670] STUBBY: 185.49.141.3                             : Upstream   : TLS - Resps=     0, Timeouts  =     0, Best_auth =   None
[13:49:17.237673] STUBBY: 185.49.141.3                             : Upstream   : TLS - Conns=     0, Conn_fails=     1, Conn_shuts=      0, Backoffs     =     1
[13:49:27.262820] STUBBY: 185.49.141.37                            : Conn closed: TLS - Resps=     6, Timeouts  =     0, Curr_auth =Success, Keepalive(ms)= 10000
[13:49:27.262836] STUBBY: 185.49.141.37                            : Upstream   : TLS - Resps=     6, Timeouts  =     0, Best_auth =Success
[13:49:27.262839] STUBBY: 185.49.141.37                            : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      0, Backoffs     =     0

Having said all that even with round_robin_upstreams: 0 I would expect Stubby to move on to the next resolver much quicker than 30-60s if just one if failing. Would be interesting to know how/if the behaviour is different with round_robin_upstreams: 1

@saradickinson
Copy link
Contributor

@cookiemonsteruk
Sadly the direct funding for new feature development in Stubby/getdns ceased several years ago now and we haven’t been able to find any new funding. And there is also no longer any low level funding for maintenance either for the last couple of years…

I personally do what I can in my spare time to respond to minor fixes and issues posted here (apologies that is often slow).

NLnet Labs is the official home to the (underlying) getdns project and they manage the releases, etc. so perhaps @wtoorop can comment on what resources they have available and overall project status?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants