Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regarding update on logging to file and advice in the meantime #343

Open
cookiemonsteruk opened this issue May 31, 2023 · 4 comments
Open

Comments

@cookiemonsteruk
Copy link

Hello.
I have been using stubby for a few years. Thank you for a wonderful piece of free functionality.
I have had trouble in the last 48 hours and the lack of logging to file makes it incredibly difficult, in fact so far impossible to diagnose the problem.
I have a simple setup although some might think unnecessarily complicated. I run stubby on my opnsense firewall, stringing along AdGuardHome for ad-blocking, Unbound as recursive resolver and stubby as the upstream forwarder to my selected DoT resolvers.
Client (udp 53) ->AdGH (listens on udp/tcp 53) -> Unbound (listens udp/tcp 5353) -> stubby (listens on udp/tcp 8053) -> DoT (853)

All services start on boot from I've started getting intermmittent SERVFAIL , NXDOMAIN and sometimes REFUSED response codes logged in AdGH from Unbound for domains that should be resolvable and indeed resolve later. An example is client.dropbox.com.
Of course I have done diagnostics that take me as far as stubby but I can't tell what is the problem that stubby is trying to deal with.
As one diagnostic step I stopped stubby and started again without the -g option but I could not get anything being logged to stdout.

In light of this may I ask:

  1. what is the update regarding the development progress to be able to configure stubby to log to file.
  2. any suggestions on how can I diagnose this scenario in the meantime?
    Thank you.
@cookiemonsteruk
Copy link
Author

I think I have made a little progress. I have compiled from source on a freebsd 13.1 virtual machine with the verbose logging option in readiness but haven't replaced it yet. I'll use it later if necessary. In the meantime I have done some testing with the current binary (no verbose logging).
The testing method is to run it manually with $ sudo /usr/local/bin/stubby -l -C /usr/local/etc/stubby/stubby.yml, verify it is running with sockstat . My two enabled resolvers in the config are 9.9.10 and 1.1.1.1 both to connect to on port 853.
I see it listening on 127.0.0.1:8053 both tcp and udp. I then dig on that machine and from a client. These are the results:
From client:

$ dig @192.168.5.1 -p 8053 www.example.com
;; communications error to 192.168.5.1#8053: timed out
;; communications error to 192.168.5.1#8053: timed out
;; communications error to 192.168.5.1#8053: timed out

; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> @192.168.5.1 -p 8053 www.example.com
; (1 server found)
;; global options: +cmd
;; no servers could be reached

On the server the only traffic seen is on the firewall, accepting the incoming request. No messages on the running stubby process.

Then I move to test directly on the server on a separate connection (stubby is in the foreground, not demonized):

$ dig @127.0.0.1 -p 8053 www.example.com

; <<>> DiG 9.18.10 <<>> @127.0.0.1 -p 8053 www.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 39756
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.example.com.		IN	A

;; Query time: 2493 msec
;; SERVER: 127.0.0.1#8053(127.0.0.1) (UDP)
;; WHEN: Wed May 31 22:28:05 BST 2023
;; MSG SIZE  rcvd: 33

and I see traffic in the stubby process:

$ sudo /usr/local/bin/stubby -l -C /usr/local/etc/stubby/stubby.yml
[21:13:23.242189] STUBBY: Read config from file /usr/local/etc/stubby/stubby.yml
[21:13:23.242471] STUBBY: DNSSEC Validation is ON
[21:13:23.242478] STUBBY: Transport list is:
[21:13:23.242480] STUBBY:   - TLS
[21:13:23.242484] STUBBY: Privacy Usage Profile is Strict (Authentication required)
[21:13:23.242486] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
[21:13:23.242488] STUBBY: Starting DAEMON....
[21:28:02.707630] STUBBY: --- SETUP(TLS): : Verify locations loaded
[21:28:02.707769] STUBBY: 9.9.9.10                                 : Conn opened: TLS - Strict Profile
[21:28:02.769579] STUBBY: 9.9.9.10                                 : Verify passed : TLS
[21:28:02.828388] STUBBY: 1.1.1.1                                  : Conn opened: TLS - Strict Profile
[21:28:02.838138] STUBBY: 1.1.1.1                                  : Verify passed : TLS
[21:28:05.196369] STUBBY: 9.9.9.10                                 : Conn closed: TLS - Resps=     2, Timeouts  =     0, Curr_auth =Success, Keepalive(ms)= 10000
[21:28:05.196384] STUBBY: 9.9.9.10                                 : Upstream   : TLS - Resps=     2, Timeouts  =     0, Best_auth =Success
[21:28:05.196388] STUBBY: 9.9.9.10                                 : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      1, Backoffs     =     0
[21:28:12.888159] STUBBY: 1.1.1.1                                  : Conn closed: TLS - Resps=     3, Timeouts  =     0, Curr_auth =Success, Keepalive(ms)= 10000
[21:28:12.888188] STUBBY: 1.1.1.1                                  : Upstream   : TLS - Resps=     3, Timeouts  =     0, Best_auth =Success
[21:28:12.888192] STUBBY: 1.1.1.1                                  : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      1, Backoffs     =     0

Does that help to identify what might be happening?

@cookiemonsteruk
Copy link
Author

Ah. Test from the client is flawed. Stubby is only listening on the loopback so only the queries from the same machine locally are valid tests. I still get servfails with "recursion requested but not available". Any suggestions of how to investigate this?
Also I have now replaced the binary with the newly compiled one with the verbose option enabled but the logging is the same when using -l flag upon launch. Do I need a different flag or option in config?

@saradickinson
Copy link
Contributor

Hi @cookiemonsteruk - a few initial comments/suggestions

  • You are on the right track with general logging for Stubby. Running it using
    sudo /usr/local/bin/stubby -l -C /usr/local/etc/stubby/stubby.yml
    should give you logging at the connection level. In which case you should be able to modify the stubby service use this logging?
  • we still don't have an 'official' a way to do per query logging in stubby even though it has been on the list a long time - sorry....
  • if you are feeling adventurous, you can use this PR (Add session key logging with OpenSSL. Just requires OpenSSL 1.1.1 and… getdns#435) as reference to add the ability to export session keys which can be used to debug TLS traffic in Wireshark
  • otherwise my immediate thought is to try turning off DNSSEC validation in Stubby. Since you are using Unbound in your resolution chain I'm assuming that you have DNSSEC validation turned on there and so don't also need it in Stubby. Your resolution will also be more efficient only doing DNSSEC in Unbound.
  • You could try removing Stubby from the resolution chain and using the Unbound capability to forward your traffic over TLS. Unbound doesn't have as many options as Stubby but it may help you narrow down the problem: https://dnsprivacy.org/dns_privacy_clients/#unbound
  • depending on how you installed stubby you may have access to the command line tool getdns_query which is similar to dig in that it can be used to issue single DNS queries, and it uses the same library as stubby (getdns).

Hope this helps to get you started.

@cookiemonsteruk
Copy link
Author

Thank you for the suggestions @saradickinson . Indeed the current logging is not giving enough information to pinpoint the problem stubby is having.
I fail to see what can I do with the suggested PR. I think Ii read it as being merged into stubby version 1.6 . The version available in freebsd ports only goes to 1.5.2_4 so that'w what I used to build. Reading your message however that there is no per-connection logging anyway, it seems undergoing the pain of trying to build from git sources instead in a freebsd VM is of little to no use. So I'll have to put that as something to try if all else fails.

I will try your suggestion to disable DNSSEC in Stubby or Unbound. One place only.
I have regrettably removed Stubby from the chain and so far no problems but I enabled only one DoT server for purpose of diagnostic. First I used the same one Stubby had and I had similar problems.

By the way I am pretty convinced the problem is not in Stubby. I've had this setup for over a year without problems.
I suspect it is one of the upstreams that is my problem, but only suddenly. They are the ones being used in the aforementioned last year.
Having Stubby as a black box withouth adequate loggin I know we all agree is still a problem. If it is still being worked on, it should be towards the top of the list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants