
Fix DNS Retry period, which is configured wrongly #1979

Closed

lucperneel opened this issue May 16, 2021 · 5 comments

Comments

lucperneel commented May 16, 2021

As I have no access (yet) to the repo to add a pull request, I am just posting a proposed change here.
It concerns the retry time in the DNS SOA header.
As a happy user, and a software developer myself, I would rather propose fixes to improve the already nicely working mailinabox ;-)

Subject: [PATCH] Fix DNS Retry period, which is configured wrongly

Following RFC 1912:

      Retry: If a secondary was unable to contact the primary at the
          last refresh, wait the retry value before trying again.  This
          value isn't as important as others, unless the secondary is on
          a distant network from the primary or the primary is more
          prone to outages.  It's typically some fraction of the refresh
          interval.

This was wrongly set too high in miab.
Typically it should be somewhere between 120 and 7200 seconds.

With the old settings, if a secondary fails to reach you (for example, while your miab instance is rebooting),
it will take a long time before the secondary retries and gets updated.
As a result, the update time on the secondary will be far longer than the configured
refresh period, which is not correct.
---
 management/dns_update.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/management/dns_update.py b/management/dns_update.py
index c000f34..40ff82d 100755
--- a/management/dns_update.py
+++ b/management/dns_update.py
@@ -497,7 +497,7 @@ $TTL 86400          ; default time to live
 @ IN SOA ns1.{primary_domain}. hostmaster.{primary_domain}. (
            __SERIAL__     ; serial number
            7200     ; Refresh (secondary nameserver update interval)
-           86400    ; Retry (when refresh fails, how often to try again)
+           3600     ; Retry (when refresh fails, number of seconds to wait for next retry, should be less or equal Refresh period)
            1209600  ; Expire (when refresh fails, how long secondary nameserver will keep records around anyway)
            86400    ; Negative TTL (how long negative responses are cached)
            )
--
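
A quick way to sanity-check what a zone actually publishes (a sketch of my own using the dnspython package, not part of the patch; replace "example.com" with your box's domain):

import dns.resolver

def check_soa_retry(domain, lo=120, hi=7200):
    # Fetch the zone's SOA record and compare the timers against the
    # RFC 1912 guidance quoted above.
    soa = dns.resolver.resolve(domain, "SOA")[0]
    print(f"{domain}: refresh={soa.refresh} retry={soa.retry} expire={soa.expire}")
    if not lo <= soa.retry <= hi:
        print(f"  retry {soa.retry} is outside the recommended range {lo}-{hi}")
    if soa.retry > soa.refresh:
        print(f"  retry {soa.retry} is larger than refresh {soa.refresh}")

check_soa_retry("example.com")
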
JoshData (Member)

#1892 was probably a little too ambitious. I agree with this change.

nztvar commented Dec 29, 2021

MXToolbox also flags this.

SOA Retry Value is outside of the recommended range
ns1.box.example.email reported Retry 86400 : Retry is recommended to be between 120 and 7200.

lucperneel (Author)

Thanks. Some further explanation of why the retry should be smaller than the refresh: the refresh period is the time between refreshes from a secondary DNS server to the primary, so that is the "normal" happy case.
However, if for some reason the refresh request fails, the secondary swaps the Refresh for the Retry value (instead of waiting for a Refresh period, it waits for the Retry period).
Therefore, setting the Retry period larger than the Refresh period is very strange and illogical.

As an example, say you have nsp (primary name server) and nss (secondary name server).
In the happy-case scenario with the current config, nss fetches the data from nsp every 7200 seconds
(meaning that changes should ripple from nsp to nss within two hours).
Now, if nsp is down at the moment of the refresh (being rebooted, for instance), nss will retry only after the Retry period, which in the original configuration is 86400 seconds (one day). So if something goes wrong, your refresh suddenly goes from 2 hours to 1 day...

Therefore the retry should logically be set (much) smaller than the Refresh, since we want to recover quickly and not let a transient failure blow up the ripple time.
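
To put numbers on it (a toy calculation of my own, not from the codebase), assuming a change lands just after a successful refresh and the next refresh attempt fails:

REFRESH = 7200  # seconds between normal refresh attempts (as in the zone file)

def worst_case_delay(retry):
    # One full refresh interval passes, the refresh attempt fails
    # because the primary is down, then the secondary waits `retry`.
    return REFRESH + retry

print("old retry=86400:", worst_case_delay(86400) / 3600, "hours")  # 26.0 hours
print("new retry= 3600:", worst_case_delay(3600) / 3600, "hours")   #  3.0 hours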

myfirstnameispaul (Contributor)

@lucperneel If you take a look at the above-linked #1965, you can see that a proposed change would fix this problem, but nobody seems to be responding to the changes the maintainer requested for the PR to be accepted. So it is just sitting there.

JoshData (Member) commented Jan 8, 2022

Fixed, thanks for the clear discussion.
