CoreDNS RRL Plugin Design Spec

This spec defines a CoreDNS plugin intended to replicate the behavior of the rate-limit feature in BIND.

In the interest of keeping PRs as small as possible, RRL will first implement the following minimal set of sub-functions (aka minimal viable product).

  • Parsing of Corefile
  • Categorization of responses, and account debits/credits
  • Always block responses when the account balance is negative (no slip, i.e. slip = 0)

The following functions, if added, would bring RRL into feature parity with BIND’s implementation of rate-limit. These can be added in separate PRs, to keep PRs small and easily digestible.

  • configurable slip ratio (slipping = send truncated response instead of dropping)
  • all-per-second (upper limit RRL at which we stop slipping)
  • expose metrics
  • exempt-clients (client list / cidrs to exempt from RRL)
  • qps-scale (scale down allowances proportionally to current qps load)

Initial Features Spec

RRL will be delivered as a plugin that accepts a list of zones for which it will track and enforce rate limiting for UDP responses. e.g.

rrl example.org {
    responses-per-second 10
}

As a plugin, RRL should take an incoming request, pass it through the remaining plugins in the plugin chain, and then track/process the resulting response before responding to the client. For this reason, it needs to be near the top of the plugin list.

Plugin Directives

Available configuration options (following naming convention of BIND)…

rrl [ZONES...] {
    window SECONDS
    ipv4-prefix-length LENGTH
    ipv6-prefix-length LENGTH
    responses-per-second ALLOWANCE
    nodata-per-second ALLOWANCE
    nxdomains-per-second ALLOWANCE
    referrals-per-second ALLOWANCE
    errors-per-second ALLOWANCE
    max-table-size SIZE
}
  • window SECONDS - defines a rolling window in SECONDS during which response rates are tracked. Default 15

  • ipv4-prefix-length LENGTH - the prefix LENGTH in bits to use for identifying an IPv4 client. Default 24

  • ipv6-prefix-length LENGTH - the prefix LENGTH in bits to use for identifying an IPv6 client. Default 56

  • responses-per-second ALLOWANCE - the number of positive responses allowed per second. Default 0

  • nodata-per-second ALLOWANCE - the number of empty (NODATA) responses allowed per second. Defaults to responses-per-second.

  • nxdomains-per-second ALLOWANCE - the number of negative (NXDOMAIN) responses allowed per second. Defaults to responses-per-second.

  • referrals-per-second ALLOWANCE - the number of referral or delegation responses allowed per second. Defaults to responses-per-second.

  • errors-per-second ALLOWANCE - the number of error responses allowed per second (excluding NXDOMAIN). Defaults to responses-per-second.

  • max-table-size SIZE - the maximum number of responses to be tracked at one time. When exceeded, rrl stops rate limiting new responses.
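
The defaulting rules above can be sketched in Go as follows. This is a minimal illustration, not the plugin's actual parser: the Config struct, its field names, and NewConfig are hypothetical, and the input is assumed to be a map of the directives that were explicitly set in the Corefile. Only the directive names and default values come from the list above.

```go
package main

import "fmt"

// Config holds the parsed rrl directives. The struct, its field names, and
// NewConfig are hypothetical; only the directive names and defaults come
// from the directive list in this spec.
type Config struct {
	Window             int // seconds
	IPv4PrefixLength   int // bits
	IPv6PrefixLength   int // bits
	ResponsesPerSecond int
	NodataPerSecond    int
	NxdomainsPerSecond int
	ReferralsPerSecond int
	ErrorsPerSecond    int
	MaxTableSize       int // the spec states no default; 0 here means "not set"
}

// NewConfig builds a Config from the directives that were explicitly set,
// filling in the documented defaults for everything else.
func NewConfig(set map[string]int) Config {
	get := func(name string, def int) int {
		if v, ok := set[name]; ok {
			return v
		}
		return def
	}
	// The per-type allowances fall back to responses-per-second.
	rps := get("responses-per-second", 0)
	return Config{
		Window:             get("window", 15),
		IPv4PrefixLength:   get("ipv4-prefix-length", 24),
		IPv6PrefixLength:   get("ipv6-prefix-length", 56),
		ResponsesPerSecond: rps,
		NodataPerSecond:    get("nodata-per-second", rps),
		NxdomainsPerSecond: get("nxdomains-per-second", rps),
		ReferralsPerSecond: get("referrals-per-second", rps),
		ErrorsPerSecond:    get("errors-per-second", rps),
		MaxTableSize:       get("max-table-size", 0),
	}
}

func main() {
	c := NewConfig(map[string]int{"responses-per-second": 10, "nxdomains-per-second": 5})
	fmt.Printf("%+v\n", c)
}
```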

Record Keeping: ResponseAccounts

RRL tracks response rates using a table of ResponseAccounts. A ResponseAccount consists of a token and a balance.

The ResponseAccount token uniquely identifies a category of responses and comprises the following data extracted from a response:

  • Prefix of the client IP (per the ipv4/6-prefix-length)
  • Requested name (qname), with the exceptions noted below
  • Requested type (qtype), except when the response type is error (see response types below)
  • Response type (each corresponding to the configurable per-second allowances)
    • response - for positive responses that contain answers
    • nodata - for NODATA responses
    • nxdomain - for NXDOMAIN responses
    • referrals - for referrals or delegations
    • error - for all DNS errors (except NXDOMAIN)

To better protect against attacks using invalid requests, the requested name and type are not used in the token for error type responses. In other words, all error responses are limited collectively per client, regardless of qname or qtype.

For nxdomain and referrals, the authoritative domain is used instead of the full qname.
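
As a rough sketch of how such a token could be derived, the Go function below masks the client address to the configured prefix length and applies the two exceptions described above. The function name, the string encoding of the token, and the authZone parameter (the authoritative domain, however it is determined) are assumptions made for illustration; the spec does not prescribe a token format.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ResponseType names the response categories from the list above.
type ResponseType string

const (
	TypeResponse ResponseType = "response"
	TypeNodata   ResponseType = "nodata"
	TypeNxdomain ResponseType = "nxdomain"
	TypeReferral ResponseType = "referrals"
	TypeError    ResponseType = "error"
)

// Token builds a ResponseAccount key (hypothetical string encoding).
// authZone is the authoritative domain; it replaces qname for nxdomain
// and referral responses.
func Token(clientIP string, v4Len, v6Len int, rtype ResponseType,
	qname string, qtype uint16, authZone string) (string, error) {
	addr, err := netip.ParseAddr(clientIP)
	if err != nil {
		return "", err
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	bits := v6Len
	if addr.Is4() {
		bits = v4Len
	}
	// Prefix keeps only the top `bits` bits, so all clients in the same
	// network share one token.
	prefix, err := addr.Prefix(bits)
	if err != nil {
		return "", err
	}
	switch rtype {
	case TypeError:
		// Errors are limited collectively per client: no qname or qtype.
		return fmt.Sprintf("%s|%s", prefix, rtype), nil
	case TypeNxdomain, TypeReferral:
		qname = authZone
	}
	return fmt.Sprintf("%s|%s|%d|%s", prefix, rtype, qtype, qname), nil
}

func main() {
	tok, err := Token("192.0.2.10", 24, 56, TypeResponse, "www.example.org.", 1, "example.org.")
	if err != nil {
		panic(err)
	}
	fmt.Println(tok)
}
```

With this encoding, two clients in the same /24 asking for different non-existent names under example.org. map to the same nxdomain account, which is the point of substituting the authoritative domain.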

The ResponseAccount balance is an integer. When the balance becomes negative for a ResponseAccount, any responses that match its token are dropped until the balance becomes positive again. The ResponseAccount balance cannot become more positive than the per-second allowance and cannot become more negative than window * the per-second allowance of the response type.

ResponseAccount balances are credited and debited as outlined below.

ResponseAccount Credits

Conceptually, RRL will credit each existing ResponseAccount balance once per second by an amount equal to the per-second allowance of the corresponding response type. If a ResponseAccount balance exceeds window, then the ResponseAccount can be evicted to keep the ResponseAccount table from running out of space.

As implemented, it's probably more performant to calculate credits on demand (at debit time) instead of in a separate asynchronous thread. In the same vein, it's probably more performant to defer evictions until space is needed (at insert time, when space runs out).
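
One way to compute credits on demand is to store the time of the last credit with each account and, at debit time, add one allowance per whole second elapsed since then, clamped to the balance bounds described earlier. The sketch below assumes whole-second granularity and a hypothetical LastCredit field; neither is specified in this document.

```go
package main

import "fmt"

// ResponseAccount here carries only what is needed for crediting.
// LastCredit is a hypothetical field, not part of the spec.
type ResponseAccount struct {
	Balance    int64
	LastCredit int64 // Unix time in seconds of the last applied credit
}

// Credit applies all credits accrued since LastCredit: one per-second
// allowance for each whole second elapsed. The result is clamped to
// [-window*allowance, allowance], the bounds given for the balance.
func (a *ResponseAccount) Credit(now, allowance, window int64) {
	elapsed := now - a.LastCredit
	if elapsed <= 0 {
		return
	}
	a.LastCredit = now
	// window+1 seconds of credit lifts even the most negative balance to
	// the maximum, so capping here avoids overflow on long-idle accounts.
	n := elapsed
	if n > window+1 {
		n = window + 1
	}
	a.Balance += n * allowance
	if a.Balance > allowance {
		a.Balance = allowance
	}
	if floor := -window * allowance; a.Balance < floor {
		a.Balance = floor
	}
}

func main() {
	a := &ResponseAccount{Balance: -20, LastCredit: 100}
	a.Credit(102, 10, 15) // two seconds elapsed, allowance 10, window 15
	fmt.Println(a.Balance)
}
```

Because the credit is a pure function of the elapsed time, no background goroutine has to touch every account once per second.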

ResponseAccount Debits

ResponseAccount balances are debited at the time of sending a UDP response to a client, using the following logic ...

  1. Calculate the ResponseAccount token for the response
  2. If the token doesn’t exist in the ResponseAccount table, add the token as follows…
    1. If the max-table-size is reached/exceeded, log an error/warning, and send response to client (done)
    2. Add the token to the ResponseAccount table
    3. Credit the balance to maximum - 1, i.e. the per-second allowance - 1
  3. If the token does exist, debit the balance by 1
  4. If balance is >= 0, send response to client. (done)
  5. If balance is < 0, then drop the response, sending nothing to the client (done)

ResponseAccount Concurrency

Since the ResponseAccount table will be read and written to from parallel threads, locking should be used to ensure data integrity is maintained for all reads/writes.
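
A minimal sketch of one locking approach, assuming a single mutex guarding the whole table: the read-modify-write on a balance is passed in as a function and executed while the lock is held, so concurrent debits cannot interleave. SafeTable, Update, and debitConcurrently are hypothetical names; finer-grained schemes such as sharded locks are equally possible and are not ruled out by this spec.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeTable is a hypothetical mutex-guarded ResponseAccount table.
type SafeTable struct {
	mu       sync.Mutex
	balances map[string]int
}

func NewSafeTable() *SafeTable {
	return &SafeTable{balances: make(map[string]int)}
}

// Update runs fn on the balance for token while holding the lock, stores
// the result, and returns it. Lookup, update, and store are one atomic step.
func (t *SafeTable) Update(token string, fn func(balance int, exists bool) int) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	bal, ok := t.balances[token]
	bal = fn(bal, ok)
	t.balances[token] = bal
	return bal
}

// debitConcurrently debits token by 1 from several goroutines at once and
// returns the final balance. It exists only to demonstrate the locking.
func debitConcurrently(t *SafeTable, token string, workers, perWorker int) int {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				t.Update(token, func(b int, _ bool) int { return b - 1 })
			}
		}()
	}
	wg.Wait()
	return t.Update(token, func(b int, _ bool) int { return b })
}

func main() {
	// Every debit is applied exactly once, so the result is -(8 * 1000).
	fmt.Println(debitConcurrently(NewSafeTable(), "token", 8, 1000))
}
```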

Follow up features

Wildcard Flooding Mitigation

Implement better protection against wildcard flooding. For domains CoreDNS is authoritative for, we can introduce an interface, Authoritative, to be implemented by any plugin that is authoritative. The interface would include functions listing the zones the plugin is authoritative for and all wildcard base domains produced by the plugin. RRL could check whether plugins implement Authoritative, and compare responses to the lists of authoritative zones and wildcard base domains to account for responses in the same way BIND does. We would also use this to determine the authoritative domain for negative responses and referrals more accurately; for these we currently rely on negative-cache SOAs.

Slip ratio

With a slip ratio defined, responses will not always be dropped when a balance goes negative; we let some slip through. However, instead of sending the whole response to the client, we send a small truncated response (TC=1). This truncated response is intended to prompt the client to retry over TCP. Slipping in this way lets legitimate clients know to use TCP, without contributing to the flood.

Implementing slip ratios will introduce a new plugin directive: slip

rrl [ZONES...] {
    slip RATE
}

Slip RATE is an integer between 0 and 10. 0 means never slip. 1 means always slip. Other numbers mean slip every nth time. Furthermore, a new property should be added to the ResponseAccount, slipCount, which defaults to slip. This value should be decremented every time we block a response matching the ResponseAccount token. When the value hits zero, we should slip a truncated response to the client, and reset the slipCount to slip. Response types that cannot be truncated such as REFUSED and SERVFAIL should be leaked in their entirety at the slip rate.
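
The slipCount bookkeeping described above could look roughly like this. The Action type and the OnBlocked method are hypothetical names, and the sketch covers only the countdown; building the truncated response (or sending REFUSED/SERVFAIL in full) is left to the caller when Slip is returned.

```go
package main

import "fmt"

// Action is what to do with a response whose account balance is negative.
type Action int

const (
	Drop Action = iota // send nothing
	Slip               // send a truncated (TC=1) response, or the full
	// response for types that cannot be truncated
)

// ResponseAccount here carries only the slip countdown. SlipCount is
// expected to be initialized to the configured slip RATE.
type ResponseAccount struct {
	SlipCount int
}

// OnBlocked is called each time a response matching the account is
// blocked. slip is the configured RATE: 0 never slips, 1 always slips,
// n slips every nth blocked response.
func (a *ResponseAccount) OnBlocked(slip int) Action {
	if slip == 0 {
		return Drop
	}
	a.SlipCount--
	if a.SlipCount <= 0 {
		a.SlipCount = slip // reset the countdown
		return Slip
	}
	return Drop
}

func main() {
	a := &ResponseAccount{SlipCount: 3}
	for i := 0; i < 6; i++ {
		fmt.Println(a.OnBlocked(3) == Slip)
	}
}
```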

All-per-second

TBD (upper limit rates for all response types)

Exposing metrics

TBD

Exempt-clients

TBD (client list / cidrs to exempt from RRL)

QPS-scale

TBD (scale down allowances proportionally to current qps load)