Unexpected rejections with rate limiters that allow more than a billion requests/second and are accessed from more than one thread #203

rkd-msw · 2023-09-06T21:06:18Z

Here is a minimal repro:

use governor::{
    clock::QuantaClock,
    middleware::StateInformationMiddleware,
    state::{InMemoryState, NotKeyed},
    Quota, RateLimiter,
};
use std::sync::Arc;
use std::thread;

fn rlspin(rl: Arc<RateLimiter<NotKeyed, InMemoryState, QuantaClock, StateInformationMiddleware>>) {
    for _ in 0..1_000_000 {
        rl.check().map_err(|e| dbg!(e)).unwrap();
    }
}

fn main() {
    let clock = QuantaClock::default();
    let quota = Quota::per_second(1_000_000_001.try_into().unwrap());
    dbg!(quota);

    let rate_limiter: Arc<
        RateLimiter<NotKeyed, InMemoryState, QuantaClock, StateInformationMiddleware>,
    > = Arc::new(
        RateLimiter::direct_with_clock(quota, &clock)
            .with_middleware::<StateInformationMiddleware>(),
    );

    let rate_limiter2 = rate_limiter.clone();

    thread::spawn(move || {
        rlspin(rate_limiter2);
    });
    rlspin(rate_limiter);
}

(with a Cargo.toml that just depends on governor 0.6.0)

This reliably fails for me:

$ cargo run --release
   Compiling govtest v0.1.0 (/home/robday/src/mntnlake/govtest)
    Finished release [optimized] target(s) in 0.98s
     Running `target/release/govtest`
[src/main.rs:19] quota = Quota {
    max_burst: 1000000001,
    replenish_1_per: 0ns,
}
[src/main.rs:12] e = NotUntil {
    state: StateSnapshot {
        t: Nanos(0ns),
        tau: Nanos(0ns),
        time_of_measurement: Nanos(224.941µs),
        tat: Nanos(224.941µs),
    },
    start: QuantaInstant(
        Nanos(170098.543484062s),
    ),
}
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: NotUntil { state: StateSnapshot { t: Nanos(0ns), tau: Nanos(0ns), time_of_measurement: Nanos(224.941µs), tat: Nanos(224.941µs) }, start: QuantaInstant(Nanos(170098.543484062s)) }', src/main.rs:12:41
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

However, if I change line 18 to let quota = Quota::per_second(1_000_000_000.try_into().unwrap()); it reliably passes:

$ cargo run --release
   Compiling govtest v0.1.0 (/home/robday/src/mntnlake/govtest)
    Finished release [optimized] target(s) in 0.88s
     Running `target/release/govtest`
[src/main.rs:19] quota = Quota {
    max_burst: 1000000000,
    replenish_1_per: 1ns,
}
$

and if I comment out the thread::spawn it also passes:

$ cargo run --release
   Compiling govtest v0.1.0 (/home/robday/src/mntnlake/govtest)
warning: unused import: `std::thread`
 --> src/main.rs:8:5
  |
8 | use std::thread;
  |     ^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused variable: `rate_limiter2`
  --> src/main.rs:28:9
   |
28 |     let rate_limiter2 = rate_limiter.clone();
   |         ^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_rate_limiter2`
   |
   = note: `#[warn(unused_variables)]` on by default

warning: `govtest` (bin "govtest") generated 2 warnings (run `cargo fix --bin "govtest"` to apply 2 suggestions)
    Finished release [optimized] target(s) in 0.73s
     Running `target/release/govtest`
[src/main.rs:19] quota = Quota {
    max_burst: 1000000001,
    replenish_1_per: 0ns,
}

I hit this when attempting to disable rate-limiting by configuring a rate-limit of u32::MAX, so I have an easy workaround of configuring a rate-limit of one billion, which is much higher than I need - but I thought you might appreciate the bug report anyway!

The text was updated successfully, but these errors were encountered:

antifuchs · 2023-09-13T00:57:28Z

Oh man, this is ... hilarious, thank you for reporting it. I'll try and think about what it'll take to work around the issue (but if you have more than one event per nanosecond, governor will likely not be fast enough for you 😂)!

antifuchs · 2023-09-13T01:02:49Z

That said, I recognize that you're looking for a way to disable the rate limiting logic - maybe it's possible to provide an API that simply returns a success for every element (inlined, to cost as little as possible); is that a thing you'd use there? (And if so, what would you use it for?)

rkd-msw · 2023-09-13T07:29:05Z

Essentially I'm using governor as part of a broader server framework, and I want it to be able to offer rate-limiting, but I want it to be permissive by default so people don't try the framework, see it starts reporting (rate-limiting) errors after xyz,000 messages, and assume it has performance problems.

Setting the default limit to one billion is perfectly suitable for that, now that I've figured out I need to do it (we don't have more than one event per nanosecond!) so I don't think you need to worry about additional APIs for this use-case.

Interpreting any value over one billion as one billion (or conversely, fixing up any 0ns intervals to the minimum of 1ns), maybe with a warning log saying so, would be a good fix here from my perspective.

antifuchs · 2023-09-15T13:08:18Z

OK, so taking a closer look at this, I can't actually reproduce the issue in tests (I added this in tests/direct.rs):

#[cfg(feature = "std")]
#[test]
fn stresstest_large_quotas() {
    use std::{sync::Arc, thread};

    use governor::middleware::StateInformationMiddleware;

    let quota = Quota::per_second(nonzero!(1_000_000_001u32));
    let rate_limiter =
        Arc::new(RateLimiter::direct(quota).with_middleware::<StateInformationMiddleware>());

    fn rlspin(rl: Arc<DefaultDirectRateLimiter<StateInformationMiddleware>>) {
        for _ in 0..100_000_000 {
            rl.check().map_err(|e| dbg!(e)).unwrap();
        }
    }

    let rate_limiter2 = rate_limiter.clone();
    thread::spawn(move || {
        rlspin(rate_limiter2);
    });
    rlspin(rate_limiter);
}

Reading the state store code, I also can't see where this would fail: a 0ns replenishment interval would mean the "expected arrival time" never moves forward, which should mean anything being checked would pass... and it does, in this test!

antifuchs · 2023-09-15T13:17:55Z

~~Uh, tell a lie: It's failing on nightly but not on stable: https://github.com/antifuchs/governor/actions/runs/6198549159/job/16829195167?pr=207#step:7:258 - what version are you testing with?~~

My mistake, they all fail/flake in CI (so... linux, vs. my personal dev machine which is an M1 max macbook pro); sooooo, is this a linux/macOS thing? What OS are you running these tests on?

antifuchs · 2023-09-15T13:53:26Z

Hah, ok, the quota was completely innocent, it was the GCRA constructor-from-quota that set tau to 0, which then results in a burst size of 0. I think I've got a fix in #207, which passes the test I posted above.

207: Support huge (>1e9 element/sec) quotas r=antifuchs a=antifuchs This should address #203: When given large quota values, some parallel usages seem to fail. Ideally, we'd have a way to construct a quota that always passes and doesn't exhibit weird parallel usage patterns... Co-authored-by: Andreas Fuchs <[email protected]>

antifuchs mentioned this issue Sep 13, 2023

cleanup non_zero macro with const generics? #155

Open

antifuchs mentioned this issue Sep 15, 2023

Support huge (>1e9 element/sec) quotas #207

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected rejections with rate limiters that allow more than a billion requests/second and are accessed from more than one thread #203

Unexpected rejections with rate limiters that allow more than a billion requests/second and are accessed from more than one thread #203

rkd-msw commented Sep 6, 2023

antifuchs commented Sep 13, 2023

antifuchs commented Sep 13, 2023

rkd-msw commented Sep 13, 2023

antifuchs commented Sep 15, 2023 •

edited

Loading

antifuchs commented Sep 15, 2023 •

edited

Loading

antifuchs commented Sep 15, 2023

Unexpected rejections with rate limiters that allow more than a billion requests/second and are accessed from more than one thread #203

Unexpected rejections with rate limiters that allow more than a billion requests/second and are accessed from more than one thread #203

Comments

rkd-msw commented Sep 6, 2023

antifuchs commented Sep 13, 2023

antifuchs commented Sep 13, 2023

rkd-msw commented Sep 13, 2023

antifuchs commented Sep 15, 2023 • edited Loading

antifuchs commented Sep 15, 2023 • edited Loading

antifuchs commented Sep 15, 2023

antifuchs commented Sep 15, 2023 •

edited

Loading

antifuchs commented Sep 15, 2023 •

edited

Loading