Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected rejections with rate limiters that allow more than a billion requests/second and are accessed from more than one thread #203

Open
rkd-msw opened this issue Sep 6, 2023 · 6 comments

Comments

@rkd-msw
Copy link

rkd-msw commented Sep 6, 2023

Here is a minimal repro:

use governor::{
    clock::QuantaClock,
    middleware::StateInformationMiddleware,
    state::{InMemoryState, NotKeyed},
    Quota, RateLimiter,
};
use std::sync::Arc;
use std::thread;

fn rlspin(rl: Arc<RateLimiter<NotKeyed, InMemoryState, QuantaClock, StateInformationMiddleware>>) {
    for _ in 0..1_000_000 {
        rl.check().map_err(|e| dbg!(e)).unwrap();
    }
}

fn main() {
    let clock = QuantaClock::default();
    let quota = Quota::per_second(1_000_000_001.try_into().unwrap());
    dbg!(quota);

    let rate_limiter: Arc<
        RateLimiter<NotKeyed, InMemoryState, QuantaClock, StateInformationMiddleware>,
    > = Arc::new(
        RateLimiter::direct_with_clock(quota, &clock)
            .with_middleware::<StateInformationMiddleware>(),
    );

    let rate_limiter2 = rate_limiter.clone();

    thread::spawn(move || {
        rlspin(rate_limiter2);
    });
    rlspin(rate_limiter);
}

(with a Cargo.toml that just depends on governor 0.6.0)

This reliably fails for me:

$ cargo run --release
   Compiling govtest v0.1.0 (/home/robday/src/mntnlake/govtest)
    Finished release [optimized] target(s) in 0.98s
     Running `target/release/govtest`
[src/main.rs:19] quota = Quota {
    max_burst: 1000000001,
    replenish_1_per: 0ns,
}
[src/main.rs:12] e = NotUntil {
    state: StateSnapshot {
        t: Nanos(0ns),
        tau: Nanos(0ns),
        time_of_measurement: Nanos(224.941µs),
        tat: Nanos(224.941µs),
    },
    start: QuantaInstant(
        Nanos(170098.543484062s),
    ),
}
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: NotUntil { state: StateSnapshot { t: Nanos(0ns), tau: Nanos(0ns), time_of_measurement: Nanos(224.941µs), tat: Nanos(224.941µs) }, start: QuantaInstant(Nanos(170098.543484062s)) }', src/main.rs:12:41
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

However, if I change line 18 to let quota = Quota::per_second(1_000_000_000.try_into().unwrap()); it reliably passes:

$ cargo run --release
   Compiling govtest v0.1.0 (/home/robday/src/mntnlake/govtest)
    Finished release [optimized] target(s) in 0.88s
     Running `target/release/govtest`
[src/main.rs:19] quota = Quota {
    max_burst: 1000000000,
    replenish_1_per: 1ns,
}
$

and if I comment out the thread::spawn it also passes:

$ cargo run --release
   Compiling govtest v0.1.0 (/home/robday/src/mntnlake/govtest)
warning: unused import: `std::thread`
 --> src/main.rs:8:5
  |
8 | use std::thread;
  |     ^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused variable: `rate_limiter2`
  --> src/main.rs:28:9
   |
28 |     let rate_limiter2 = rate_limiter.clone();
   |         ^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_rate_limiter2`
   |
   = note: `#[warn(unused_variables)]` on by default

warning: `govtest` (bin "govtest") generated 2 warnings (run `cargo fix --bin "govtest"` to apply 2 suggestions)
    Finished release [optimized] target(s) in 0.73s
     Running `target/release/govtest`
[src/main.rs:19] quota = Quota {
    max_burst: 1000000001,
    replenish_1_per: 0ns,
}

I hit this when attempting to disable rate-limiting by configuring a rate-limit of u32::MAX, so I have an easy workaround of configuring a rate-limit of one billion, which is much higher than I need - but I thought you might appreciate the bug report anyway!

@antifuchs
Copy link
Collaborator

Oh man, this is ... hilarious, thank you for reporting it. I'll try and think about what it'll take to work around the issue (but if you have more than one event per nanosecond, governor will likely not be fast enough for you 😂)!

@antifuchs
Copy link
Collaborator

That said, I recognize that you're looking for a way to disable the rate limiting logic - maybe it's possible to provide an API that simply returns a success for every element (inlined, to cost as little as possible); is that a thing you'd use there? (And if so, what would you use it for?)

@rkd-msw
Copy link
Author

rkd-msw commented Sep 13, 2023

Essentially I'm using governor as part of a broader server framework, and I want it to be able to offer rate-limiting, but I want it to be permissive by default so people don't try the framework, see it starts reporting (rate-limiting) errors after xyz,000 messages, and assume it has performance problems.

Setting the default limit to one billion is perfectly suitable for that, now that I've figured out I need to do it (we don't have more than one event per nanosecond!) so I don't think you need to worry about additional APIs for this use-case.

Interpreting any value over one billion as one billion (or conversely, fixing up any 0ns intervals to the minimum of 1ns), maybe with a warning log saying so, would be a good fix here from my perspective.

@antifuchs
Copy link
Collaborator

antifuchs commented Sep 15, 2023

OK, so taking a closer look at this, I can't actually reproduce the issue in tests (I added this in tests/direct.rs):

#[cfg(feature = "std")]
#[test]
fn stresstest_large_quotas() {
    use std::{sync::Arc, thread};

    use governor::middleware::StateInformationMiddleware;

    let quota = Quota::per_second(nonzero!(1_000_000_001u32));
    let rate_limiter =
        Arc::new(RateLimiter::direct(quota).with_middleware::<StateInformationMiddleware>());

    fn rlspin(rl: Arc<DefaultDirectRateLimiter<StateInformationMiddleware>>) {
        for _ in 0..100_000_000 {
            rl.check().map_err(|e| dbg!(e)).unwrap();
        }
    }

    let rate_limiter2 = rate_limiter.clone();
    thread::spawn(move || {
        rlspin(rate_limiter2);
    });
    rlspin(rate_limiter);
}

Reading the state store code, I also can't see where this would fail: a 0ns replenishment interval would mean the "expected arrival time" never moves forward, which should mean anything being checked would pass... and it does, in this test!

@antifuchs
Copy link
Collaborator

antifuchs commented Sep 15, 2023

Uh, tell a lie: It's failing on nightly but not on stable: https://github.com/antifuchs/governor/actions/runs/6198549159/job/16829195167?pr=207#step:7:258 - what version are you testing with?

My mistake, they all fail/flake in CI (so... linux, vs. my personal dev machine which is an M1 max macbook pro); sooooo, is this a linux/macOS thing? What OS are you running these tests on?

@antifuchs
Copy link
Collaborator

Hah, ok, the quota was completely innocent, it was the GCRA constructor-from-quota that set tau to 0, which then results in a burst size of 0. I think I've got a fix in #207, which passes the test I posted above.

bors bot added a commit that referenced this issue Sep 23, 2023
207: Support huge (>1e9 element/sec) quotas r=antifuchs a=antifuchs

This should address #203: When given large quota values, some parallel usages seem to fail. Ideally, we'd have a way to construct a quota that always passes and doesn't exhibit weird parallel usage patterns...

Co-authored-by: Andreas Fuchs <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants