
Move output IO throttler to IO queue level #2332

Draft: xemul wants to merge 12 commits into master
Conversation

@xemul (Contributor) commented Jul 5, 2024:

IO queue is responsible for two things:

  • cross-class shares-based fair balancing
  • throttling the flow of dispatching requests to obey disk model

The former is done with the help of fair_queue, which implements a rather simple "virtual capacity" model. The latter is done with the help of a shared token bucket, but historically the throttling code was implemented as part of fair_queue. That's not correct: the fair queue has nothing to do with output flow control. This PR moves the throttling code out of fair_queue and makes the IO queue own and use it.
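For orientation, here is a minimal, self-contained sketch of the throttling half -- a token bucket that limits dispatch flow independently of any fairness logic. All names and details below are illustrative, not Seastar's actual implementation.

    #include <algorithm>
    #include <chrono>

    // Illustrative token bucket: requests may be dispatched only while
    // tokens are available; tokens replenish at a fixed rate over time.
    class token_bucket {
        using clock = std::chrono::steady_clock;
        double _rate;             // tokens replenished per second
        double _limit;            // maximum tokens the bucket can hold
        double _tokens;           // tokens currently available
        clock::time_point _last;  // last replenish timestamp
    public:
        token_bucket(double rate, double limit)
            : _rate(rate), _limit(limit), _tokens(limit), _last(clock::now()) {}

        // Try to grab `cap` tokens; returns false if the caller must
        // wait for the bucket to replenish.
        bool try_grab(double cap) {
            auto now = clock::now();
            std::chrono::duration<double> elapsed = now - _last;
            _last = now;
            _tokens = std::min(_limit, _tokens + elapsed.count() * _rate);
            if (_tokens < cap) {
                return false;
            }
            _tokens -= cap;
            return true;
        }
    };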

xemul added 12 commits July 5, 2024 17:42
The current fair-queue tests try to make the queue submit requests in an extremely controlled way -- one by one. However, the fair queue nowadays is driven by a rated token bucket and is very sensitive to time and durations. It's better to teach the test to accept the fact that it cannot control fair-queue request submission at per-request granularity, and to tune its accounting instead.

The change affects two places.

Main loop. Before the change it called fair_queue::dispatch_requests() as many times as the number of requests the test case wants to pass, then performed the necessary checks. Now the method is called in an unbounded loop, and the handler processes only the requested number of requests; the rest are ignored.

Drain. Before the change it called dispatch_requests() in a loop for as long as it returned anything. Now it's called in a loop until the fair queue explicitly reports that it's empty, as sketched below.
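A sketch of the reworked drain, assuming a hypothetical empty() accessor on the queue (the actual test helper names may differ):

    // Keep dispatching until the fair queue itself reports it's empty,
    // instead of stopping as soon as one call dispatches nothing.
    void drain(fair_queue& fq) {
        do {
            fq.dispatch_requests();
        } while (!fq.empty());
    }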

Signed-off-by: Pavel Emelyanov <[email protected]>
The class in question only controls the output flow of capacities; it's not about fair queueing at all. There is an effort to implement cross-shard fairness, which will need a fair_group, but we're not there yet.

refs: scylladb#2294

Signed-off-by: Pavel Emelyanov <[email protected]>
The io_group carries a throttle instance for each of the streams, but the member is still named _f[air]g[roup]s. Effectively, this is a continuation of the previous patch.

Signed-off-by: Pavel Emelyanov <[email protected]>
The move is very smooth; no changes in code are needed except the definition of throttle::capacity_t. It used to be the same as fair_queue_entry::capacity_t, since we use the same notion of capacity for both fair queuing and capacity throttling.
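A sketch of the one definition that changes; the underlying integer type is an assumption here:

    #include <cstdint>

    // throttle now owns its capacity type instead of borrowing it from
    // fair_queue_entry; the concrete width is assumed, not confirmed.
    class throttle {
    public:
        using capacity_t = std::uint64_t;  // was: fair_queue_entry::capacity_t
    };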

Signed-off-by: Pavel Emelyanov <[email protected]>
The capacity-grabbing method currently accepts a fq entry reference, but this code is going to be moved to throttle.(cc|hh), which doesn't work with fq entries.

Signed-off-by: Pavel Emelyanov <[email protected]>
The shard-local fair_queue class does some shard-local bookkeeping of throttled capacities. Shared bookkeeping was done by fair_group, which was renamed and moved away; now it's time to collect the shard-local throttle facilities as well.

Signed-off-by: Pavel Emelyanov <[email protected]>
No functional changes, just code movement.

Signed-off-by: Pavel Emelyanov <[email protected]>
The IO queue uses the concept of a "stream" to handle duplex devices. A stream is currently just a fair_queue, but future patches need to put more onto a stream, so this patch wraps the per-io-queue fair_queues into a helper struct stream, sketched below.
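A sketch of the wrapper, extrapolated from the constructor snippet quoted later in this discussion; the member types and the throttle reference are assumptions:

    // Hypothetical shape of the helper: a stream bundles the per-stream
    // fair queue with the throttle it dispatches against.
    struct stream {
        fair_queue fq;
        shared_throttle& thr;

        stream(shared_throttle& st, fair_queue::config cfg)
            : fq(std::move(cfg))
            , thr(st)
        { }
    };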

Signed-off-by: Pavel Emelyanov <[email protected]>
The fair queue code doesn't "know" which entries it schedules. For that there's an "abstract" fair_queue_entry that gets scheduled, and to dispatch those entries the caller must provide a std::function<> callback to the fair_queue::dispatch_requests() method.

Upcoming patches will need fair_queue to call more code on those entries. To avoid introducing another std::function<> callback, change the dispatch indirection from a callback to a pure virtual .dispatch() method on fair_queue_entry, as sketched below.
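A sketch of the change in indirection; only the .dispatch() name is taken from this commit, the rest is assumed:

    // Before: dispatch_requests(std::function<void(fair_queue_entry&)> cb)
    // called the supplied callback for each dispatched entry. After: the
    // entry itself carries the hook as a pure virtual method.
    class fair_queue_entry {
    public:
        virtual ~fair_queue_entry() = default;
        virtual void dispatch() = 0;  // the queue calls this directly
    };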

Signed-off-by: Pavel Emelyanov <[email protected]>
When fair_queue tries to dispatch a request, it grabs the capacity from the throttler on its own. That's the wrong place: the fair queue is just about providing cross-class fairness of requests. Throttling the output stream of requests should be done by the io_queue itself (see the sketch below).
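A rough sketch of the new split. The class and enum names appear in the diffs below, while the throttle API and the glue code are assumptions:

    namespace sketch {

    struct throttle {
        enum class grab_result { grabbed, pending };
        using capacity_t = unsigned long;
        // Stubbed: try to take `cap` tokens from the io_queue-owned bucket.
        grab_result grab(capacity_t) noexcept { return grab_result::grabbed; }
    };

    struct fair_queue_entry {
        virtual ~fair_queue_entry() = default;
        // The fair queue only orders entries; whether an entry may go out
        // now is answered by the entry itself, against the io_queue throttle.
        virtual throttle::grab_result can_dispatch() const noexcept = 0;
        virtual void dispatch() = 0;
    };

    class queued_io_request : public fair_queue_entry {
        throttle& _thr;              // owned by the io_queue, not the fair queue
        throttle::capacity_t _cap;
    public:
        queued_io_request(throttle& t, throttle::capacity_t c)
            : _thr(t), _cap(c) {}
        throttle::grab_result can_dispatch() const noexcept override {
            return _thr.grab(_cap);
        }
        void dispatch() override { /* issue the I/O */ }
    };

    } // namespace sketch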

Signed-off-by: Pavel Emelyanov <[email protected]>
These notifications were used to update the output throttler with the result of request dispatching, cancellation, etc. Now that the throttler lives at the io_queue level, the fair queue can be freed from this duty.

Signed-off-by: Pavel Emelyanov <[email protected]>
The method is empty now.

Signed-off-by: Pavel Emelyanov <[email protected]>
@xemul requested a review from @pwrobelse on July 5, 2024 15:16
@@ -297,6 +298,10 @@ class queued_io_request final : private internal::io_request, public fair_queue_
queued_io_request(queued_io_request&&) = delete;
~queued_io_request() = default;

virtual throttle::grab_result can_dispatch() const noexcept {
Contributor:

It seems that the compiler on CI complains that it does not use the override keyword:

io_queue.cc:301:35: error: 'can_dispatch' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
  301 |     virtual throttle::grab_result can_dispatch() const noexcept {
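The fix is presumably just adding the specifier (body elided, as in the quoted snippet):

    virtual throttle::grab_result can_dispatch() const noexcept override {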

void fair_queue::dispatch_requests() {
capacity_t dispatched = 0;
boost::container::small_vector<priority_class_ptr, 2> preempt;

while (!_handles.empty() && (dispatched < _throttle.per_tick_grab_threshold())) {
Contributor:

Minor: the dispatched variable becomes unused after the removal. The compiler on CI complains about it:

fair_queue.cc:238:16: error: variable 'dispatched' set but not used [-Werror,-Wunused-but-set-variable]
  238 |     capacity_t dispatched = 0;

Contributor:

I am not entirely sure why the compiler raises an error -- below there is dispatched += req_cap;

Contributor Author (@xemul):

Because this variable becomes write-only, which effectively means unused. This place is obsoleted by #2294 anyway :(

@@ -602,17 +602,17 @@ shared_throttle::config io_group::make_throttle_config(const io_queue::config& q
}

std::chrono::duration<double> io_group::io_latency_goal() const noexcept {
-    return _fgs.front().rate_limit_duration();
+    return _throttle.front().rate_limit_duration();
}

io_group::io_group(io_queue::config io_cfg, unsigned nr_queues)
: _config(std::move(io_cfg))
, _allocated_on(this_shard_id())
{
auto fg_cfg = make_throttle_config(_config);
Contributor:

Minor: maybe we could also adjust the variable name to throttle_cfg?

Contributor Author (@xemul):

Good point

@@ -161,7 +161,7 @@ bool fair_queue::class_compare::operator() (const priority_class_ptr& lhs, const
return lhs->_accumulated > rhs->_accumulated;
}

-fair_queue::fair_queue(fair_group& group, config cfg)
+fair_queue::fair_queue(shared_throttle& group, config cfg)
Contributor:

Minor: in fair_queue.hh the names of the constructor parameters do not match the ones used here. In the header there is fair_queue(shared_throttle& shared, config cfg).

Do we want to keep both places consistent?

Contributor Author (@xemul):

Yes, I will need to update the header name too.

Comment on lines +31 to +33
/// This is a fair group. It's attached by one or mode fair queues. On machines having the
/// big* amount of shards, queues use the group to borrow/lend the needed capacity for
/// requests dispatching.
Contributor:

Minor: a short question about this description comment. In previous patches we renamed fair_group --> shared_throttle. Are any changes needed to this description? It still uses the old nomenclature.

Contributor Author (@xemul):

Ah, yes. I fixed this comment in one of the rebases and lost it along the way

, _per_tick_threshold(_token_bucket.limit() / nr_queues)
{
if (tokens_capacity(cfg.min_tokens) > _token_bucket.threshold()) {
throw std::runtime_error("Fair-group replenisher limit is lower than threshold");
Contributor:

Minor: we renamed fair_group to shared_throttle. Do we also want to update this error message?

Contributor Author (@xemul):

Yup, good catch

Comment on lines +22 to +26
#ifdef SEASTAR_MODULE
module;
#endif

#include <seastar/core/internal/throttle.hh>
Contributor:

A general question related to the build with modules -- as far as I understand, ./src/seastar.cc includes all headers to provide the declarations and definitions when we build the code with modules.

The source files (e.g. fair_queue.cc, io_queue.cc and reactor.cc) use #ifdef, e.g.:

#ifdef SEASTAR_MODULE
module seastar;
#else
// include the headers related to seastar
#endif

The comment from seastar.cc:

// include all declaration and definitions to be exported in the the module
// purview

My question is as follows -- do we want to follow the same approach in throttle.cc? Also, is there an explicit rule in the seastar project related to the inclusion of files and the SEASTAR_MODULE macro?

Contributor Author (@xemul):

I'm not sure if there's any set of rules on how to manipulate SEASTAR_MODULE and related machinery. @tchaikov works on modules support; he can shed more light on this.

@tchaikov (Contributor) commented Jul 24, 2024:

@pwrobelse hi Patryk, thank you for bringing this up.

seastar.cc includes both the global module fragment and the purview of the seastar module. We put the implementation into module implementation units, which start with module seastar -- see, for instance, fair_queue.cc, as you pointed out.

When it comes to throttle.cc, it needs to access the seastar module, so I think we would need to start it with module seastar; instead of #include <seastar/core/internal/throttle.hh> when C++20 modules are enabled, which is checked using #ifdef SEASTAR_MODULE in general.

I'm not sure if there's any set of rules on how to manipulate SEASTAR_MODULE and related.

@xemul good point, I will note this down. Could you suggest which document I should put this in? As its target audience will be library developers rather than library users, the closest one I can find is coding-style.md. But that document focuses on coding style -- it's more about taste than a guideline on general design or code structure. Shall I create another document on C++20 modules?
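A sketch of how throttle.cc could open, following the pattern described above; the guard usage mirrors the fair_queue.cc example quoted earlier, and the exact contents are assumed:

    #ifdef SEASTAR_MODULE
    module seastar;
    #else
    #include <seastar/core/internal/throttle.hh>
    #endif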

@@ -103,10 +92,9 @@ future<> perf_fair_queue::test(bool loc) {

auto invokers = local_fq.invoke_on_all([loc] (local_fq_and_class& local) {
return parallel_for_each(boost::irange(0u, requests_to_dispatch), [&local, loc] (unsigned dummy) {
-        auto cap = local.queue(loc).tokens_capacity(double(1) / std::numeric_limits<int>::max() + double(1) / std::numeric_limits<int>::max());
-        auto req = std::make_unique<local_fq_entry>(cap, [&local, loc, cap] {
+        auto req = std::make_unique<local_fq_entry>(100, [&local, loc] {
Contributor:

Just to double-check -- this patch replaces the calculation with a hard-coded value of 100. Was the value the same before? Does it matter for the test?

Contributor:

Do I understand correctly that, due to can_dispatch() returning throttle::grab_result::grabbed, the capacity does not matter to the test? Because we can always dispatch (because that's how the test works now)?

Contributor Author (@xemul):

Yes, the capacity can be anything, since can_dispatch() always says "yes you can".

@@ -101,7 +90,6 @@ class test_env {
unsigned tick(unsigned n = 0) {
unsigned processed = 0;
while (true) {
_fg.replenish_capacity(_fg.replenished_ts() + std::chrono::microseconds(1));
_fq.dispatch_requests();
@pwrobelse (Contributor) commented Jul 8, 2024:

Do I understand correctly that after defining:

    virtual throttle::grab_result can_dispatch() const noexcept override {
        return throttle::grab_result::grabbed;
    }

in the request class, this line will dispatch all requests that were queued before? When reading the code of dispatch_requests() I had the impression that this is the case.

If so, are we leaving the code as it was (mainly the while(true) part) just in case anybody wants to define another request type that does not allow being dispatched immediately? Is this while(true) needed (assuming, of course, that I did not misunderstand the code)?

Contributor:

I suspect that this is the root cause of the test failure.

Let's take test_fair_queue_longer_run_different_shares as an example.

We first queue a lot of ops:

    for (int i = 0; i < 20000; ++i) {
        env.do_op(a, 1);
        env.do_op(b, 0);
    }

And then we want to let them in over time:

   for (int i = 0; i < 1000; ++i) {
        sleep(1ms).get();
        env.tick(3);
    }

However, because all requests can be dispatched (with the new implementation of can_dispatch()), and the first call to tick(3) invokes dispatch_requests(), which submits all requests at once, the second call to tick(3) gets stuck in an infinite loop.

When I added a debug print to the loop that calls tick(3), I saw:

23: Call for i = 0
23: Call for i = 1
<reactor stall errors>

Contributor Author (@xemul):

This test fails because there's no (easy) way to tell the dispatch loop to stop dispatching. So I marked this PR as a draft, to fix it later.

@@ -551,7 +555,6 @@ void
io_queue::complete_request(io_desc_read_write& desc) noexcept {
_requests_executing--;
_requests_completed++;
_streams[desc.stream()].notify_request_finished(desc.capacity());
Contributor:

It seems that the function does not need to take a parameter -- it is just responsible for accounting.

Contributor Author (@xemul):

True

: fq(std::move(cfg))
, thr(st)
{ }

class io_desc_read_write final : public io_completion {
Contributor:

With the removal of the completion notification, this class no longer needs to keep the _stream and fq_capacity members.

capacity_t dispatched = 0;
boost::container::small_vector<priority_class_ptr, 2> preempt;

while (!_handles.empty() && (dispatched < _group.per_tick_grab_threshold())) {
Contributor:

We removed the check related to per_tick_grab_threshold() here -- when I search for that function call in this PR, its usage is not introduced anywhere else.

Is it expected?

Contributor Author (@xemul):

Yes. This per-tick threshold is completely removed by #2294.

Contributor:

I pulled the change to my local repository and fair_queue_test.cc fails.

Contributor Author (@xemul):

ACK, fails for me too

xemul added a commit to xemul/seastar that referenced this pull request Aug 30, 2024
There's one interesting event generated by the fair queue -- when the queue starts waiting for shared capacity. In fact, the output throttler code should live at the IO-queue level (see scylladb#2332), but currently it's in the fair queue, so the fair queue will need to generate this event.

This patch adds a tracer reference to the fair queue for future patching.

Signed-off-by: Pavel Emelyanov <[email protected]>