Teach http::client::make_request() to timeout #2304

Closed

13 changes: 8 additions & 5 deletions include/seastar/http/client.hh
@@ -28,6 +28,7 @@
#include <seastar/http/reply.hh>
#include <seastar/core/condition-variable.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/util/modules.hh>

namespace bi = boost::intrusive;
@@ -169,6 +170,7 @@ class client {
public:
using reply_handler = noncopyable_function<future<>(const reply&, input_stream<char>&& body)>;
using retry_requests = bool_class<struct retry_requests_tag>;
using clock_type = lowres_clock;
Contributor:
I don't like this name "clock_type". It's very unclear what this clock is used for. Future developers may start using it for the wrong things (e.g., if you want to measure a latency histogram, you may not want to use lowres_clock to do this).
I think you should name it "timeout_clock" instead.

Contributor Author:
Fair point. I followed the rpc_clock_type "naming convention" for no real reason


private:
friend class http::internal::client_ref;
@@ -185,17 +187,17 @@

using connection_ptr = seastar::shared_ptr<connection>;

future<connection_ptr> get_connection();
future<connection_ptr> make_connection();
future<connection_ptr> get_connection(clock_type::time_point timeout);
future<connection_ptr> make_connection(clock_type::time_point timeout);
future<> put_connection(connection_ptr con);
future<> shrink_connections();

template <std::invocable<connection&> Fn>
auto with_connection(Fn&& fn);
auto with_connection(Fn&& fn, clock_type::time_point timeout);

template <typename Fn>
requires std::invocable<Fn, connection&>
auto with_new_connection(Fn&& fn);
auto with_new_connection(Fn&& fn, clock_type::time_point timeout);

future<> do_make_request(connection& con, request& req, reply_handler& handle, std::optional<reply::status_type> expected);

@@ -275,12 +277,13 @@ public:
* \param req -- request to be sent
* \param handle -- the response handler
* \param expected -- the optional expected reply status code, default is std::nullopt
* \param send_timeout -- time point at which make_request will stop waiting for transport
Contributor:
We need to be clearer and less fuzzy about what this timeout limits. I would call it a "connect timeout" (not "send_timeout"; I think this name is wrong). I would explicitly say it is a timeout for making the initial connection - it only covers establishing a connection, it does not limit the time the request takes - and name this variable "connect_timeout".

Contributor Author:
I'm fine to rename it, but I'd like to emphasize that "connect timeout" doesn't fit well either. As you noticed below, this timeout also applies to the condition variable wait() which waits for an existing connection (from the pool) to become available, and that doesn't imply connecting by any means. From the code's perspective, this timeout separates the time the client spends obtaining a connection (a new one or one from the pool) from the process of sending the request head. So this is sort of a "request head sending timeout", but I have no better idea for its name than that.
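
For illustration only, a minimal caller-side sketch of what this deadline bounds, using the make_request() signature from the diff above (cln, the request target and the 100ms value are made up for the example):

    // The deadline covers obtaining a connection (pooled or new) and starting to
    // send the request head; it does not bound the full request/response exchange.
    auto deadline = http::experimental::client::clock_type::now() + std::chrono::milliseconds(100);
    future<> f = cln.make_request(http::request::make("GET", "test", "/test"),
            [] (const http::reply& rep, input_stream<char>&& body) {
                return make_ready_future<>();
            },
            http::reply::status_type::ok, deadline);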

Contributor:
It is a timeout waiting for a connection - not necessarily a new connection!

This distinction can be explained in text. But the name "send_timeout" suggests it's a timeout until the entire request is sent. This is not what you are doing. Moreover, if the request is very large (e.g., uploading a file), it is very far from what you're doing.

Contributor Author:
> It is a timeout waiting for a connection - not necessarily a new connection!

🤣 that's a 100% perfect explanation, all the more so since the object the client waits for is indeed class connection. I'll rename.

*
* Note that the handle callback should be prepared to be called more than once, because
* client may restart the whole request processing in case server closes the connection
* in the middle of operation
*/
future<> make_request(request req, reply_handler handle, std::optional<reply::status_type> expected = std::nullopt);
future<> make_request(request req, reply_handler handle, std::optional<reply::status_type> expected = std::nullopt, clock_type::time_point send_timeout = clock_type::time_point::max());

/**
* \brief Updates the maximum number of connections a client may have
11 changes: 11 additions & 0 deletions include/seastar/http/exception.hh
@@ -158,6 +158,17 @@ public:
{}
};

/**
* Client-side exception to report that making request timed out
* prior to communicating to server
*/
class timeout_error : public base_exception {
Contributor:
We already have a class with exactly the same name (but different definition) in rpc_types.hh, and the very similar name timed_out_error in include/seastar/core/timed_out_error.hh. All this is very confusing for users of the library.

I think the name of the HTTP exception type should closely follow the name of the HTTP status it sets. I would call it request_timeout_exception. Similar to the not_found_exception and bad_request_exception we already have.
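
For what it's worth, a sketch of the suggested rename, reusing the constructor arguments from the diff above (illustrative only, not part of the patch):

    // Same body as the proposed timeout_error, only the class name changes.
    class request_timeout_exception : public base_exception {
    public:
        request_timeout_exception()
            : base_exception("client timed out", http::reply::status_type::request_timeout)
        {}
    };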

Contributor Author:
My bad, I looked at the last exception in this header, which is unexpected_status_error, and named the new class with an _error suffix. Will rename.

public:
timeout_error()
: base_exception("client timed out", http::reply::status_type::request_timeout)
{}
};

SEASTAR_MODULE_EXPORT_END
}

46 changes: 31 additions & 15 deletions src/http/client.cc
@@ -28,6 +28,7 @@ module;
#include <optional>
#include <stdexcept>
#include <utility>
#include <chrono>

#ifdef SEASTAR_MODULE
module seastar;
@@ -243,7 +244,7 @@ client::client(std::unique_ptr<connection_factory> f, unsigned max_connections,
{
}

future<client::connection_ptr> client::get_connection() {
future<client::connection_ptr> client::get_connection(clock_type::time_point timeout) {
if (!_pool.empty()) {
connection_ptr con = _pool.front().shared_from_this();
_pool.pop_front();
@@ -252,19 +253,34 @@ future<client::connection_ptr> client::get_connection() {
}

if (_nr_connections >= _max_connections) {
return _wait_con.wait().then([this] {
return get_connection();
return _wait_con.wait(timeout).then_wrapped([this, timeout] (auto f) {
try {
f.get();
} catch (const condition_variable_timed_out&) {
return make_exception_future<connection_ptr>(httpd::timeout_error());
} catch (...) {
return current_exception_as_future<connection_ptr>();
}
return get_connection(timeout);
});
}

return make_connection();
return make_connection(timeout);
}

future<client::connection_ptr> client::make_connection() {
future<client::connection_ptr> client::make_connection(clock_type::time_point timeout) {
_total_new_connections++;
return _new_connections->make().then([cr = internal::client_ref(this)] (connected_socket cs) mutable {
return _new_connections->make().then([this, timeout, cr = internal::client_ref(this)] (connected_socket cs) mutable {
http_log.trace("created new http connection {}", cs.local_address());
auto con = seastar::make_shared<connection>(std::move(cs), std::move(cr));
if (clock_type::now() > timeout) {
// Factory made new connection, though it's too late already. Don't throw
// this effort out, but don't use it to serve current request either.
return put_connection(std::move(con)).then_wrapped([] (auto f) {
f.ignore_ready_future();
return make_exception_future<connection_ptr>(httpd::timeout_error());
});
Contributor:
I don't understand why we need this super-rare special case. If you set a 1 minute connection timeout, and the connection was made in 59.999 seconds but then when you want to start sending it's already 60.001 seconds - why do we need to treat it specially? Remember that if we start sending in 59.999, then the timeout is no longer relevant - we could be sending 1 TB of data and the whole thing will succeed.

So I would simply remove this piece of code. If the make_connection() said it finished by the timeout, that's fine - we don't need to second-guess its decision here.

Contributor Author:
The factory doesn't know this deadline, and it can easily take much more than 59.999 seconds to produce a new connection; there are two reasons for that.

First -- the call to make a new connection can happen after the client tried serving the request over a pooled one (#1883). E.g. -- the timeout is 1 minute, the client waits 30 seconds for a pooled connection, then starts sending data over it, and 15 seconds later the server closes its end. The client asks for a new connection and it takes 1 minute to establish it. Boom -- 105 seconds have passed.

Second -- we may equip factory::make() with a timeout parameter too, but connect() is not always abortable nowadays (#2302). It's a BUG, but the http client still has means to handle it.

Contributor:
The example you gave looks wrong to me. Remember that the timeout is an absolute time, not "1 minute". So if the client already waited 30 seconds for a pooled connection and then asks for a new connection, this new connection will have the same deadline and will only have 30 more seconds to complete the connection, not one minute.

I don't know what you mean by connect() not being abortable - do you mean it doesn't obey the timeout? Then what's the point of this whole patch? (I also asked this regarding the test!)

Contributor:
Ok, I'm starting to think that this specific if() that I didn't like is actually the main part of your patch. Apparently, you really care if the connection attempt takes more than the given amount of time (and you even said in responses to my questions, but not in the cover letter, that you can't control the timeout in a real system connect()). All you care about is that if the connection attempt takes more than the given timeout, the request will not be sent because it is already considered stale.

So this patch is apparently about not sending stale requests, more than about the ability to time out. I now understand that you consider it fine to wait 10 minutes instead of the 1-minute timeout, as long as after those 10 minutes we don't send the request.

Contributor Author:
Yes. The thing is that the client doesn't establish new connections itself; it calls connection_factory::make() and doesn't have any control over it. Of course, we can equip connection_factory::make() with a timeout argument and assume that it doesn't ignore it (the whole client is still experimental, so we're relatively safe in changing the API like that), but should the client fulfill its API contract and not send the request even if the factory didn't fulfill its own and missed the timeout?
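
A hypothetical sketch of what such an extension might look like; connection_factory::make() in this PR takes no arguments, so the timeout parameter and its handling here are purely illustrative:

    // Hypothetical, not part of this PR: the client forwards its deadline and a
    // factory implementation may (or may not) be able to honour it.
    class connection_factory {
    public:
        virtual future<connected_socket> make(client::clock_type::time_point timeout) = 0;
        virtual ~connection_factory() = default;
    };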

}
return make_ready_future<connection_ptr>(std::move(con));
});
}
@@ -311,8 +327,8 @@ future<> client::set_maximum_connections(unsigned nr) {
}

template <std::invocable<connection&> Fn>
auto client::with_connection(Fn&& fn) {
return get_connection().then([this, fn = std::move(fn)] (connection_ptr con) mutable {
auto client::with_connection(Fn&& fn, clock_type::time_point timeout) {
return get_connection(timeout).then([this, fn = std::move(fn)] (connection_ptr con) mutable {
return fn(*con).finally([this, con = std::move(con)] () mutable {
return put_connection(std::move(con));
});
@@ -321,22 +337,22 @@ auto client::with_connection(Fn&& fn) {

template <typename Fn>
requires std::invocable<Fn, connection&>
auto client::with_new_connection(Fn&& fn) {
return make_connection().then([this, fn = std::move(fn)] (connection_ptr con) mutable {
auto client::with_new_connection(Fn&& fn, clock_type::time_point timeout) {
return make_connection(timeout).then([this, fn = std::move(fn)] (connection_ptr con) mutable {
return fn(*con).finally([this, con = std::move(con)] () mutable {
return put_connection(std::move(con));
});
});
}

future<> client::make_request(request req, reply_handler handle, std::optional<reply::status_type> expected) {
return do_with(std::move(req), std::move(handle), [this, expected] (request& req, reply_handler& handle) mutable {
future<> client::make_request(request req, reply_handler handle, std::optional<reply::status_type> expected, clock_type::time_point send_timeout) {
return do_with(std::move(req), std::move(handle), [this, expected, send_timeout] (request& req, reply_handler& handle) mutable {
auto f = with_connection([this, &req, &handle, expected] (connection& con) {
return do_make_request(con, req, handle, expected);
});
}, send_timeout);

if (_retry) {
f = f.handle_exception_type([this, &req, &handle, expected] (const std::system_error& ex) {
f = f.handle_exception_type([this, &req, &handle, expected, send_timeout] (const std::system_error& ex) {
auto code = ex.code().value();
if ((code != EPIPE) && (code != ECONNABORTED)) {
return make_exception_future<>(ex);
Expand All @@ -347,7 +363,7 @@ future<> client::make_request(request req, reply_handler handle, std::optional<r
// break the limit. That's OK, the 'con' will be closed really soon
return with_new_connection([this, &req, &handle, expected] (connection& con) {
return do_make_request(con, req, handle, expected);
});
}, send_timeout);
});
}

83 changes: 76 additions & 7 deletions tests/unit/httpd_test.cc
@@ -891,6 +891,12 @@ static void read_simple_http_request(input_stream<char>& in) {
}
}

static future<> make_failing_http_request(http::experimental::client& cln) {
return cln.make_request(http::request::make("GET", "test", "/test"), [] (const http::reply& rep, input_stream<char>&& in) {
return make_exception_future<>(std::runtime_error("Shouldn't happen"));
}, http::reply::status_type::ok);
}

SEASTAR_TEST_CASE(test_client_response_eof) {
return seastar::async([] {
loopback_connection_factory lcf(1);
Expand All @@ -908,10 +914,7 @@ SEASTAR_TEST_CASE(test_client_response_eof) {

future<> client = seastar::async([&lcf] {
auto cln = http::experimental::client(std::make_unique<loopback_http_factory>(lcf));
auto req = http::request::make("GET", "test", "/test");
BOOST_REQUIRE_EXCEPTION(cln.make_request(std::move(req), [] (const http::reply& rep, input_stream<char>&& in) {
return make_exception_future<>(std::runtime_error("Shouldn't happen"));
}, http::reply::status_type::ok).get(), std::system_error, [] (auto& ex) {
BOOST_REQUIRE_EXCEPTION(make_failing_http_request(cln).get(), std::system_error, [] (auto& ex) {
Contributor:
I don't understand what I'm seeing here. As far as I know, BOOST_REQUIRE_EXCEPTION already validates that the specific exception type you specified (std::system_error) is the one thrown - and fails the test if the function doesn't throw at all. It looks to me like this whole "shouldn't happen" thing isn't needed at all, and your new function make_failing_http_request shouldn't be needed either.
What am I missing?

Contributor Author:
make_request() accepts a lambda that returns a future, and I do need to return some future from it.
Also note that BOOST_REQUIRE_EXCEPTION wants a system_error with the ECONNABORTED code, while the "shouldn't happen" exception is of runtime_error type; they are not related at all. The latter, as I wrote, is there just because some future must be returned from that lambda.

Contributor:
Yes, I know the "shouldn't happen" thing works because it returns the wrong type of exception and so causes the test to fail. But as I said, no exception at all would also make the test fail.
I don't know about the future thing and how to make this the least ugly. But I still feel that boilerplate code which creates weird exceptions can't be the simplest way to do it.

Contributor Author:
> Yes, I know the "shouldn't happen" thing works because it returns the wrong type of exception and so causes the test to fail. But as I said, no exception at all would also make the test fail.

The test passes. This indirectly proves that this "shouldn't happen" is indeed never called, so it could just as well be assert(false) or return make_ready_future<>().

Contributor Author:
Maybe I'll pass a default-initialized noncopyable_function there. If it were called it would just segfault, but since it's not called it would work.

return ex.code().value() == ECONNABORTED;
});

@@ -940,9 +943,7 @@ SEASTAR_TEST_CASE(test_client_response_parse_error) {
future<> client = seastar::async([&lcf] {
auto cln = http::experimental::client(std::make_unique<loopback_http_factory>(lcf));
auto req = http::request::make("GET", "test", "/test");
BOOST_REQUIRE_EXCEPTION(cln.make_request(std::move(req), [] (const http::reply& rep, input_stream<char>&& in) {
return make_exception_future<>(std::runtime_error("Shouldn't happen"));
}, http::reply::status_type::ok).get(), std::runtime_error, [] (auto& ex) {
BOOST_REQUIRE_EXCEPTION(make_failing_http_request(cln).get(), std::runtime_error, [] (auto& ex) {
return sstring(ex.what()).contains("Invalid http server response");
});

@@ -953,6 +954,74 @@
});
}

SEASTAR_TEST_CASE(test_client_request_send_timeout_cached_conn) {
return seastar::async([] {
loopback_connection_factory lcf(1);
Contributor:
It took me a while to understand why this test is about a "cached connection".
Now I understand: The difference between this test and the next test is that this test waits for the existing requests to finish and release the concurrency limit - i.e., it waits on a condvar - while the next test really starts a TCP connect() and waits on that.

Contributor Author:
Yes

auto ss = lcf.get_server_socket();
promise<> server_started;
future<> server = ss.accept().then([&] (accept_result ar) {
server_started.set_value();
return seastar::async([sk = std::move(ar.connection)] () mutable {
input_stream<char> in = sk.input();
read_simple_http_request(in);
sleep(std::chrono::seconds(1)).get();
Contributor:
Nitpick: although "1 second should be enough for everyone", we always regret this later... Can this be some sort of condition variable or something, to tell this server to just hang until released, rather than for 1 second?

Contributor Author:
Yes, good point. Maybe in this place we can do it.
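
A rough sketch of that approach, assuming a seastar::condition_variable shared between the server and client parts of the test (server_release is a made-up name):

    condition_variable server_release;   // declared alongside lcf/ss in the test body

    // server side: hang until explicitly released instead of sleeping for 1 second
    read_simple_http_request(in);
    server_release.wait().get();
    output_stream<char> out = sk.output();
    out.close().get();

    // client side: once the timeout has been observed, release the server
    BOOST_REQUIRE_THROW(f2.get(), httpd::timeout_error);
    server_release.signal();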

output_stream<char> out = sk.output();
out.close().get();
});
});

future<> client = seastar::async([&lcf, &server_started] {
auto cln = http::experimental::client(std::make_unique<loopback_http_factory>(lcf), 1 /* max connections */);
// this request gets handled by server and ...
auto f1 = make_failing_http_request(cln);
Contributor:
I don't understand why you want to make a failing request. What is this trying to do?

Contributor Author:
Because the server in this test doesn't really send any responses. This make_failing_http_request is here just to make the client create a connection. The server then delays responding to it (and delays closing its end), so the connection is in the pool but is not available for the 2nd request. Thus the 2nd request starts waiting for it on the client's condition variable, times out, and the test happily notices it.

server_started.get_future().get();
// ... this should hang waiting for cached connection
auto f2 = cln.make_request(http::request::make("GET", "test", "/test"), [] (const auto& rep, auto&& in) {
return make_exception_future<>(std::runtime_error("Shouldn't happen"));
Contributor:
Again, I don't understand why this "shouldn't happen" thing is needed. BOOST_REQUIRE_EXCEPTION already requires an exception, and if there is no exception, it will fail. So you don't need to replace the missing exception with a silly "shouldn't happen" exception.

Contributor Author:
It's just because I need to return some future from this lambda. It could as well be make_ready_future<>(). On the other hand, if something goes wrong and the server sends a response and this place gets erroneously called... I don't know, an exceptional future from here is a fuse against silly mistakes in the future. Methinks.

}, http::reply::status_type::ok, http::experimental::client::clock_type::now() + std::chrono::milliseconds(100));
Contributor:
Again, you are assuming that if the server hangs the previous request for 1 second, then a timeout of 100ms will not be enough to make a new connection. But this is the stuff flaky tests are made of. It would be better to use a condition variable instead of these numbers, or if that's not possible, I think this needs to be more than a 1-second sleep.

Contributor Author:
The client doesn't attempt to make a new connection - max_connections here is 1, so it's waiting for the previous one to return to the pool. I think you're right, and we can go with an explicit server kick.


BOOST_REQUIRE_THROW(f2.get(), httpd::timeout_error);
cln.close().get();
try {
f1.get();
} catch (...) {
}
});

when_all(std::move(client), std::move(server)).discard_result().get();
});
}

SEASTAR_TEST_CASE(test_client_request_send_timeout_new_conn) {
return seastar::async([] {
loopback_connection_factory lcf(1);
auto ss = lcf.get_server_socket();
future<> server = ss.accept().discard_result();

class delayed_factory : public loopback_http_factory {
public:
explicit delayed_factory(loopback_connection_factory& f) : loopback_http_factory(f) {}
virtual future<connected_socket> make() override {
return sleep(std::chrono::seconds(1)).then([this] {
return loopback_http_factory::make();
Contributor:
If I understand correctly, what you're doing here isn't causing the TCP connect to time out, but making the connection factory add an artificial delay. So you are NOT testing the ability of this code to time out on a very long connect() attempt, but rather testing the extra "check now() after the factory returned" which I said in the previous test you should remove.
So I think this test is wrong and you are not testing the really important feature - the ability to break a TCP connect().

Contributor Author:
Yes, that's correct, but this patch is not about breaking TCP connect(), because that doesn't always work (#2302), and the client doesn't mess with it at all - it uses an external factory to generate new connections.

Contributor Author:
... and the tests here use test-only loopback sockets, which are not TCP but an in-memory shared queue behind the connected_socket API; they don't have a connect() timeout/abort. Even if they did, it wouldn't be a TCP connect().

Contributor Author:
While we're here, @nyh, I invite you to review a test that checks the ability to break TCP connect() -- #2301

Contributor:
@xemul I thought the whole purpose of this patch was to allow timing out connect() (either an actual Linux connect() or something from a pool). I'm starting to understand this was not the point of this PR.

});
}
};

auto client = async([&] {
auto cln = http::experimental::client(std::make_unique<delayed_factory>(lcf));
auto f = cln.make_request(http::request::make("GET", "test", "/test"), [] (const auto& rep, auto&& in) {
return make_exception_future<>(std::runtime_error("Shouldn't happen"));
}, http::reply::status_type::ok, http::experimental::client::clock_type::now() + std::chrono::milliseconds(100));

BOOST_REQUIRE_THROW(f.get(), httpd::timeout_error);
cln.close().get();
});

when_all(std::move(client), std::move(server)).discard_result().get();
});
}

SEASTAR_TEST_CASE(test_client_retry_request) {
return seastar::async([] {
loopback_connection_factory lcf(1);