Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow setting buffer sizes on server_socket #2458

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 20 additions & 2 deletions include/seastar/net/api.hh
Original file line number Diff line number Diff line change
Expand Up @@ -389,13 +389,31 @@ public:

/// @}

/// Options for creating a listening socket.
///
/// WARNING: these options currently only have an effect when using
/// the POSIX stack: all options are ignored on the native stack as they
/// are not implemented there.
struct listen_options {
bool reuse_address = false;
server_socket::load_balancing_algorithm lba = server_socket::load_balancing_algorithm::default_;
transport proto = transport::TCP;
int listen_backlog = 100;
unsigned fixed_cpu = 0u;
std::optional<file_permissions> unix_domain_socket_permissions;

/// If set, the SO_SNDBUF size will be set to the given value on the listening socket
/// via setsockopt. This buffer size is inherited by the sockets returned by
/// accept and is the preferred way to set the buffer size for these sockets since
/// setting it directly on the already-accepted socket is ineffective (see TCP(7)).
std::optional<int> so_sndbuf;

/// If set, the SO_RCVBUF size will be set to the given value on the listening socket
/// via setsockopt. This buffer size is inherited by the sockets returned by
/// accept and is the preferred way to set the buffer size for these sockets since
/// setting it directly on the already-accepted socket is ineffective (see TCP(7)).
std::optional<int> so_rcvbuf;

void set_fixed_cpu(unsigned cpu) {
lba = server_socket::load_balancing_algorithm::fixed;
fixed_cpu = cpu;
Expand Down Expand Up @@ -450,8 +468,8 @@ public:
return false;
}

/**
* Returns available network interfaces. This represents a
/**
* Returns available network interfaces. This represents a
nyh marked this conversation as resolved.
Show resolved Hide resolved
* snapshot of interfaces available at call time, hence the
* return by value.
*/
Expand Down
11 changes: 10 additions & 1 deletion src/core/reactor.cc
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ module;
#include <cassert>
#include <chrono>
#include <cmath>
#include <coroutine>
#include <coroutine>
#include <exception>
#include <filesystem>
#include <fstream>
Expand Down Expand Up @@ -1566,6 +1566,15 @@ reactor::posix_listen(socket_address sa, listen_options opts) {
if (opts.reuse_address) {
fd.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1);
}

if (opts.so_sndbuf) {
fd.setsockopt(SOL_SOCKET, SO_SNDBUF, *opts.so_sndbuf);
}

if (opts.so_rcvbuf) {
fd.setsockopt(SOL_SOCKET, SO_RCVBUF, *opts.so_rcvbuf);
}

if (_reuseport && !sa.is_af_unix())
fd.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1);

Expand Down
80 changes: 79 additions & 1 deletion tests/unit/socket_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -27,13 +27,17 @@
#include <seastar/util/std-compat.hh>
#include <seastar/util/later.hh>
#include <seastar/testing/test_case.hh>
#include <seastar/testing/thread_test_case.hh>
#include <seastar/core/abort_source.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/thread.hh>
#include <seastar/core/when_all.hh>

#include <seastar/net/api.hh>
#include <seastar/net/posix-stack.hh>

#include <optional>
#include <tuple>

using namespace seastar;

future<> handle_connection(connected_socket s) {
Expand Down Expand Up @@ -258,3 +262,77 @@ SEASTAR_TEST_CASE(socket_connect_abort_test) {
when_all(std::move(cf), std::move(check), std::move(abort)).get();
});
}

SEASTAR_THREAD_TEST_CASE(socket_bufsize) {

// Test that setting the send and recv buffer sizes on the listening
// socket is propagated to the socket returned by accept().

auto buf_size = [](std::optional<int> snd_size, std::optional<int> rcv_size) {
listen_options lo{
.reuse_address = true,
.lba = server_socket::load_balancing_algorithm::fixed,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we create multiple connections, we need to ensure we LB them all to shard 0, or else the accept() will just hang: this definite caused some confusion for a while.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it hang?

Copy link
Contributor Author

@travisdowns travisdowns Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test connects to the server socket and does an accept() on that socket (lines 280 + 281) expecting to get the connection we just made, all on shard 0, but if load balancing is on the second time through the connection will be send to shard 1, so the accept() on shard will not receive it and so will "hang".

All the other tests just happen to work I think since they only ever accept() a single connection on any server_socket and this first connection always balances to shard 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we had a discussion about this problem on the Seastar mailing list last year - see https://groups.google.com/g/seastar-dev/c/lMEXD_M04gE/m/hF41lhtDCAAJ, and the conclusions were that:

  1. This issue should be described in the woefully incomplete tutorial.md. Everybody who tried to write trivial servers in Seastar will encounter this problem and waste hours understanding what he did wrong - I did too.
  2. We have a document doc/network-connection-load-balancing.md
    written by @gleb-cloudius which describes load_balancing_algorithm::fixed but is missing the newer set_fixed_cpu() function which is (supposedly?) easier to use.

.so_sndbuf = snd_size,
.so_rcvbuf = rcv_size
};

ipv4_addr addr("127.0.0.1", 1234);
server_socket ss = seastar::listen(addr, lo);
connected_socket client = connect(addr).get();
connected_socket server = ss.accept().get().connection;

auto sockopt = [&](int option) {
int val{};
int ret = server.get_sockopt(SOL_SOCKET, option, &val, sizeof(val));
BOOST_REQUIRE_EQUAL(ret, 0);
return val;
};

int send = sockopt(SO_SNDBUF);
int recv = sockopt(SO_RCVBUF);

ss.abort_accept();
client.shutdown_output();
server.shutdown_output();


return std::make_tuple(send, recv);
};

constexpr int small_size = 8192, big_size = 128 * 1024;

// we pass different sizes for send and recv to catch any copy/paste
// style bugs
auto [send_small, recv_small] = buf_size(small_size, small_size * 2);
auto [send_big, recv_big] = buf_size(big_size, big_size * 2);

// Setting socket buffer sizes isn't an exact science: the kernel does
// some rounding, and also (currently) doubles the requested size and
// also applies so limits. So as a basic check, assert simply that the
// explicit small buffer ends up smaller than the explicit big buffer,
// and that both results are at least as large as the requested amount.
// The latter condition could plausibly fail if the OS clamped the size
// at a small amount, but this is unlikely for the chosen buffer sizes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, so strange. This bizarre doubling is even documented in socket(7).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nyh I think part/all of it is because people have certain expectations about what the socket buffer is, but Linux has a more expansive definition. E.g., Linux uses part of the buffer for things other than the data itself, like metadata. So a user who makes a 100KB socket may expect to write 100KB w/o blocking on an empty socket, but Linux only makes 50KB available for that by default (say). By doubling the size, this cancels out with the half taken away for internal purposes, so the expectations are met (though it is perhaps weird that it returns the doubled size in getsockopt calls).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are newer APIs (for CRIU) which round-trip the values.


BOOST_CHECK_LT(send_small, send_big);
BOOST_CHECK_LT(recv_small, recv_big);

BOOST_CHECK_GE(send_small, small_size);
BOOST_CHECK_GE(send_big, big_size);

BOOST_CHECK_GE(recv_small, small_size * 2);
BOOST_CHECK_GE(recv_big, big_size * 2);

// not much to check here with "default" sizes, but let's at least call it
// and check that we get a reasonable answer
auto [send_default, recv_default] = buf_size({}, {});

BOOST_CHECK_GE(send_default, 4096);
BOOST_CHECK_GE(recv_default, 4096);

// we don't really know the default socket size and it can vary by kernel
// config, but 20 MB should be enough for everyone.
BOOST_CHECK_LT(send_default, 20'000'000);
BOOST_CHECK_LT(recv_default, 20'000'000);
}