Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for large, multi-part config messages #11

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

jagerman
Copy link
Member

This implements multi-part config messages, which have been stubbed in for a while but still needed an implementation.

Such messages are split after post-compression, post-padding, but before encryption (i.e. each piece gets encrypted separately), and all but the last piece will be a maximum size message (76800 bytes).

The Config object itself keeps track of partially received messages, joining them up as they arrive and processing them once it has a full set (or else expiring them if they are too old). It also tracks (without storing) recently fully processed multipart messages so that they can be discarded without pointlessly storing them if they happen to arrive again for some reason.

All of this requires some (breaking) external API changes to implement:

  • merge(...) now returns an ordered_set of config messages that failed to merge rather than a vector; we couldn't keep track of the order anymore, because of the next point.
  • merge(...) now returns all the hashes that make up a multipart set when one gets merged, even if they weren't passed in in the current merge call. For instance if you have a three-chunk multipart config and merge parts 1 and 2, but not 3, then 1 and 2 will be preserved internally. When you later provide 3, then all of hash1, hash2, and hash3 get returned to indicate that all of those were processed as a part of this merge.
  • confirm_pushed() previously took a vector and implied the order matters (although we never previously returned more than one). It now takes an unordered_set to be consistent with merge() and current_hashes(), and because the order is irrelevant.
  • The C version of confirm_pushed (config_confirm_pushed) previously only accepted a single message hash + size (and would convert this into a one-element vector for confirm_pushed()). Its API changes quite considerably as it now has to pass an array of hashes and sizes and the size of those arrays. As with the C++ version, the order of elements here is unimportant.
  • The C++ current_hashes() previously only ever returned a 0- or 1-length vector; it now returns an unordered_set, and can easily return more than one (i.e. when the current config is a multipart config message).
  • Added a new active_hashes() method that returns the current hashes plus the hashes of any partially loaded multi-part configs for which we are awaiting remaining parts. When used for "do we want to keep this alive?" this is the more appropriate method to use compared to current_hashes().

Other multi-part related changes (which don't affect the API, but are worth noting):

  • multipart messages do not support (and must not be) protobuf wrapped, because multipart messages already will not be understood by clients not using this new code, and so anything supporting multipart should also have been fixed to not apply extra protobuf wrapping. We also are returning maximum size message parts, and the protobuf overhead will push it over the storage limit in a way that isn't trivial for libsession-util to determine and compensate for.

Other changes not directly related to multipart, but included as part of this:

  • Started converting some use of ustring_view to span<const unsigned char> (alias: ucspan), adding some helper functions and changing some functions to take spans instead. For the most part this conversion can happen piecemeal as, luckily, ustring_views are implicitly convertible to ucspan's, and so most places passing a ustring_view (or ustring) will not break. (The other way, i.e. ucspan to ustring_view, does not work, and so the conversion needs to happen from innermost functions out; this initial change is focussed on some of those inner functions).

  • fixed a bug where a duplicate config (multipart or otherwise) that gets merge()d would result in that config's hash ending up in both current hashes and old hashes, but it should not have gone into old hashes.

  • started converting some string concatenation and std::to_string usage to fmt and/or "..."_format instead now that we have those available (via oxen-logging).

include/session/config.hpp Outdated Show resolved Hide resolved
include/session/config/base.hpp Outdated Show resolved Hide resolved
tests/CMakeLists.txt Show resolved Hide resolved
@Bilb
Copy link
Collaborator

Bilb commented Dec 19, 2024

LGTM. I made a fix for the clang-16 build issues at #15

Copy link
Collaborator

@mpretty-cyro mpretty-cyro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like this PR doesn't play nicely with the libQuic changes due to the updated oxen-encoding version (oxenc::bt_list_consumer btlc(data); is no longer valid) - it seems we might need to wait until oxen-io/oxen-libquic#152 has been merged?

I worked around this for my testing by modifying btstream.cpp to do this instead:

auto val = oxenc::const_span<std::byte>{req.data(), req.size()};
oxenc::bt_list_consumer btlc(val);

but that resulted in Cannot create a bt_list_consumer with no data errors in the longs (and then I wasn't able to continue testing due to the config_push assertion but I assume I would have been prevented from testing properly due to this hack)

src/config/base.cpp Outdated Show resolved Hide resolved
src/config/base.cpp Show resolved Hide resolved
- Update oxen-encoding to latest dev for new span support in bt code.
- bring in oxen-logging so that we can use fmt; for now this only for
  fmt in the test suite, but we will definitely start using it in the
  future in libsession-util (the libquic PR already does).
- added an overload of the test suite "printable" functions that takes
  format+args.
- replace some `"..."_hex` string concatenations that broke when oxenc
  switched to constexpr _hex no longer return strings.
- Fixed some oxen-related test code that broke where it was touching
  oxenc::detail functions.
This implements multi-part config messages, which have been stubbed in
for a while but still needed an implementation.

Such messages are split after post-compression, post-padding, but before
encryption (i.e. each piece gets encrypted separately), and all but the
last piece will be a maximum size message (76800 bytes).

The Config object itself keeps track of partially received messages,
joining them up as they arrive and processing them once it has a full
set (or else expiring them if they are too old).  It also tracks
(without storing) recently fully processed multipart messages so that
they can be discarded without pointlessly storing them if they happen to
arrive again for some reason.

All of this requires some (breaking) external API changes to implement:

- `merge(...)` now returns an ordered_set of config messages that failed
  to merge rather than a vector; we couldn't keep track of the order
  anymore, because of the next point.
- `merge(...)` now returns *all* the hashes that make up a multipart set
  when one gets merged, even if they weren't passed in in the current
  merge call.  For instance if you have a three-chunk multipart config
  and merge parts 1 and 2, but not 3, then 1 and 2 will be preserved
  internally.  When you later provide 3, then all of hash1, hash2, and
  hash3 get returned to indicate that all of those were processed as a
  part of this merge.
- confirm_pushed() previously took a vector and implied the order
  matters (although we never previously returned more than one).  It now
  takes an unordered_set to be consistent with merge() and
  current_hashes(), and because the order is irrelevant.
- The C version of confirm_pushed (config_confirm_pushed) previously
  only accepted a single message hash + size (and would convert this
  into a one-element vector for confirm_pushed()).  Its API changes
  quite considerably as it now has to pass an array of hashes and sizes
  and the size of those arrays.  As with the C++ version, the order of
  elements here is unimportant.
- The C++ `current_hashes()` previously only ever returned a 0- or
  1-length vector; it now returns an unordered_set, and can easily
  return more than one (i.e. when the current config is a multipart
  config message).
- Added a new `active_hashes()` method that returns the current hashes
  *plus* the hashes of any partially loaded multi-part configs for which
  we are awaiting remaining parts.  When used for "do we want to keep
  this alive?" this is the more appropriate method to use compared to
  `current_hashes()`.

Other multi-part related changes (which don't affect the API, but are
worth noting):

- multipart messages *do not* support (and must not be) protobuf
  wrapped, because multipart messages *already* will not be understood
  by clients not using this new code, and so anything supporting
  multipart should also have been fixed to not apply extra protobuf
  wrapping.  We also are returning maximum size message parts, and the
  protobuf overhead will push it over the storage limit in a way that
  isn't trivial for libsession-util to determine and compensate for.

Other changes not directly related to multipart, but included as part of
this:

- Started converting some use of ustring_view to `span<const unsigned
  char>` (alias: `ucspan`), adding some helper functions and changing
  some functions to take spans instead.  For the most part this
  conversion can happen piecemeal as, luckily, ustring_views are
  implicitly convertible to `ucspan`'s, and so most places passing a
  ustring_view (or ustring) will not break.  (The other way, i.e. ucspan
  to ustring_view, does not work, and so the conversion needs to happen
  from innermost functions out; this initial change is focussed on some
  of those inner functions).

- fixed a bug where a duplicate config (multipart or otherwise) that
  gets merge()d would result in that config's hash ending up in *both*
  current hashes and old hashes, but it should not have gone into old
  hashes.

- started converting some string concatenation and std::to_string usage
  to fmt and/or "..."_format instead now that we have those available
  (via oxen-logging).
_curr_hash = all_hashes[_config->unmerged_index()];
_curr_hashes.clear();
auto& hashes = all_hashes[_config->unmerged_index()];
_curr_hashes.insert(hashes.begin(), hashes.end());
}
} else {
// the merging affect nothing (if it had seqno would have been incremented), so don't
// pointlessly replace the inner config object.
assert(new_conf->unmerged_index() == 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering whether this assert should be removed - just thinking that there are situations where a merge can occur with a config which is not going to result in a change

Eg. if you link a new device and, due to network conditions, it wasn't able to retrieve the existing configs then it would push up it's "new user" config, which would then be pulled down on your existing device, merged, and likely result in no changes right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I just realised that at the moment we consistently merge in the current config because when you push up a new config message we don't update the lastHash value (so we don't unintentionally miss a config that was pushed up by another device at the same time)

Which means on the poll after pushing the current config we will always receive the current config back and try to merge it 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants